Re: [PyCUDA] mac64 PyCUDA

2010-09-20 Thread Andreas Kloeckner
Hi Art, Min, Bryan,

On Mon, 20 Sep 2010 15:25:24 -0700, Art  wrote:
> Thanks for posting the fork. I used your modification to compiler.py (my
> original one was incorrect) and I built a 64-bit only version of pycuda and
> all tests under tests/ passed for the first time. I also was able to call
> cublas and cufft using something similar to parret [1].

Thanks very much for getting to the bottom of this pesky problem! (Or
that's at least what it seems like to me--right?) I've pulled both Min's
and Bryan's fixes into PyCUDA's git.

Thanks again,
Andreas



___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] mac64 PyCUDA

2010-09-20 Thread Art
On Mon, Sep 20, 2010 at 2:08 PM, MinRK  wrote:

> Okay, with a tiny tweak to compiler.compile, I have UB pycuda working in
> both 64 and 32-bit.  All I did was tell compile() to add '-m64' to options
> if it detects 64-bit mode, in the same way as Bryan's trick.
>
> I pushed a branch with the patch to my GitHub:
> http://github.com/minrk/pycuda, as well as a siteconf that works for the
> build.  There are still a few failures in test_gpuarray for 64-bit, but I
> don't know what causes them.
>
> On Mon, Sep 20, 2010 at 11:30, Art  wrote:
>
>>
>> On Mon, Sep 20, 2010 at 8:34 AM, Bryan Catanzaro <
>> catan...@eecs.berkeley.edu> wrote:
>>
>>> I think it should be changed to check to see if the Python interpreter is
>>> currently running in 32 bit mode, and then compile to match:
>>>
>>> if 'darwin' in sys.platform and sys.maxint == 2147483647:
>>>     # The Python interpreter is running in 32 bit mode on OS X
>>>     if "-arch" not in conf["CXXFLAGS"]:
>>>         conf["CXXFLAGS"].extend(['-arch', 'i386', '-m32'])
>>>     if "-arch" not in conf["LDFLAGS"]:
>>>         conf["LDFLAGS"].extend(['-arch', 'i386', '-m32'])
>>>
>>> Some people (myself included) have to run Python in 32-bit mode on 64-bit
>>> OS X for various compatibility reasons (currently including libcudart.dylib,
>>> which is only shipped as a 32-bit library).  Since the Python which Apple
>>> ships is compiled as a fat binary with both 32 and 64 bit versions, we can't
>>> know a priori what the right compiler flags are.
>>>
>>> - bryan
>>
>>
>> I have driver version 3.1.14 and:
>>
>> $ file /usr/local/cuda/lib/libcudart.dylib
>> /usr/local/cuda/lib/libcudart.dylib: Mach-O universal binary with 2
>> architectures
>> /usr/local/cuda/lib/libcudart.dylib (for architecture x86_64): Mach-O
>> 64-bit dynamically linked shared library x86_64
>> /usr/local/cuda/lib/libcudart.dylib (for architecture i386): Mach-O
>> dynamically linked shared library i386
>>
>> Doesn't that mean it's UB?
>>
>
> You are right, that's UB, I guess the change was made at 3.1, not 3.2.  In
> 3.0, they introduced UB for libcuda, but left all other libraries as i386.
>
>
>> I can build and test cudamat [1] (which uses ctypes to call libcudart and
>> libcublas) fine in 64-bit macports python though I haven't otherwise used
>> it. I had to make the following small change in its Makefile:
>>
>> nvcc -O -m 64 -L/usr/local/cuda/lib --ptxas-options=-v --compiler-options
>> '-fPIC' -o libcudamat.so --shared cudamat.cu cudamat_kernels.cu -lcublas
>>
>> from:
>>
>> nvcc -O --ptxas-options=-v --compiler-options '-fPIC' -o libcudamat.so
>> --shared cudamat.cu cudamat_kernels.cu -lcublas
>>
>> I built pycuda as 64-bit and changed pycuda/compiler.py to pass --machine
>> 64 to nvcc and got examples/demo.py to run but the other examples and tests
>> had failures and would eventually hang my machine. I don't know enough to
>> fix this myself but I can try suggestions. Would using 3.2RC make a
>> difference?
>>
>
> If the rest of libraries are UB, as cudart is at 3.1, then 3.2 shouldn't
> make a difference from 3.1.x, I just knew that the switch was somewhere
> between 3.0 and 3.2, and it appears to have been at 3.1.
>
> -MinRK
>
>
>>
>> [1] http://code.google.com/p/cudamat/
>>
>> cheers,
>> art
>>
>
>>
Thanks for posting the fork. I used your modification to compiler.py (my
original one was incorrect) and I built a 64-bit only version of pycuda and
all tests under tests/ passed for the first time. I also was able to call
cublas and cufft using something similar to parret [1].

This is the siteconf.py I used hacked from someone's earlier efforts on this
list:

BOOST_INC_DIR = ['/opt/local/include']
BOOST_LIB_DIR = ['/opt/local/lib']
BOOST_COMPILER = 'gcc-mp-4.4'  # not sure
USE_SHIPPED_BOOST = False
BOOST_PYTHON_LIBNAME = ['boost_python-mt']
BOOST_THREAD_LIBNAME = ['boost_thread-mt']
CUDA_TRACE = False
CUDA_ENABLE_GL = False
CUDADRV_LIB_DIR = []
CUDADRV_LIBNAME = ['cuda']
CXXFLAGS = ['-arch', 'x86_64', '-m64', '-isysroot',
'/Developer/SDKs/MacOSX10.6.sdk']
LDFLAGS = ['-arch', 'x86_64', '-m64', '-isysroot',
'/Developer/SDKs/MacOSX10.6.sdk']

[1] http://www.mathcs.emory.edu/~yfan/PARRET/doc/index.html


Re: [PyCUDA] mac64 PyCUDA

2010-09-20 Thread MinRK
Okay, with a tiny tweak to compiler.compile, I have UB pycuda working in
both 64 and 32-bit.  All I did was tell compile() to add '-m64' to options
if it detects 64-bit mode, in the same way as Bryan's trick.

I pushed a branch with the patch to my GitHub:
http://github.com/minrk/pycuda, as well as a siteconf that works for the
build.  There are still a few failures in test_gpuarray for 64-bit, but I
don't know what causes them.

On Mon, Sep 20, 2010 at 11:30, Art  wrote:

>
> On Mon, Sep 20, 2010 at 8:34 AM, Bryan Catanzaro <
> catan...@eecs.berkeley.edu> wrote:
>
>> I think it should be changed to check to see if the Python interpreter is
>> currently running in 32 bit mode, and then compile to match:
>>
>> if 'darwin' in sys.platform and sys.maxint == 2147483647:
>>     # The Python interpreter is running in 32 bit mode on OS X
>>     if "-arch" not in conf["CXXFLAGS"]:
>>         conf["CXXFLAGS"].extend(['-arch', 'i386', '-m32'])
>>     if "-arch" not in conf["LDFLAGS"]:
>>         conf["LDFLAGS"].extend(['-arch', 'i386', '-m32'])
>>
>> Some people (myself included) have to run Python in 32-bit mode on 64-bit
>> OS X for various compatibility reasons (currently including libcudart.dylib,
>> which is only shipped as a 32-bit library).  Since the Python which Apple
>> ships is compiled as a fat binary with both 32 and 64 bit versions, we can't
>> know a priori what the right compiler flags are.
>>
>> - bryan
>
>
> I have driver version 3.1.14 and:
>
> $ file /usr/local/cuda/lib/libcudart.dylib
> /usr/local/cuda/lib/libcudart.dylib: Mach-O universal binary with 2
> architectures
> /usr/local/cuda/lib/libcudart.dylib (for architecture x86_64): Mach-O
> 64-bit dynamically linked shared library x86_64
> /usr/local/cuda/lib/libcudart.dylib (for architecture i386): Mach-O
> dynamically linked shared library i386
>
> Doesn't that mean it's UB?
>

You are right, that's UB, I guess the change was made at 3.1, not 3.2.  In
3.0, they introduced UB for libcuda, but left all other libraries as i386.


> I can build and test cudamat [1] (which uses ctypes to call libcudart and
> libcublas) fine in 64-bit macports python though I haven't otherwise used
> it. I had to make the following small change in its Makefile:
>
> nvcc -O -m 64 -L/usr/local/cuda/lib --ptxas-options=-v --compiler-options
> '-fPIC' -o libcudamat.so --shared cudamat.cu cudamat_kernels.cu -lcublas
>
> from:
>
> nvcc -O --ptxas-options=-v --compiler-options '-fPIC' -o libcudamat.so
> --shared cudamat.cu cudamat_kernels.cu -lcublas
>
> I built pycuda as 64-bit and changed pycuda/compiler.py to pass --machine
> 64 to nvcc and got examples/demo.py to run but the other examples and tests
> had failures and would eventually hang my machine. I don't know enough to
> fix this myself but I can try suggestions. Would using 3.2RC make a
> difference?
>

If the rest of libraries are UB, as cudart is at 3.1, then 3.2 shouldn't
make a difference from 3.1.x, I just knew that the switch was somewhere
between 3.0 and 3.2, and it appears to have been at 3.1.

-MinRK


>
> [1] http://code.google.com/p/cudamat/
>
> cheers,
> art
>



Re: [PyCUDA] weird behavior with complex input to kernel

2010-09-20 Thread Andreas Kloeckner
On Fri, 17 Sep 2010 14:54:11 -0400, Yiyin Zhou  wrote:
> Hi,
> I was trying to pass some complex valued numbers to a kernel, but somehow it 
> messed up. Here is an example with GPUArray that can be reproduced on several 
> of our linux servers:
> 
> initialize...
> 
> import pycuda.gpuarray as gpuarray
> import numpy as np
> d_A = gpuarray.empty((1,128), np.complex64)
> d_A.fill(1+2j)
> d_A
> The result is correct
> 
> d_A.fill(np.complex64(1+2j))
> d_A
> the imaginary part of the resulting array are all zeros
> 
> It's not necessarily a problem with complex64, in some kernels complex64 is 
> correct, but complex128 is not.
> What could be the cause for that?

Not my fault:
http://projects.scipy.org/numpy/ticket/1617
(just reported)

Andreas
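As a quick self-check for this class of problem: on a NumPy release where the ticket above is fixed, both fill variants should preserve the imaginary part (the array shape and values below mirror the report):

```python
import numpy as np

# Fill a complex64 array two ways and check that the scalar path keeps
# the imaginary part.
a = np.empty((1, 128), dtype=np.complex64)

a.fill(1 + 2j)                       # plain Python complex -- worked in the report
assert np.all(a.imag == 2.0)

a.fill(np.complex64(1 + 2j))         # numpy scalar -- the path that misbehaved
assert np.all(a.imag == 2.0)         # fails on a NumPy affected by ticket 1617
```

If the second assertion fails, the installed NumPy still carries the scalar-conversion bug and the GPUArray behavior above is a symptom, not a PyCUDA issue.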





Re: [PyCUDA] bug in __iadd__ and __isub__ in gpuarray

2010-09-20 Thread Andreas Kloeckner
Hi Nicolas,

On Mon, 20 Sep 2010 18:21:28 +0200, Nicolas Barbey  
wrote:
> Thanks for this great package, which makes using CUDA so much easier!
> I think I found out a small bug in the current git version of the code.
> If you run the following code in ipython :
> 
> >>> from pycuda import autoinit
> >>> from pycuda import driver, compiler, gpuarray, tools
> >>> a = gpuarray.ones(16, dtype=float32)
> >>> a += 1

Thanks for the patch/report. Fixed in pycuda and pyopencl git. I didn't
use your patch because 'return self.__iadd__(-other)' results in two
kernel invocations vs. 1 for the way sub is currently implemented.
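The launch-count difference can be made concrete with a small CPU mock (the class and method names below are illustrative stand-ins, not PyCUDA's actual internals; only the counting matters). Implementing `__isub__` as `self.__iadd__(-other)` first launches a negation kernel and then an add kernel, while a fused axpbyz-style call is a single launch:

```python
import numpy as np

class MockGPUArray:
    """CPU stand-in for a GPUArray that counts simulated kernel launches."""
    launches = 0

    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)

    def _axpbyz(self, selffac, other, otherfac, out):
        # One fused elementwise kernel: out = selffac*self + otherfac*other
        MockGPUArray.launches += 1
        out.data = selffac * self.data + otherfac * other.data
        return out

    def __neg__(self):
        # Negation needs its own elementwise kernel
        MockGPUArray.launches += 1
        return MockGPUArray(-self.data)

    def isub_fused(self, other):
        # current git: direct axpbyz with otherfac = -1  ->  1 launch
        return self._axpbyz(1, other, -1, self)

    def isub_via_iadd(self, other):
        # the patch's self.__iadd__(-other)  ->  negate + add = 2 launches
        return self._axpbyz(1, -other, 1, self)

a, b = MockGPUArray([3.0, 4.0]), MockGPUArray([1.0, 1.0])
MockGPUArray.launches = 0
a.isub_fused(b)
fused_cost = MockGPUArray.launches      # 1 launch

MockGPUArray.launches = 0
a2 = MockGPUArray([3.0, 4.0])
a2.isub_via_iadd(b)
via_iadd_cost = MockGPUArray.launches   # 2 launches
```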

Andreas




Re: [PyCUDA] mac64 PyCUDA

2010-09-20 Thread Art
On Mon, Sep 20, 2010 at 8:34 AM, Bryan Catanzaro  wrote:

> I think it should be changed to check to see if the Python interpreter is
> currently running in 32 bit mode, and then compile to match:
>
> if 'darwin' in sys.platform and sys.maxint == 2147483647:
>     # The Python interpreter is running in 32 bit mode on OS X
>     if "-arch" not in conf["CXXFLAGS"]:
>         conf["CXXFLAGS"].extend(['-arch', 'i386', '-m32'])
>     if "-arch" not in conf["LDFLAGS"]:
>         conf["LDFLAGS"].extend(['-arch', 'i386', '-m32'])
>
> Some people (myself included) have to run Python in 32-bit mode on 64-bit
> OS X for various compatibility reasons (currently including libcudart.dylib,
> which is only shipped as a 32-bit library).  Since the Python which Apple
> ships is compiled as a fat binary with both 32 and 64 bit versions, we can't
> know a priori what the right compiler flags are.
>
> - bryan


I have driver version 3.1.14 and:

$ file /usr/local/cuda/lib/libcudart.dylib
/usr/local/cuda/lib/libcudart.dylib: Mach-O universal binary with 2
architectures
/usr/local/cuda/lib/libcudart.dylib (for architecture x86_64): Mach-O 64-bit
dynamically linked shared library x86_64
/usr/local/cuda/lib/libcudart.dylib (for architecture i386): Mach-O
dynamically linked shared library i386

Doesn't that mean it's UB? I can build and test cudamat [1] (which uses
ctypes to call libcudart and libcublas) fine in 64-bit macports python
though I haven't otherwise used it. I had to make the following small change
in its Makefile:

nvcc -O -m 64 -L/usr/local/cuda/lib --ptxas-options=-v --compiler-options
'-fPIC' -o libcudamat.so --shared cudamat.cu cudamat_kernels.cu -lcublas

from:

nvcc -O --ptxas-options=-v --compiler-options '-fPIC' -o libcudamat.so
--shared cudamat.cu cudamat_kernels.cu -lcublas

I built pycuda as 64-bit and changed pycuda/compiler.py to pass --machine 64
to nvcc and got examples/demo.py to run but the other examples and tests had
failures and would eventually hang my machine. I don't know enough to fix
this myself but I can try suggestions. Would using 3.2RC make a difference?

[1] http://code.google.com/p/cudamat/

cheers,
art
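The UB question can also be answered programmatically by parsing `file` output. A sketch (the helper below is illustrative; the sample string is the output quoted above, and in practice you would feed it `subprocess.check_output(["file", path])`):

```python
def macho_architectures(file_output):
    """Extract architecture names from `file` output for a Mach-O binary.

    Illustrative helper: scans lines of the form
    '... (for architecture x86_64): Mach-O 64-bit ...'.
    """
    archs = set()
    for line in file_output.splitlines():
        if "(for architecture" in line:
            arch = line.split("(for architecture", 1)[1].split(")", 1)[0]
            archs.add(arch.strip())
    return archs

sample = """\
/usr/local/cuda/lib/libcudart.dylib: Mach-O universal binary with 2 architectures
/usr/local/cuda/lib/libcudart.dylib (for architecture x86_64): Mach-O 64-bit dynamically linked shared library x86_64
/usr/local/cuda/lib/libcudart.dylib (for architecture i386): Mach-O dynamically linked shared library i386
"""

archs = macho_architectures(sample)
# Two architectures present, so the library is indeed a universal binary (UB).
assert archs == {"x86_64", "i386"}
```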


Re: [PyCUDA] SparseSolve.py example

2010-09-20 Thread Brian Menounos
Thanks Andreas - 

Yes, unfortunately the pattern does change (the code I run grows and shrinks 
mountain glaciers) so I think I'm out of luck with GPU processing here. I'll 
consider some other options.  Pycuda, nevertheless, is a great package - thanks 
everyone for your hard work developing and maintaining it. 

Cheers, 


-Original Message-
From: Andreas Kloeckner [mailto:li...@informa.tiker.net] 
Sent: Sunday, September 19, 2010 6:07 PM
To: Brian Menounos; pycuda@tiker.net
Subject: RE: [PyCUDA] SparseSolve.py example

Hi Brian,

On Tue, 7 Sep 2010 19:56:43 +, Brian Menounos  wrote:
> Hi Andreas - I realize you're pretty busy answering emails of late, so answer 
> when you can... 

Yeah, sorry. Pretty swamped ATM. I hope things will clear out a bit during the 
fall semester, but so far I don't see when that would be happening...

Btw, I cc'd the list on my reply. Hope you don't mind. Please keep them in the 
loop (for archival purposes) unless you're discussing something confidential.

> I've attached your SparseSolve.py examples tweaked to deal with two 
> pickled numpy arrays (1D and 2D) in order to try out pycuda's 
> conjugate gradient (cg) function.
> 
> I'm typically building sparse matrices and doing iterative cg calls as 
> part of a numerical model for mountain glaciation. I was hoping to 
> speed up the cg function within scipy by sending the task to my gpu. 
> However, what is clear is that much time is spent assembling the 
> packets (your PacketedSpMV() function) before execution of 
> solve_pkt_with_cg().
> 
> I need to execute cg for each time step of my model (typically 1-1.yr 
> steps for 10,000 yr integration) and this is the part of the model 
> where most time is spent. Any speed up here would be ideal.
> 
> However, the performance is about 20 times slower than if run on a 
> single cpu using scipy's cg function. I knew there would be some 
> overhead for reading/writing to the GPU, but I wasn't expecting this 
> much time in packet assembly. Am I wasting my time trying to do this 
> on a GPU? I apologize in advance for my deficit in GPU/parallel 
> coding!

Does the sparsity structure of the matrix change? If not, you could simply 
scatter the new entries into the existing data structure, which would be pretty 
fast (but would still require a little additional code on top of what's there).
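For the pattern-unchanged case, the scatter idea can be sketched with a SciPy CSR matrix (illustrative values; the point is that `indices`/`indptr` are built once and only `.data` is rewritten each time step):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Assemble once; the sparsity pattern (indptr/indices) is now fixed.
A = csr_matrix(np.array([[4.0, 0.0, 1.0],
                         [0.0, 3.0, 0.0],
                         [1.0, 0.0, 2.0]]))

# Each later time step: only the nonzero *values* change, so scatter the
# new entries into the existing .data array instead of rebuilding A.
A.data[:] = 0.5 * A.data   # e.g. coefficients rescaled for a new time step

assert np.allclose(A.toarray(), [[2.0, 0.0, 0.5],
                                 [0.0, 1.5, 0.0],
                                 [0.5, 0.0, 1.0]])
```

The analogous GPU version would copy the new `.data` values to an already-partitioned device structure, skipping the expensive build step.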

If your structure *does* change and can't be predicted/generalized over 
somehow, then the present code is simply not for you. It spends a significant 
amount of (CPU!) time building, partitioning and transferring, under the 
assumption that this only happens during preprocessing. The actual CG and 
matrix-vector products are tuned to be fast. If you'd want to accommodate 
changing sparsity patterns, you'd have to GPU-ify assembly, but I don't think 
even cusp [1] does that.

Andreas

[1] http://code.google.com/p/cusp-library/




Re: [PyCUDA] mac64 PyCUDA

2010-09-20 Thread MinRK
Bryan,

Note that in 3.2, all files in cuda/lib are UB, including cudart (finally!).

I build fat-binary on OS-X (up to date 10.6.4, cuda 3.2, etc.), simply by
setting the usual arch flags in siteconf.py:

CXXFLAGS = ["-arch", "x86_64", "-arch", "i386"]
LDFLAGS = ["-arch", "x86_64", "-arch", "i386"]

The build succeeds just fine, but most functions don't actually work, and
I'm not well versed enough in the details to figure it out.  It seems that
all the memory/mempool related tests pass, but the rest don't. Perhaps
because something should be passed differently to nvcc when running 64-bit,
as opposed to 32?

-MinRK

On Mon, Sep 20, 2010 at 08:34, Bryan Catanzaro
wrote:

> I think it should be changed to check to see if the Python interpreter is
> currently running in 32 bit mode, and then compile to match:
>
> if 'darwin' in sys.platform and sys.maxint == 2147483647:
>     # The Python interpreter is running in 32 bit mode on OS X
>     if "-arch" not in conf["CXXFLAGS"]:
>         conf["CXXFLAGS"].extend(['-arch', 'i386', '-m32'])
>     if "-arch" not in conf["LDFLAGS"]:
>         conf["LDFLAGS"].extend(['-arch', 'i386', '-m32'])
>
> Some people (myself included) have to run Python in 32-bit mode on 64-bit
> OS X for various compatibility reasons (currently including libcudart.dylib,
> which is only shipped as a 32-bit library).  Since the Python which Apple
> ships is compiled as a fat binary with both 32 and 64 bit versions, we can't
> know a priori what the right compiler flags are.
>
> - bryan
>
>
>
> On Sep 19, 2010, at 5:55 PM, Andreas Kloeckner wrote:
>
> > On Thu, 16 Sep 2010 14:37:31 -0400, gerald wrong <psillymathh...@gmail.com> wrote:
> >> Looking at PyCUDA setup.py, I found this:
> >>
> >>     if 'darwin' in sys.platform:
> >>         # prevent from building ppc since cuda on OS X is not compiled for ppc
> >>         # also, default to 32-bit build, since there doesn't appear to be a
> >>         # 64-bit CUDA on Mac yet.
> >>         if "-arch" not in conf["CXXFLAGS"]:
> >>             conf["CXXFLAGS"].extend(['-arch', 'i386', '-m32'])
> >>         if "-arch" not in conf["LDFLAGS"]:
> >>             conf["LDFLAGS"].extend(['-arch', 'i386', '-m32'])
> >>
> >> Since 64bit CUDA support on Mac OS X is a reality since at least CUDA3.1,
> >> this should be changed.  I know there are other problems regarding 32/64
> >> python stopping 64bit mac users at the moment (including myself), but since
> >> changes are being made to the PyCUDA code to make it compatible with CUDA3.2
> >> anyway...
> >
> > It looks like this should just be killed wholesale--right? Or is there
> > anything more appropriate that it should be changed to?
> >
> > Andreas
> >
>
> - bryan
>
>
>
>


[PyCUDA] bug in __iadd__ and __isub__ in gpuarray

2010-09-20 Thread Nicolas Barbey
Hello all,

Thanks for this great package, which makes using CUDA so much easier!
I think I found out a small bug in the current git version of the code.
If you run the following code in ipython :

>>> from pycuda import autoinit
>>> from pycuda import driver, compiler, gpuarray, tools
>>> a = gpuarray.ones(16, dtype=float32)
>>> a += 1

you get :

---
AttributeError                            Traceback (most recent call last)

/home/data/projets/pycuda_mult/ in ()

/usr/lib/python2.6/site-packages/pycuda-0.94rc-py2.6-linux-x86_64.egg/pycuda/gpuarray.py
in __iadd__(self, other)
258
259 def __iadd__(self, other):
--> 260 return self._axpbyz(1, other, 1, self)
261
262 def __isub__(self, other):

/usr/lib/python2.6/site-packages/pycuda-0.94rc-py2.6-linux-x86_64.egg/pycuda/gpuarray.py
in _axpbyz(self, selffac, other, otherfac, out, add_timer, stream)
140 """Compute ``out = selffac * self + otherfac*other``,
141 where `other` is a vector.."""
--> 142 assert self.shape == other.shape
143
144 func = elementwise.get_axpbyz_kernel(self.dtype,
other.dtype, out.dtype)

AttributeError: 'int' object has no attribute 'shape'



The following patch should solve this issue. It also adds a ones function,
matching the one in numpy, and defines __sub__ in terms of __add__ to avoid
code duplication.
Tell me what you think about these changes. Also, I do not know whether
patches should be submitted to the mailing list, but I could not find another
way.

Cheers,

Nicolas

patch follows:

diff --git a/pycuda/gpuarray.py b/pycuda/gpuarray.py
index a0a1da1..b685e52 100644
--- a/pycuda/gpuarray.py
+++ b/pycuda/gpuarray.py
@@ -235,17 +235,7 @@ class GPUArray(object):

 def __sub__(self, other):
 """Substract an array from an array or a scalar from an array."""
-
-if isinstance(other, GPUArray):
-result = self._new_like_me(_get_common_dtype(self, other))
-return self._axpbyz(1, other, -1, result)
-else:
-if other == 0:
-return self
-else:
-# create a new array for the result
-result = self._new_like_me()
-return self._axpbz(1, -other, result)
+return self.__add__(-other)

 def __rsub__(self,other):
 """Substracts an array by a scalar or an array::
@@ -257,10 +247,18 @@ class GPUArray(object):
 return self._axpbz(-1, other, result)

 def __iadd__(self, other):
-return self._axpbyz(1, other, 1, self)
+if isinstance(other, GPUArray):
+# add another vector
+return self._axpbyz(1, other, 1, self)
+else:
+# add a scalar
+if other == 0:
+return self
+else:
+return self._axpbz(1, other, self)

 def __isub__(self, other):
-return self._axpbyz(1, other, -1, self)
+return self.__iadd__(-other)

 def __neg__(self):
 result = self._new_like_me()
@@ -631,6 +629,13 @@ def zeros(shape, dtype, allocator=drv.mem_alloc):
 result.fill(0)
 return result

+def ones(shape, dtype, allocator=drv.mem_alloc):
+"""Returns an array of the given shape and dtype filled with 1's."""
+
+result = GPUArray(shape, dtype, allocator)
+result.fill(1)
+return result
+
 def empty_like(other_ary):
 result = GPUArray(
 other_ary.shape, other_ary.dtype, other_ary.allocator)


Re: [PyCUDA] mac64 PyCUDA

2010-09-20 Thread Bryan Catanzaro
I think it should be changed to check to see if the Python interpreter is 
currently running in 32 bit mode, and then compile to match:

if 'darwin' in sys.platform and sys.maxint == 2147483647:
    # The Python interpreter is running in 32 bit mode on OS X
    if "-arch" not in conf["CXXFLAGS"]:
        conf["CXXFLAGS"].extend(['-arch', 'i386', '-m32'])
    if "-arch" not in conf["LDFLAGS"]:
        conf["LDFLAGS"].extend(['-arch', 'i386', '-m32'])

Some people (myself included) have to run Python in 32-bit mode on 64-bit OS X 
for various compatibility reasons (currently including libcudart.dylib, which 
is only shipped as a 32-bit library).  Since the Python which Apple ships is 
compiled as a fat binary with both 32 and 64 bit versions, we can't know a 
priori what the right compiler flags are.
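The same check can be written without sys.maxint (which varies between Python builds and is gone in Python 3). A sketch keyed off the pointer size of the *running* interpreter, with the flag values assumed to be the OS X/gcc ones used above:

```python
import struct
import sys

def interpreter_bits():
    """Word size of the running interpreter: 8 bits per byte of a C pointer."""
    return 8 * struct.calcsize("P")

def darwin_arch_flags():
    """Pick compiler/linker arch flags matching the current interpreter on OS X.

    Sketch of the check described above; on a fat (universal) Python this
    reflects the mode the interpreter was actually launched in.
    """
    if sys.platform != "darwin":
        return []
    if interpreter_bits() == 32:
        return ["-arch", "i386", "-m32"]
    return ["-arch", "x86_64", "-m64"]
```

The same list would then be extended onto both `conf["CXXFLAGS"]` and `conf["LDFLAGS"]`, guarded by the existing `"-arch" not in ...` checks.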

- bryan



On Sep 19, 2010, at 5:55 PM, Andreas Kloeckner wrote:

> On Thu, 16 Sep 2010 14:37:31 -0400, gerald wrong  
> wrote:
>> Looking at PyCUDA setup.py, I found this:
>> 
>>     if 'darwin' in sys.platform:
>>         # prevent from building ppc since cuda on OS X is not compiled for ppc
>>         # also, default to 32-bit build, since there doesn't appear to be a
>>         # 64-bit CUDA on Mac yet.
>>         if "-arch" not in conf["CXXFLAGS"]:
>>             conf["CXXFLAGS"].extend(['-arch', 'i386', '-m32'])
>>         if "-arch" not in conf["LDFLAGS"]:
>>             conf["LDFLAGS"].extend(['-arch', 'i386', '-m32'])
>> 
>> Since 64bit CUDA support on Mac OS X is a reality since at least CUDA3.1,
>> this should be changed.  I know there are other problems regarding 32/64
>> python stopping 64bit mac users at the moment (including myself), but since
>> changes are being made to the PyCUDA code to make it compatible with CUDA3.2
>> anyway...
> 
> It looks like this should just be killed wholesale--right? Or is there
> anything more appropriate that it should be changed to?
> 
> Andreas
> 

- bryan







Re: [PyCUDA] mac64 PyCUDA

2010-09-20 Thread Timothy O'Keefe
Of course, the most recent Mac Pros now ship with ATI.

http://www.apple.com/macpro/features/graphics.html

On Sun, Sep 19, 2010 at 9:24 PM, gerald wrong  wrote:
> I think PyCUDA should attempt to build correctly as 64-bit so that once the
> python64 bugs are shaken out, PyCUDA will be able to build on Mac OS X
> without special flags etc.
>
> My understanding from multiple build attempts is that there is a bug in the
> various Mac 64-bit python implementations causing them to run in 32-bit even
> when forced to 64; this should eventually get fixed (if it has not already).
> I admit that after many days of frustration, I have not tried to build on
> mac64 in about 2 months; perhaps I will give it another go this week and
> report here to the list.
>
> The ability to develop on Macs is very useful, since newer Macs include
> CUDA-capable hardware... and there are lots of scientists with Macs. Judging
> by the old queries on the mailing list, many people have spent time fiddling
> with PyCUDA on mac64... so there must be significant interest in this
> feature.
> On Sun, Sep 19, 2010 at 8:55 PM, Andreas Kloeckner 
> wrote:
>>
>> On Thu, 16 Sep 2010 14:37:31 -0400, gerald wrong
>>  wrote:
>> > Looking at PyCUDA setup.py, I found this:
>> >
>> >     if 'darwin' in sys.platform:
>> >         # prevent from building ppc since cuda on OS X is not compiled for ppc
>> >         # also, default to 32-bit build, since there doesn't appear to be a
>> >         # 64-bit CUDA on Mac yet.
>> >         if "-arch" not in conf["CXXFLAGS"]:
>> >             conf["CXXFLAGS"].extend(['-arch', 'i386', '-m32'])
>> >         if "-arch" not in conf["LDFLAGS"]:
>> >             conf["LDFLAGS"].extend(['-arch', 'i386', '-m32'])
>> >
>> > Since 64bit CUDA support on Mac OS X is a reality since at least CUDA3.1,
>> > this should be changed.  I know there are other problems regarding 32/64
>> > python stopping 64bit mac users at the moment (including myself), but since
>> > changes are being made to the PyCUDA code to make it compatible with CUDA3.2
>> > anyway...
>>
>> It looks like this should just be killed wholesale--right? Or is there
>> anything more appropriate that it should be changed to?
>>
>> Andreas
>>
>
>
