Hi Zach,

On 04/11/14 21:16, Zach Davis wrote:
> Thanks for your input.  It’s nuances like these in how PyFR is used that
> make it all the more familiar as I uncover them, so thanks for
> highlighting where my experiment may have gone astray.  Peter, I took a
> look at the paper you provided when it was first announced, so I’m
> familiar with what sort of comparable performance to expect; I guess I
> was hoping to reproduce these same trends myself as a means of getting
> more acquainted with PyFR.  Both your input and experience have helped
> in that regard.
> 
> With regard to PyCUDA on OS X 10.10, importing the pycuda.autoinit module
> gives me the following stack trace:
> 
> Python 2.7.8 (default, Oct 17 2014, 18:21:39) 
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pycuda.autoinit
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/site-packages/pycuda/autoinit.py", line
> 4, in <module>
>     cuda.init()
> pycuda._driver.RuntimeError: cuInit failed: no device

This suggests that either PyCUDA or CUDA is not set up correctly.  If
regular CUDA applications written in C/C++ work without issue then you
should try recompiling PyCUDA.
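
As a quick sanity check you can also try enumerating the devices with
the PyCUDA driver API directly (a minimal sketch, nothing PyFR specific):

    import pycuda.driver as cuda

    # This is the call that pycuda.autoinit is failing on
    cuda.init()

    print('CUDA devices found: %d' % cuda.Device.count())
    for i in range(cuda.Device.count()):
        dev = cuda.Device(i)
        print('%d: %s, %d MiB' % (i, dev.name(), dev.total_memory() // 1024**2))

On a working install this should list the GT 650M.  If cuda.init() still
fails here then comparing against a plain C/C++ CUDA sample (deviceQuery,
for instance) will tell you whether it is the driver or the PyCUDA build
that is at fault.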

> This is with a meager NVIDIA GT 650M (1024 MB VRAM) card, which is why I
> resorted to using the simple 2D examples.  I’ve got a Tesla K5000 in a
> workstation right next to me, so perhaps I’ll use it for some more
> testing; though, I was specifically interested in setting up and running
> PyFR under OS X, which I realize is probably a very niche use case.  I
> was particularly interested in the OpenMP backend, because I can imagine
> someone may have a model that wouldn’t fit on the available memory of
> the GPUs we provide currently, so better understanding the performance
> trade-off of running on a GPU cluster as opposed to a more traditional
> cluster of CPUs was worthwhile to me.

If you switch from double to single precision the card should be quite
capable.  The 1024 MiB of memory will go twice as far (effectively
behaving like 2048 MiB) and the peak FLOP/s increase by a factor of ~24
or so.  I expect this is enough to run some interesting 3D cases (even
if they're not scale resolving).
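
For reference, precision is selected through the [backend] section of
the PyFR configuration file; a minimal sketch (the remaining sections of
your existing .ini stay as they are) looks like:

    [backend]
    precision = single

The default is double, so unless you have set this explicitly you will
have been running the GT 650M at its rather modest double precision rate.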

Regards, Freddie.
