I recently fixed the segfault in trunk.  It was a null dereference in the
memory-freeing code.

You shouldn't have run out of memory there: it only needs about 80MB iirc
and you have 128.  Sometimes all the memory gets tied up in the windowing
system.  If you're using Linux, switching to the text console and back often
flushes it back out and lets you run CUDA programs again.  I'm not sure if
there is an equivalent in Windows.
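
To make the arithmetic concrete, here is a rough sketch in Python.  The total
is the global memory figure from the deviceQuery output quoted below; the
80 MB requirement is only the from-memory estimate given above, and the
60 MiB held by the desktop is a purely hypothetical number chosen for
illustration, not a measurement.

```python
# Total global memory reported by deviceQuery for the GeForce 8300 GS.
TOTAL_BYTES = 133496832
# Approximate requirement, quoted from memory in this reply (not measured).
NEEDED_MIB = 80
MIB = 1024 * 1024


def headroom_mib(total_bytes, needed_mib, other_mib=0.0):
    """Return the MiB left on the card after the program and other users."""
    return total_bytes / MIB - needed_mib - other_mib


print(f"total on card: {TOTAL_BYTES / MIB:.1f} MiB")
print(f"headroom, idle card: {headroom_mib(TOTAL_BYTES, NEEDED_MIB):.1f} MiB")
# Hypothetical: the windowing system is holding 60 MiB of card memory.
print(f"headroom, busy desktop: {headroom_mib(TOTAL_BYTES, NEEDED_MIB, 60):.1f} MiB")
```

A negative headroom in the last line is the situation where the allocation
would be expected to fail even though the card nominally has enough memory.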


On Thu, Nov 11, 2010 at 10:36 PM, Richard Gomes <rgomes1...@yahoo.co.uk> wrote:

> Hi Igor,
>
> I applied the patch onto SF_RELEASE_2_1_0 and it compiles fine.
> When I run the sample programs, CUDATopology and CUDAKernelTest work
> fine whilst others fail like this:
>
>
> $ runx10 KMeansCUDA
> points: 100000 clusters: 8 dim: 4
> Running using 1 GPUs.
> GPU known as (Place 1) gets role 0 offset 0 len 100000
> 100000 8 4 2.703
> kernel: 1.689
> dma: 0.447
> cpu: 0.463
> reduce: 0.101
> Segmentation fault
>
>
>
>
> $ runx10 CUDABlackScholes
> Using the GPU at place (Place 1)
> This program only supports a single GPU.
> CUDA_ERROR_OUT_OF_MEMORY (At common/x10rt_cuda.cc:452)
> Aborted
>
>
>
> Are these failures considered 'expected', or is my graphics card not very
> intelligent for such tasks? I have a GeForce 8300 GS.
>
> Device 0: "GeForce 8300 GS"
>    CUDA Driver Version:                           3.20
>    CUDA Runtime Version:                          3.20
>    CUDA Capability Major/Minor version number:    1.1
>    Total amount of global memory:                 133496832 bytes
>    Multiprocessors x Cores/MP = Cores:            1 (MP) x 8 (Cores/MP) = 8 (Cores)
>    Total amount of constant memory:               65536 bytes
>    Total amount of shared memory per block:       16384 bytes
>    Total number of registers available per block: 8192
>    Warp size:                                     32
>    Maximum number of threads per block:           512
>    Maximum sizes of each dimension of a block:    512 x 512 x 64
>    Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
>    Maximum memory pitch:                          2147483647 bytes
>    Texture alignment:                             256 bytes
>    Clock rate:                                    0.92 GHz
>    Concurrent copy and execution:                 No
>    Run time limit on kernels:                     Yes
>    Integrated:                                    No
>    Support host page-locked memory mapping:       Yes
>    Compute mode:                                  Default (multiple host
>
>
> Thanks
>
>
> Richard Gomes
> M: +44(77)9955-6813
> http://tinyurl.com/frgomes
> twitter: frgomes
>
> JQuantLib is a library for Quantitative Finance written in Java.
> http://www.jquantlib.org/
> twitter: jquantlib
>
> On 09/11/10 22:22, Igor Peshansky wrote:
> > Richard,
> >
> > "svn diff -r18088:18092 x10.runtime/x10rt/common/x10rt_cuda.cc" in our
> > repo should generate that patch.
> >          Igor
> >
> > Richard Gomes <rgomes1...@yahoo.co.uk> wrote on 11/09/2010 05:11:43 PM:
> >
> >> Hi Dave,
> >>
> >> Could you please send the patch again?
> >>
> >> Thanks
> >>
> >> Richard Gomes
> >> M: +44(77)9955-6813
> >> http://tinyurl.com/frgomes
> >> twitter: frgomes
> >>
> >> JQuantLib is a library for Quantitative Finance written in Java.
> >> http://www.jquantlib.org/
> >> twitter: jquantlib
> >>
> >> On 09/11/10 03:45, Dave Cunningham wrote:
> >>> Thanks for trying out X10/CUDA
> >>>
> >>> Your initial problem with CUDATopology is due to the fact that
> >>> X10RT_ACCELS is ineffective if X10 was built without
> >>> -DX10RT_CUDA=true.  This means the X10 application was unable to 'see'
> >>> the accelerators.  You correctly surmised that building X10 from the
> >>> source release was necessary.
> >>>
> >>> The build errors are due to NVIDIA adding more error codes and making
> >>> backwards-incompatible changes in the CUDA API.  This is now fixed in
> >>> SVN.  I checked the build with the following CUDA versions:
> >>>
> >>> cuda-2.2  cuda-2.3  cuda-3.0  cuda-3.1  cuda-3.2.12
> >>>
> >>> If you decide to use SVN, there are some changes to the way kernels
> >>> should be written in X10 that are currently undocumented (except via
> >>> the code in the samples dir).  Also, we can't guarantee there won't be
> >>> more changes (and breakages) before the next release.  However, you
> >>> will be able to try new features like clocks and constant memory on the
> >>> GPU.
> >>>
> >>> If using SVN does not appeal, you can also patch the source release to
> >>> fix this problem.  Apply the attached patch from the root of the source
> >>> release as follows, and rebuild:
> >>>
> >>> patch -p0 < cuda_3.2.patch
> >>>
> >>> hope this helps
>
>
> _______________________________________________
> X10-users mailing list
> X10-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/x10-users
>