Fwd: Fwd: cuda error cudastreamcreate,

2011-06-16 Thread Francesco Pietra
I forgot the list.
f.


-- Forwarded message --
From: Francesco Pietra chiendar...@gmail.com
Date: Thu, Jun 16, 2011 at 4:11 PM
Subject: Re: Fwd: cuda error cudastreamcreate,
To: Brian Morris cymraeg...@gmail.com


Oh, no, absolutely not. Where are the scientific OpenCL applications? And
not only for that reason.
f.

On Thu, Jun 16, 2011 at 3:59 AM, Brian Morris cymraeg...@gmail.com wrote:
 Why are you using CUDA rather than OpenCL? Nvidia has said they are cutting
 back on their GPU business and moving into CPUs for tablets, which are now
 appearing on the market. If you have to move to AMD/ATI in the future, OpenCL
 will still work, but CUDA will not.



 On Wed, Jun 15, 2011 at 8:22 AM, Francesco Pietra chiendar...@gmail.com
 wrote:

 Running nvidia-smi -L as root restores the visibility of the graphics
 cards. At every boot that visibility vanishes. So it is a small
 problem, or no problem. francesco


 -- Forwarded message --
 From: Francesco Pietra chiendar...@gmail.com
 Date: Wed, Jun 15, 2011 at 4:37 PM
 Subject: Fwd: Fwd: cuda error cudastreamcreate,
 To: Lennart Sorensen lsore...@csclub.uwaterloo.ca, amd64 Debian
 debian-amd64@lists.debian.org


 The simulation (pressure equilibration) was completed successfully.
 The next run (just a continuation of the previous pressure equilibration)
 failed, again with 'Device Emulation (CPU)'; see the log file below. I tried
 again and got the same error.

 # modinfo nvidia
 filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
 alias:          char-major-195-*
 supported:      external
 license:        NVIDIA
 alias:          pci:v10DEd0E00sv*sd*bc04sc80i00*
 alias:          pci:v10DEd0AA3sv*sd*bc0Bsc40i00*
 alias:          pci:v10DEd*sv*sd*bc03sc02i00*
 alias:          pci:v10DEd*sv*sd*bc03sc00i00*
 depends:        i2c-core
 vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
 parm:           NVreg_EnableVia4x:int
 parm:           NVreg_EnableALiAGP:int
 parm:           NVreg_ReqAGPRate:int
 parm:           NVreg_EnableAGPSBA:int
 parm:           NVreg_EnableAGPFW:int
 parm:           NVreg_Mobile:int
 parm:           NVreg_ResmanDebugLevel:int
 parm:           NVreg_RmLogonRC:int
 parm:           NVreg_ModifyDeviceFiles:int
 parm:           NVreg_DeviceFileUID:int
 parm:           NVreg_DeviceFileGID:int
 parm:           NVreg_DeviceFileMode:int
 parm:           NVreg_RemapLimit:int
 parm:           NVreg_UpdateMemoryTypes:int
 parm:           NVreg_InitializeSystemMemoryAllocations:int
 parm:           NVreg_UseVBios:int
 parm:           NVreg_RMEdgeIntrCheck:int
 parm:           NVreg_UsePageAttributeTable:int
 parm:           NVreg_EnableMSI:int
 parm:           NVreg_MapRegistersEarly:int
 parm:           NVreg_RegisterForACPIEvents:int
 parm:           NVreg_RegistryDwords:charp
 parm:           NVreg_RmMsg:charp
 parm:           NVreg_NvAGP:int

 However:

 $ nvidia-smi -L
 Could not open device /dev/nvidia1 (no such file)
 Failed to initialize NVML: unknown error.


 I am unable to draw technical conclusions from this 'unknown error'. I
 wonder whether other information can be extracted to fix the problems.

 Thanks for advice.

 francesco




 Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
 Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
 Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
 Info: Running on 6 processors, 6 nodes, 1 physical nodes.
 Info: CPU topology information available.
 Info: Charm++/Converse parallel runtime startup completed at 0.00658393 s
 Pe 2 sharing CUDA device 0 first 0 next 3
 Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
 Emulation (CPU)'  Mem: 0MB  Rev: .
 FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
 CUDA-capable device is available


 -- Forwarded message --
 From: Francesco Pietra chiendar...@gmail.com
 Date: Wed, Jun 15, 2011 at 9:04 AM
 Subject: Re: Fwd: cuda error cudastreamcreate,
 To: Fabricio Cannini fabri...@versatushpc.com.br, Lennart Sorensen
 lsore...@csclub.uwaterloo.ca, amd64 Debian
 debian-amd64@lists.debian.org


 The nvidia-smi -L output was from a machine belonging to Jim Phillips, the
 main developer of NAMD. He provided it to show that it should also
 work with my GTX 470 cards.

 That said, my problems seem to have been solved by following Lennart's
 instructions. The driver was rebuilt (dated 15 June), and the NAMD simulation
 could be started normally. However, we have to wait before claiming
 full victory. Please see below.

 In retrospect, the nvidia.ko I had before, dated 5 June, must also have
 been built within Debian. Renaming it no_nvidia.ko prevented
 rebuilding, for the reasons that Lennart clarified.

 For some reason, the previous installation of nvidia.ko must have had
 problems since, for example, nvidia-smi -L did not work (there
 was a single installation of nvidia-smi, nvidia-smi 270.41.19-1),
 while modinfo nvidia output

Re: Fwd: cuda error cudastreamcreate,

2011-06-16 Thread Brian Morris
Huh ?

As far as I can see, apart from technical details OpenCL and CUDA are the
same, except that OpenCL works on both NVIDIA and ATI/AMD hardware. *I don't
mean OpenGL!*

Besides being cross-platform, OpenCL is, well, open. Indeed, Microsoft also
has its own approach, which only complicates things further.

For scientific purposes there is a strong, compelling reason to use OpenCL
(given that development tools are available, which they are): the
repeatability and reviewability of scientific results. If I have an ATI GPU
and you have an NVIDIA one, we cannot share code very well unless we have a
standard. If you are doing scientific research rather than commercial
development, it is self-defeating to support proprietary standards, unless of
course your funding is tied to them, in which case it is bad for science. A
good reason for supporting open standards is precisely that researchers are
then not subject to this sort of manipulation.

Personally, I have just bought a new and (for me) rather expensive machine
with an ATI GPU, which I intend to use with OpenCL for machine learning
research and (open source) code development. There is plenty of support in
code libraries and open source project examples to get me started.




On Thu, Jun 16, 2011 at 7:12 AM, Francesco Pietra chiendar...@gmail.com wrote:

 I forgot the list.
 f.


 -- Forwarded message --
 From: Francesco Pietra chiendar...@gmail.com
 Date: Thu, Jun 16, 2011 at 4:11 PM
 Subject: Re: Fwd: cuda error cudastreamcreate,
 To: Brian Morris cymraeg...@gmail.com


 Oh, no, absolutely not. Where are the scientific OpenCL applications? And
 not only for that reason.
 f.


Re: Fwd: cuda error cudastreamcreate,

2011-06-15 Thread Francesco Pietra
The nvidia-smi -L output was from a machine belonging to Jim Phillips, the
main developer of NAMD. He provided it to show that it should also
work with my GTX 470 cards.

That said, my problems seem to have been solved by following Lennart's
instructions. The driver was rebuilt (dated 15 June), and the NAMD simulation
could be started normally. However, we have to wait before claiming
full victory. Please see below.

In retrospect, the nvidia.ko I had before, dated 5 June, must also have
been built within Debian. Renaming it no_nvidia.ko prevented
rebuilding, for the reasons that Lennart clarified.

For some reason, the previous installation of nvidia.ko must have had
problems since, for example, nvidia-smi -L did not work (there
was a single installation of nvidia-smi, nvidia-smi 270.41.19-1),
while the modinfo nvidia output was correct. Now both are correct:

$ nvidia-smi -L
GPU 0: GeForce GTX 470 (UUID: N/A)
GPU 1: GeForce GTX 470 (UUID: N/A)

# modinfo nvidia
filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
alias:          char-major-195-*
supported:      external
license:        NVIDIA
alias:          pci:v10DEd0E00sv*sd*bc04sc80i00*
alias:          pci:v10DEd0AA3sv*sd*bc0Bsc40i00*
alias:          pci:v10DEd*sv*sd*bc03sc02i00*
alias:          pci:v10DEd*sv*sd*bc03sc00i00*
depends:        i2c-core
vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
parm:           NVreg_EnableVia4x:int
parm:           NVreg_EnableALiAGP:int
parm:           NVreg_ReqAGPRate:int
parm:           NVreg_EnableAGPSBA:int
parm:           NVreg_EnableAGPFW:int
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_RemapLimit:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UseVBios:int
parm:           NVreg_RMEdgeIntrCheck:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_NvAGP:int


I said above that time would tell whether the system is stable. In fact,
this morning the NAMD simulation did not start (I recall the commands from
the shell history, so typing errors can be excluded). I had not
carried out any amd64 upgrade in between. From the simulation log:


Info: Charm++/Converse parallel runtime startup completed at 0.00989103 s
Pe 2 sharing CUDA device 0 first 0 next 3
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)'  Mem: 0MB  Rev: .
FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
CUDA-capable device is available

'Device Emulation (CPU)' indicates (for reasons that are unclear to me)
that things have gone wrong.
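
A minimal Python sketch for spotting this symptom in a NAMD startup log without reading it by eye. It relies only on the format of the "binding to CUDA device" lines quoted in this thread; the function names `cuda_bindings` and `emulated_pes` are made up for illustration and are not part of NAMD.

```python
import re

# Matches NAMD's device-binding line as it appears in the logs above, e.g.
#   Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 470'
BINDING = re.compile(
    r"Pe (\d+) physical rank \d+ binding to CUDA device (\d+) on (\S+): '([^']*)'"
)


def cuda_bindings(log_text):
    """Return (pe, device, device_name) tuples found in a NAMD startup log."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [(int(pe), int(dev), name)
            for pe, dev, _host, name in BINDING.findall(flat)]


def emulated_pes(log_text):
    """Return the Pe numbers that were bound to the CPU emulation device."""
    return [pe for pe, _dev, name in cuda_bindings(log_text)
            if name.startswith("Device Emulation")]
```

An empty list from `emulated_pes` means every Pe was bound to a device with a real name; any entry means the run will abort with the cudaStreamCreate error shown above.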

On a second identical attempt (after having explored the driver
location and carried out info commands), NAMD simulation started, with
the correct log output:

Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s


We will see whether this alternation of failure and success recurs (a
simulation now lasts several hours, which would be days on an 8-processor
machine). If failure occurs again, there are many possible
reasons, including problems with the NAMD code.

I was so disheartened yesterday that I alluded to changing the driver
source, which was unfair.

Thanks a lot
francesco

On Wed, Jun 15, 2011 at 2:22 AM, Fabricio Cannini
fabri...@versatushpc.com.br wrote:
 On Tuesday, 14 June 2011, at 16:01:57, Lennart Sorensen wrote:
 On Tue, Jun 14, 2011 at 07:23:38PM +0200, Francesco Pietra wrote:
  I forgot to answer: yes, sometimes it works and sometimes not, with
  everything else being the same.
 
  As a matter of fact, after a day of failure, I have now renamed
 
  /lib/modules/2.6.38-2-amd64/updates/dkms/no_nvidia.ko
 
  back to
 
  /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
 
  and the NAMD simulation started normally using both GTX 470 cards. The
  machine had not been touched in the meantime.

 I wonder if having the 9800 card in there along with the 470 gtx cards
 is confusing the driver.  Maybe the card order is getting swapped around
 on some boots.

 What is the 9800 doing in the box anyhow?

 Hi All.

 I'm thinking the same as Lennart. It seems to me that the order in which the
 cards are named varies, thus confusing the application(s). I'd try to fix
 the order in /etc/X11/xorg.conf
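
A different way to guard against a changing card order, sketched here in Python, is to pass NAMD an explicit device list instead of letting it use all devices. The startup log quoted elsewhere in this thread shows NAMD looking for a "+devices i,j,k,..." argument. The helper name `devices_argument` and the "GeForce 9800" label below are hypothetical, and the (index, name) pairs are assumed to be the indices the CUDA runtime reports, which may not match the order shown by other tools.

```python
def devices_argument(gpus, wanted="GTX 470"):
    """Build a NAMD '+devices' argument from (index, name) pairs.

    Only devices whose name contains `wanted` are kept, so an extra card
    in the box is skipped whatever index it receives on a given boot.
    """
    ids = [str(index) for index, name in gpus if wanted in name]
    if not ids:
        raise ValueError("no device name contains %r" % wanted)
    return "+devices " + ",".join(ids)


if __name__ == "__main__":
    # Hypothetical enumeration in which the older card comes first.
    boot = [(0, "GeForce 9800"), (1, "GeForce GTX 470"), (2, "GeForce GTX 470")]
    print(devices_argument(boot))
```

The resulting string would be appended to the NAMD command line; whether this cures the intermittent failure reported here is untested.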

Fwd: Fwd: cuda error cudastreamcreate,

2011-06-15 Thread Francesco Pietra
The simulation (pressure equilibration) was completed successfully.
The next run (just a continuation of the previous pressure equilibration)
failed, again with 'Device Emulation (CPU)'; see the log file below. I tried
again and got the same error.

# modinfo nvidia
filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
alias:          char-major-195-*
supported:      external
license:        NVIDIA
alias:          pci:v10DEd0E00sv*sd*bc04sc80i00*
alias:          pci:v10DEd0AA3sv*sd*bc0Bsc40i00*
alias:          pci:v10DEd*sv*sd*bc03sc02i00*
alias:          pci:v10DEd*sv*sd*bc03sc00i00*
depends:        i2c-core
vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
parm:           NVreg_EnableVia4x:int
parm:           NVreg_EnableALiAGP:int
parm:           NVreg_ReqAGPRate:int
parm:           NVreg_EnableAGPSBA:int
parm:           NVreg_EnableAGPFW:int
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_RemapLimit:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UseVBios:int
parm:           NVreg_RMEdgeIntrCheck:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_NvAGP:int

However:

$ nvidia-smi -L
Could not open device /dev/nvidia1 (no such file)
Failed to initialize NVML: unknown error.


I am unable to draw technical conclusions from this 'unknown error'. I
wonder whether other information can be extracted to fix the problems.
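
One way to get a little more out of this output is to separate the GPUs that were listed from everything else that was printed. The Python sketch below assumes only the two output shapes quoted in this thread (the "GPU n: name (UUID: ...)" lines and the error lines); the name `parse_gpu_list` is made up for illustration.

```python
import re

# One listed GPU per line, as in: GPU 0: GeForce GTX 470 (UUID: N/A)
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (.+)\)$")


def parse_gpu_list(output):
    """Split `nvidia-smi -L` output into listed GPUs and all other lines.

    Returns (gpus, other), where gpus is a list of (index, name) pairs and
    other holds every non-empty line that is not a GPU entry, which in the
    failing case above means the error messages.
    """
    gpus, other = [], []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        match = GPU_LINE.match(line)
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
        else:
            other.append(line)
    return gpus, other
```

With the failing output above, `gpus` is empty and `other` carries the path of the device file that could not be opened, which is the most concrete clue in the message.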

Thanks for advice.

francesco




Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00658393 s
Pe 2 sharing CUDA device 0 first 0 next 3
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)'  Mem: 0MB  Rev: .
FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
CUDA-capable device is available



Fwd: Fwd: cuda error cudastreamcreate,

2011-06-15 Thread Francesco Pietra
Running nvidia-smi -L as root restores the visibility of the graphics
cards. At every boot that visibility vanishes. So it is a small
problem, or no problem. francesco
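
A minimal Python sketch for checking this after a reboot, before starting NAMD. It assumes the driver exposes one /dev/nvidiaN file per card plus the control file /dev/nvidiactl, and that the number of cards is passed in by the user (two GTX 470 cards here; an additional card in the box would change the count). The function names are made up for illustration.

```python
import os


def expected_nodes(num_gpus):
    """Device files assumed to exist when num_gpus cards are usable."""
    return ["/dev/nvidia%d" % i for i in range(num_gpus)] + ["/dev/nvidiactl"]


def missing_nodes(num_gpus, exists=os.path.exists):
    """Return the expected device files that are not present.

    `exists` can be replaced by any predicate, which makes the check easy
    to try out on a machine without NVIDIA hardware.
    """
    return [path for path in expected_nodes(num_gpus) if not exists(path)]


if __name__ == "__main__":
    missing = missing_nodes(2)  # assumption: two GTX 470 cards
    if missing:
        print("missing: " + ", ".join(missing))
        print("run 'nvidia-smi -L' as root, then check again")
    else:
        print("all expected NVIDIA device files are present")
```

If the check reports missing files, running nvidia-smi -L as root, as described above, should make them reappear.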



Re: Fwd: cuda error cudastreamcreate,

2011-06-15 Thread Brian Morris
Why are you using CUDA rather than OpenCL? Nvidia has said they are cutting
back on their GPU business and moving into CPUs for tablets, which are now
appearing on the market. If you have to move to AMD/ATI in the future, OpenCL
will still work, but CUDA will not.



On Wed, Jun 15, 2011 at 8:22 AM, Francesco Pietra chiendar...@gmail.com wrote:

 Running nvidia-smi -L as root restores the visibility of the graphics
 cards. At every boot that visibility vanishes. So it is a small
 problem, or no problem. francesco



Re: cuda error cudastreamcreate,

2011-06-14 Thread Lennart Sorensen
On Tue, Jun 14, 2011 at 07:54:16AM +0200, Francesco Pietra wrote:
 Hello:
 With a gaming machine
 Gigabyte GA 890FXAUD5
 Six-core AMD PhenomII 1075T
 2x GTX 470
 Debian GNU-Linux amd64 wheezy
 
 
 I have been running the NAMD code (molecular dynamics simulations)
 successfully. Now I am having problems getting the GTX 470 cards to work,
 and I can't tell whether it is a hardware or a software problem and, if
 software, whether the OS is involved. I am submitting the same problem to
 NAMD, as it might be NAMD-specific.
 
 When the code works, the top of the log file says:
 
 Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
 Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
 Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
 Info: Running on 6 processors, 6 nodes, 1 physical nodes.
 Info: CPU topology information available.
 Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s
 Pe 5 sharing CUDA device 1 first 1 next 1
 Pe 2 sharing CUDA device 0 first 0 next 4
 Did not find +devices i,j,k,... argument, using all
 Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 0 sharing CUDA device 0 first 0 next 2
 Pe 3 sharing CUDA device 1 first 1 next 5
 Pe 1 sharing CUDA device 1 first 1 next 3
 Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 4 sharing CUDA device 0 first 0 next 0
 Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Info: 1.64104 MB of memory in use based on CmiMemoryUsage
 Info: Configuration file is min-02.conf
 
 When it fails:
 
 Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
 Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
 Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
 Info: Running on 6 processors, 6 nodes, 1 physical nodes.
 Info: CPU topology information available.
 Info: Charm++/Converse parallel runtime startup completed at 0.0124412 s
 Pe 5 sharing CUDA device 0 first 0 next 0
 Pe 5 physical rank 5 binding to CUDA device 0 on gig64: 'Device
 Emulation (CPU)'  Mem: 0MB  Rev: .
 FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device 0): no
 CUDA-capable device is available
 - Processor 5 Exiting: Called CmiAbort 
 Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device
 0): no CUDA-capable device is available
 
 Did not find +devices i,j,k,... argument, using all
 Pe 0 sharing CUDA device 0 first 0 next 1
 Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'Device
 Emulation (CPU)'  Mem: 0MB  Rev: .
 Pe 3 sharing CUDA device 0 first 0 next 4
 Pe 3 physical rank 3 binding to CUDA device 0 on gig64: 'Device
 Emulation (CPU)'  Mem: 0MB  Rev: .
 Pe 1 sharing CUDA device 0 first 0 next 2
 Pe 1 physical rank 1 binding to CUDA device 0 on gig64: 'Device
 Emulation (CPU)'  Mem: 0MB  Rev: .
 FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device 0): no
 CUDA-capable device is available
 - Processor 0 Exiting: Called CmiAbort 
 Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device
 0): no CUDA-capable device is available
 
 FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device 0): no
 CUDA-capable device is available
 - Processor 3 Exiting: Called CmiAbort 
 Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device
 0): no CUDA-capable device is available
 
 FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device 0): no
 CUDA-capable device is available
 - Processor 1 Exiting: Called CmiAbort 
 Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device
 0): no CUDA-capable device is available
 
 Pe 2 sharing CUDA device 0 first 0 next 3
 Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
 Emulation (CPU)'  Mem: 0MB  Rev: .
 FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
 CUDA-capable device is available
 - Processor 2 Exiting: Called CmiAbort 
 Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device
 0): no CUDA-capable device is available
 
 Pe 4 sharing CUDA device 0 first 0 next 5
 Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'Device
 Emulation (CPU)'  Mem: 0MB  Rev: .
 FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device 0): no
 CUDA-capable device is available
 - Processor 4 Exiting: Called CmiAbort 
 Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device
 0): no CUDA-capable

Fwd: cuda error cudastreamcreate,

2011-06-14 Thread Francesco Pietra
I forgot to answer: yes, sometimes it works, sometimes not, with
everything else being the same.

As a matter of fact, after a day of failures, I have now renamed

/lib/modules/2.6.38-2-amd64/updates/dkms/no_nvidia.ko

back to

/lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko

and the NAMD simulation started normally, using both GTX 470 cards. The
machine had not been touched in the meantime.

francesco


-- Forwarded message --
From: Francesco Pietra chiendar...@gmail.com
Date: Tue, Jun 14, 2011 at 6:38 PM
Subject: Re: cuda error cudastreamcreate,
To: Lennart Sorensen lsore...@csclub.uwaterloo.ca


The two GTX 470 cards are in place and are seen by Unix commands.
However, the specific check, like

jim@aberdeen> nvidia-smi -L
GPU 0: Tesla C870 (UUID:
GPU-798dee8502c5e13c-7dd72cfe-6069e259-8fd36a96-5163bf00fbbcb8e9f61eda54)
GPU 1: Tesla C870 (UUID:
GPU-ed96e9c4afb70d35-694f6869-981de52a-23e64327-917becef3aa20bfd0d66432c)
GPU 2: GeForce 9800 GTX/9800 GTX+ (UUID: N/A)

provided by the NAMD people, fails on my machine:

$ which nvidia-smi
/usr/bin/nvidia-smi

$ nvidia-smi -L
could not open device file /dev/nvidiactl (no such device or address)
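
For anyone hitting the same message: a minimal sketch, in Python, of a check for the device nodes that the NVIDIA driver normally exposes (/dev/nvidiactl plus one /dev/nvidiaN per card). The function names are hypothetical and the check is purely name-based, so it does not prove that the nodes are usable character devices.

```python
import os


def nvidia_device_nodes(dev_dir="/dev"):
    """Return the sorted names of NVIDIA-style device nodes in dev_dir.

    Name-based check only: matches 'nvidiactl' and 'nvidia<N>'.
    """
    found = []
    for name in os.listdir(dev_dir):
        is_ctl = name == "nvidiactl"
        is_gpu = name.startswith("nvidia") and name[6:].isdigit()
        if is_ctl or is_gpu:
            found.append(name)
    return sorted(found)


def missing_nodes(expected_gpus, dev_dir="/dev"):
    """Return the node names expected for expected_gpus cards but absent."""
    expected = ["nvidiactl"] + [f"nvidia{i}" for i in range(expected_gpus)]
    present = set(nvidia_device_nodes(dev_dir))
    return sorted(name for name in expected if name not in present)
```

On a machine with two cards, missing_nodes(2) returning a non-empty list right after boot would be consistent with the failure shown above.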


I renamed the nvidia.ko present in
/lib/modules/2.6.38-2-amd64/updates/dkms/ (which had been copied there from
another amd64 machine with a lower-end GeForce card)

# modinfo nvidia
 no  /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko

Rebooting the machine did not recreate nvidia.ko, and modinfo gave the
same answer. Something must be wrong with my installation, PROVIDED
THAT merely rebooting should rebuild the module. The list of installed
packages includes:

gcc-4.4, 4.5, 4.6
libcuda1 270.41.19-1
libgl1-nvidia-glx 270.41.19-1
libnvidia-ml1 270.41.19-1
linux-headers-2.6-amd64  (2.6.38+34)
linux-headers-2.6.38-2-amd64  (2.6.38-5)
linux-headers-2.6.38-2-common (2.6.38-5)
linux-image-2.6-amd64 (2.6.38+34)
linux-image-2.6.38-2-amd64 (2.6.38-5)
linux-kbuild-2.6.38 (2.6.38-1)
nvidia-cuda-dev 3.2.16-2
nvidia-cuda-toolkit 3.2.16-2
nvidia-glx 270.41.19-1
nvidia-installer-cleanup 20110515+1
nvidia-kernel-common 20110515+1
nvidia-kernel-dkms 270.41.19-1
nvidia-smi 20110515+1
nvidia-smi 270.41.19-1
nvidia-support 20110515+1
nvidia-vdpau-driver 270.41.19-1
nvidia-xconfig 270.41.06.1

Really painful. Users on the NAMD list install nvidia.ko according to
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Getting_Started_Linux.pdf
so they can't help much. Still, I refrain from using that method, to
avoid frequent rebuilding.

Thanks
francesco

On Tue, Jun 14, 2011 at 5:57 PM, Lennart Sorensen
lsore...@csclub.uwaterloo.ca wrote:
 On Tue, Jun 14, 2011 at 07:54:16AM +0200, Francesco Pietra wrote:
 Hello:
 With a gaming machine
 Gigabyte GA 890FXAUD5
 Six-core AMD PhenomII 1075T
 2x GTX 470
 Debian GNU-Linux amd64 wheezy


 I have been running NAMD (molecular dynamics simulations) successfully.
 Now I am having problems getting the GTX 470 cards to work, and I can't
 tell whether it is a hardware or a software problem and, if software,
 whether the OS is involved. I am submitting the same problem to the NAMD
 list, as it might be NAMD-specific.

 When the code works, the top of the log file says:

 Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
 Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
 Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
 Info: Running on 6 processors, 6 nodes, 1 physical nodes.
 Info: CPU topology information available.
 Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s
 Pe 5 sharing CUDA device 1 first 1 next 1
 Pe 2 sharing CUDA device 0 first 0 next 4
 Did not find +devices i,j,k,... argument, using all
 Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 0 sharing CUDA device 0 first 0 next 2
 Pe 3 sharing CUDA device 1 first 1 next 5
 Pe 1 sharing CUDA device 1 first 1 next 3
 Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Pe 4 sharing CUDA device 0 first 0 next 0
 Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX
 470'  Mem: 1279MB  Rev: 2.0
 Info: 1.64104 MB of memory in use based on CmiMemoryUsage
 Info: Configuration file is min-02.conf

 When it fails:

 Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
 Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
 Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
 Info: Running on 6 processors, 6 nodes, 1 physical nodes.
 Info: CPU topology information available.
 Info: Charm++/Converse parallel runtime startup completed at 0.0124412 s
 Pe 5 sharing CUDA device 0 first 0 next 0
 Pe 5

Re: Fwd: cuda error cudastreamcreate,

2011-06-14 Thread Lennart Sorensen
On Tue, Jun 14, 2011 at 07:23:38PM +0200, Francesco Pietra wrote:
 I forgot to answer: yes, sometimes it works, sometimes not, with
 everything else being the same.
 
 As a matter of fact, after a day of failures, I have now renamed
 
 /lib/modules/2.6.38-2-amd64/updates/dkms/no_nvidia.ko
 
 back to
 
 /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
 
 and the NAMD simulation started normally, using both GTX 470 cards. The
 machine had not been touched in the meantime.

I wonder if having the 9800 card in there along with the 470 gtx cards
is confusing the driver.  Maybe the card order is getting swapped around
on some boots.

What is the 9800 doing in the box anyhow?

-- 
Len Sorensen


-- 
To UNSUBSCRIBE, email to debian-amd64-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/20110614190157.gf21...@caffeine.csclub.uwaterloo.ca



Re: Fwd: cuda error cudastreamcreate,

2011-06-14 Thread Fabricio Cannini
On Tuesday 14 June 2011, at 16:01:57, Lennart Sorensen wrote:
 On Tue, Jun 14, 2011 at 07:23:38PM +0200, Francesco Pietra wrote:
  I forgot to answer: yes, sometimes it works, sometimes not, with
  everything else being the same.
  
  As a matter of fact, after a day of failures, I have now renamed
  
  /lib/modules/2.6.38-2-amd64/updates/dkms/no_nvidia.ko
  
  back to
  
  /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
  
  and the NAMD simulation started normally, using both GTX 470 cards. The
  machine had not been touched in the meantime.
 
 I wonder if having the 9800 card in there along with the 470 gtx cards
 is confusing the driver.  Maybe the card order is getting swapped around
 on some boots.
 
 What is the 9800 doing in the box anyhow?

Hi All.

I'm thinking the same as Lennart. It seems to me that the order in which the
cards are named varies, thus confusing the application(s). I'd try to fix the
order in /etc/X11/xorg.conf and see if it works. Look in the CUDA docs for how
to do that.
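
Another knob, hinted at by the log line "Did not find +devices i,j,k,... argument, using all", is NAMD's +devices option, which restricts the run to an explicit list of CUDA device indices. A minimal sketch, in Python, of building such a command line; the helper name and the default binary name "namd2" are assumptions, and a net- build may need to be launched through charmrun instead. This only controls which indices NAMD uses; it would not help when the driver exposes no device at all.

```python
def namd_command(config, n_procs, devices, binary="namd2"):
    """Build an argv list that pins NAMD to an explicit list of CUDA devices.

    config   -- NAMD configuration file, e.g. "min-02.conf"
    n_procs  -- number of processes, passed as +pN
    devices  -- iterable of CUDA device indices, passed as +devices i,j,k
    binary   -- name of the NAMD executable (assumed here to be "namd2")
    """
    devices = list(devices)
    if not devices:
        raise ValueError("at least one CUDA device index is required")
    device_arg = ",".join(str(d) for d in devices)
    return [binary, f"+p{n_procs}", "+devices", device_arg, config]
```

For the machine in this thread, namd_command("min-02.conf", 6, [0, 1]) gives the arguments for six processes sharing devices 0 and 1; the list can be handed to subprocess.run.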

Good luck.


--
Archive: http://lists.debian.org/201106142122.04376.fcann...@gmail.com



cuda error cudastreamcreate,

2011-06-13 Thread Francesco Pietra
Hello:
With a gaming machine
Gigabyte GA 890FXAUD5
Six-core AMD PhenomII 1075T
2x GTX 470
Debian GNU-Linux amd64 wheezy


I have been running NAMD (molecular dynamics simulations) successfully.
Now I am having problems getting the GTX 470 cards to work, and I can't
tell whether it is a hardware or a software problem and, if software,
whether the OS is involved. I am submitting the same problem to the NAMD
list, as it might be NAMD-specific.

When the code works, the top of the log file says:

Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s
Pe 5 sharing CUDA device 1 first 1 next 1
Pe 2 sharing CUDA device 0 first 0 next 4
Did not find +devices i,j,k,... argument, using all
Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX
470'  Mem: 1279MB  Rev: 2.0
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX
470'  Mem: 1279MB  Rev: 2.0
Pe 0 sharing CUDA device 0 first 0 next 2
Pe 3 sharing CUDA device 1 first 1 next 5
Pe 1 sharing CUDA device 1 first 1 next 3
Pe 1 physical rank 1 binding to CUDA device 1 on gig64: 'GeForce GTX
470'  Mem: 1279MB  Rev: 2.0
Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'GeForce GTX
470'  Mem: 1279MB  Rev: 2.0
Pe 3 physical rank 3 binding to CUDA device 1 on gig64: 'GeForce GTX
470'  Mem: 1279MB  Rev: 2.0
Pe 4 sharing CUDA device 0 first 0 next 0
Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'GeForce GTX
470'  Mem: 1279MB  Rev: 2.0
Info: 1.64104 MB of memory in use based on CmiMemoryUsage
Info: Configuration file is min-02.conf

When it fails:

Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
Info: Running on 6 processors, 6 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.0124412 s
Pe 5 sharing CUDA device 0 first 0 next 0
Pe 5 physical rank 5 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)'  Mem: 0MB  Rev: .
FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device 0): no
CUDA-capable device is available
- Processor 5 Exiting: Called CmiAbort 
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 5 (gig64 device
0): no CUDA-capable device is available

Did not find +devices i,j,k,... argument, using all
Pe 0 sharing CUDA device 0 first 0 next 1
Pe 0 physical rank 0 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)'  Mem: 0MB  Rev: .
Pe 3 sharing CUDA device 0 first 0 next 4
Pe 3 physical rank 3 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)'  Mem: 0MB  Rev: .
Pe 1 sharing CUDA device 0 first 0 next 2
Pe 1 physical rank 1 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)'  Mem: 0MB  Rev: .
FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device 0): no
CUDA-capable device is available
- Processor 0 Exiting: Called CmiAbort 
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 0 (gig64 device
0): no CUDA-capable device is available

FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device 0): no
CUDA-capable device is available
- Processor 3 Exiting: Called CmiAbort 
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 3 (gig64 device
0): no CUDA-capable device is available

FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device 0): no
CUDA-capable device is available
- Processor 1 Exiting: Called CmiAbort 
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 1 (gig64 device
0): no CUDA-capable device is available

Pe 2 sharing CUDA device 0 first 0 next 3
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)'  Mem: 0MB  Rev: .
FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no
CUDA-capable device is available
- Processor 2 Exiting: Called CmiAbort 
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device
0): no CUDA-capable device is available

Pe 4 sharing CUDA device 0 first 0 next 5
Pe 4 physical rank 4 binding to CUDA device 0 on gig64: 'Device
Emulation (CPU)'  Mem: 0MB  Rev: .
FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device 0): no
CUDA-capable device is available
- Processor 4 Exiting: Called CmiAbort 
Reason: FATAL ERROR: CUDA error cudaStreamCreate on Pe 4 (gig64 device
0): no CUDA-capable device is available

[0] Stack Traceback:
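
To compare a good and a bad run quickly, the "binding to CUDA device" lines of the two logs above can be extracted mechanically. A minimal sketch, in Python, assuming the log format shown in this message (device names may be wrapped across two lines by the mailer); the function names are hypothetical.

```python
import re

# Matches e.g.  Pe 5 physical rank 5 binding to CUDA device 1 on gig64: 'GeForce GTX
#               470'  Mem: 1279MB  Rev: 2.0
# \s+ and [^']* both span line breaks, so wrapped lines are handled.
BINDING = re.compile(
    r"Pe\s+(\d+)\s+physical\s+rank\s+\d+\s+binding\s+to\s+CUDA\s+device\s+(\d+)"
    r"\s+on\s+\S+:\s+'([^']*)'"
)


def parse_bindings(log_text):
    """Map each Pe number to (device index, device name) found in the log."""
    bindings = {}
    for pe, device, name in BINDING.findall(log_text):
        # Collapse the line break inside a wrapped device name.
        bindings[int(pe)] = (int(device), " ".join(name.split()))
    return bindings


def emulated_pes(bindings):
    """Return the Pe numbers bound to 'Device Emulation (CPU)'."""
    return sorted(
        pe for pe, (_, name) in bindings.items() if "Device Emulation" in name
    )
```

Any Pe reported by emulated_pes means that the CUDA runtime found no real GPU for that process, which is the state in which cudaStreamCreate fails in the log above.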



In both cases:

/var/lib/dkms/nvidia/270.41.19/2.6.38-2-amd64/x86_64/module/nvidia.ko

/lib