[boinc_dev] [patch] mystery solved? Aw: Re: BOINC having too many open files - failure in opendir()

2013-05-18 Thread Steffen Möller
Dear all,

I skimmed through all invocations of (boinc_)?fopen() in api/ and lib/, looking 
for the matching fclose() in each case.
Where I found one missing, I added it and placed the result here
http://anonscm.debian.org/gitweb/?p=pkg-boinc/boinc.git;a=blob;f=debian/patches/fopen_closing.patch;hb=HEAD
as a patch. The trickiest and possibly the most important one is the missing 
close in the destructor of the MFILE class.
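
For readers who have not opened the patch, the general idea of closing a stream in a destructor can be sketched as follows. This is only an illustration under stated assumptions: ScopedFile and its members are hypothetical names, and the code is neither BOINC's MFILE class nor the content of the patch.

```cpp
#include <cstdio>
#include <string>

// Hypothetical wrapper (NOT BOINC's MFILE): owns a FILE* and closes it
// in the destructor, so leaving the scope early cannot leak the descriptor.
class ScopedFile {
public:
    ScopedFile() = default;
    ~ScopedFile() { close(); }

    // Non-copyable: two owners would end up calling fclose() twice.
    ScopedFile(const ScopedFile&) = delete;
    ScopedFile& operator=(const ScopedFile&) = delete;

    // Returns true on success; closes any previously opened file first.
    bool open(const std::string& path, const char* mode) {
        close();
        f_ = std::fopen(path.c_str(), mode);
        return f_ != nullptr;
    }

    // Safe to call repeatedly; does nothing if no file is open.
    void close() {
        if (f_) {
            std::fclose(f_);
            f_ = nullptr;
        }
    }

    bool is_open() const { return f_ != nullptr; }
    std::FILE* get() const { return f_; }

private:
    std::FILE* f_ = nullptr;
};
```

With a wrapper like this, a function that returns early on an error path still releases its descriptor when the object goes out of scope.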

Cheers,

Steffen


 Sent: Friday, 17 May 2013, 22:00
 From: Nicolás Alvarez nicolas.alva...@gmail.com
 To: boinc_dev@ssl.berkeley.edu boinc_dev@ssl.berkeley.edu
 Subject: Re: [boinc_dev] BOINC having too many open files - failure in opendir()

 Get the list of open files (ls -l /proc/$(pidof boinc)/fd) when that
 happens. Does the client die after that last fopen() failure? Maybe
 you could write a script to log the open file list every few minutes.
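
 A minimal sketch of such a logger, assuming Linux with /proc mounted and a C++17 compiler. The function name list_open_fds is made up for this illustration; it does the same job as the ls command above by reading the symlinks under /proc/<pid>/fd.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns (descriptor number, link target) pairs for
// the entries under /proc/<pid>/fd. Returns an empty vector if the
// directory cannot be read (no such process, no permission, no /proc).
std::vector<std::pair<std::string, std::string>>
list_open_fds(const std::string& pid) {
    std::vector<std::pair<std::string, std::string>> result;
    std::error_code ec;
    fs::directory_iterator it("/proc/" + pid + "/fd", ec);
    const fs::directory_iterator end;
    while (!ec && it != end) {
        // Each entry is a symlink to the file, socket or pipe behind the fd.
        std::error_code link_ec;
        fs::path target = fs::read_symlink(it->path(), link_ec);
        result.emplace_back(it->path().filename().string(),
                            link_ec ? std::string("?") : target.string());
        it.increment(ec);
    }
    return result;
}
```

 Calling this every few minutes with the client's PID (or "self" for the calling process) and appending the result and its size to a log file would show whether the descriptor count grows over time and which targets accumulate.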

 --
 Nicolás

 2013/5/16 Steffen Möller steffen_moel...@gmx.de:
  Dear all,
 
  every few months I get an error like the one below (taken from 
  stdoutdae.txt) that reports too many open files. I have seen this for about three 
  years on several Linux machines. I only recall it on machines with many cores (12 or 
  24), though, Opterons and Xeons alike. Does anything jump out at you as to where to 
  look?
 
  Cheers,
 
  Steffen
 
  16-May-2013 16:58:33 [World Community Grid] Sending scheduler request: To fetch work.
  16-May-2013 16:58:33 [World Community Grid] Requesting new tasks for CPU
  16-May-2013 16:58:36 [World Community Grid] Scheduler request completed: got 0 new tasks
  16-May-2013 16:58:36 [World Community Grid] No tasks sent
  16-May-2013 16:58:36 [World Community Grid] No tasks are available for The Clean Energy Project - Phase 2
  16-May-2013 16:58:36 [World Community Grid] No tasks are available for the applications you have selected.
  16-May-2013 16:58:42 [Einstein@Home] Sending scheduler request: To fetch work.
  16-May-2013 16:58:42 [Einstein@Home] Reporting 4 completed tasks
  16-May-2013 16:58:42 [Einstein@Home] Requesting new tasks for CPU
  16-May-2013 16:58:46 [Einstein@Home] Scheduler request completed: got 1 new tasks
  16-May-2013 17:15:53 [Einstein@Home] Sending scheduler request: To fetch work.
  16-May-2013 17:15:53 [Einstein@Home] Requesting new tasks for CPU
  16-May-2013 17:15:56 [Einstein@Home] Scheduler request completed: got 1 new tasks
  16-May-2013 17:30:11 [World Community Grid] Can't get task disk usage: opendir() failed
  16-May-2013 17:30:11 [Einstein@Home] Can't get task disk usage: opendir() failed
  16-May-2013 17:30:11 [Einstein@Home] Can't get task disk usage: opendir() failed
  16-May-2013 17:30:11 [Einstein@Home] Can't get task disk usage: opendir() failed
  16-May-2013 17:30:11 [Einstein@Home] Can't get task disk usage: opendir() failed
  16-May-2013 17:30:11 [Einstein@Home] Can't get task disk usage: opendir() failed
  16-May-2013 17:30:11 [Einstein@Home] Can't get task disk usage: opendir() failed
  16-May-2013 17:30:11 [Einstein@Home] Can't get task disk usage: opendir() failed
  16-May-2013 17:30:11 [Einstein@Home] Can't get task disk usage: opendir() failed
  16-May-2013 17:32:31 [Einstein@Home] read_stderr_file(): malloc() failed
  16-May-2013 17:32:31 [Einstein@Home] Computation for task LATeah0024U_80.0_500_-4.66e-10_1 finished
  16-May-2013 17:32:31 [Einstein@Home] md5_file failed for projects/einstein.phys.uwm.edu/einstein_S6BucketLVE_1.04_i686-pc-linux-gnu__SSE2: fopen() failed
  16-May-2013 17:32:31 [---] Can't open client_state_next.xml: fopen() failed
  16-May-2013 17:32:31 [---] Couldn't write state file: fopen() failed; giving up
 ___
 boinc_dev mailing list
 boinc_dev@ssl.berkeley.edu
 http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
 To unsubscribe, visit the above URL and
 (near bottom of page) enter your email address.

Re: [boinc_dev] [patch] mystery solved? Aw: Re: BOINC having too many open files - failure in opendir()

2013-05-18 Thread David Anderson

As far as I can tell, none of these changes would fix
a file descriptor leak in the client.
We need a system-call trace that shows open()s.

It's also possible that the system is running out of file descriptors
because of software other than BOINC.
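
One way to tell these two cases apart is the errno value left by the failing call: POSIX defines EMFILE for the per-process descriptor limit and ENFILE for the system-wide file table. The sketch below assumes a POSIX system; classify_open_failure and get_soft_fd_limit are hypothetical helper names, not BOINC functions, and the log quoted in this thread does not record errno, so which case applies here remains open.

```cpp
#include <cerrno>
#include <string>
#include <sys/resource.h>

// Hypothetical helper: classifies the errno left by a failed
// open()/fopen()/opendir() call.
//   EMFILE: the calling process hit its own descriptor limit, which points
//           at a leak in that process or at a low ulimit.
//   ENFILE: the system-wide file table is full, so other software may be
//           responsible.
std::string classify_open_failure(int err) {
    switch (err) {
        case EMFILE: return "per-process limit reached";
        case ENFILE: return "system-wide limit reached";
        default:     return "other error";
    }
}

// Hypothetical helper: stores the soft limit on open descriptors for the
// calling process in 'out'. Returns false if getrlimit() fails.
bool get_soft_fd_limit(rlim_t& out) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return false;
    out = rl.rlim_cur;
    return true;
}
```

Logging the classification next to the existing "opendir() failed" message, together with the soft limit, would show whether the client itself is exhausting its descriptors.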

-- David

On 18-May-2013 4:00 AM, Steffen Möller wrote:

Dear all,

I skimmed through all invocations of (boinc_)?fopen() in api/ and lib/,
seeking the respective matching fclose(). What I found missing I placed here
http://anonscm.debian.org/gitweb/?p=pkg-boinc/boinc.git;a=blob;f=debian/patches/fopen_closing.patch;hb=HEAD
as a patch. The trickiest and possibly the most important one is the omission of
a close in the destructor of the MFILE class.


Cheers,

Steffen



