Re: [boinc_dev] [boinc_alpha] BOINC re-using slot directories without ensuring they're empty

2015-06-10 Thread Richard Haselgrove
Rom, if you could build a private drop, I'll report what the log says. 


 On Wednesday, 10 June 2015, 4:28, David Anderson da...@ssl.berkeley.edu 
wrote:
   
 

  I added a log message that may help a bit.
 I'd like to track this down, even though it's minor.
 -- David
 
 On 19-May-2015 12:15 PM, Richard Haselgrove wrote:
  
  OK, the delay happened again, and I captured a procmon log. 
  Copy of the BOINC log attached (period of interest is 19:35:30 to 19:35:41): 
also a simple extract of ProcMon for the same period. It has to be said, 
boinc.exe was doing surprisingly little. 
  I have kept the full ~200 MB native ProcMon log, which can be re-filtered to 
look for anything else of interest, if you can suggest some likely targets. 
 
 
   On Monday, 18 May 2015, 20:57, David Anderson da...@ssl.berkeley.edu 
wrote:
   
 
 
 That looks like what's needed.
 Richard, if you can repro the inter-job delay,
 you could try using Process Monitor to capture as much
 as possible from the client during that period.
 -- David
 
 On 18-May-2015 11:12 AM, Jacob Klein wrote:
  Process Monitor can be used to watch the things a process does (you have to
  set up correct filters, etc.)... but I'm not sure if that includes sleeps.
  But if the process is waiting on a file or something, though, it should be
  able to tell you.
  Worth looking into.
 
  https://technet.microsoft.com/en-us/library/bb896645.aspx
 
  Regards,
  Jacob
 
 
 
  Date: Mon, 18 May 2015 10:41:16 -0700
  From: da...@ssl.berkeley.edu
  To: r.haselgr...@btopenworld.com; onec...@hotmail.com; jacob_w_kl...@msn.com
  CC: boinc_dev@ssl.berkeley.edu
  Subject: Re: [boinc_dev] [boinc_alpha] BOINC re-using slot directories without ensuring they're empty
 
  I looked at this and couldn't figure out the source of the 12-sec delay.
  In general, delays could happen because
  1) the client does something that takes a long time (like copying a 5 GB file)
  2) the client sleeps (i.e. calls boinc_sleep()).
     It does this in a few situations,
     like backing off and retrying a file system operation.
  But there's no indication that either of these is happening here.
 
  Does Windows have a way of logging the system calls that a process makes
  (like strace on Unix)?
  If so that might reveal what the client is doing during those 12 seconds.
 
  -- David
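
A minimal sketch of case 2 above, showing how backing off and retrying a file system operation can add up to a stall of several seconds. This is not the BOINC client's code: `retry_with_backoff`, its parameters, and the doubling delay schedule are hypothetical and chosen only for illustration.

```python
import time


def retry_with_backoff(op, max_attempts=5, first_delay=0.5, sleep=time.sleep):
    """Call op() until it succeeds, sleeping between failed attempts.

    The delay doubles after every failure (an assumed schedule).
    Returns (result, total_seconds_slept). If the last attempt also
    fails, the OSError from that attempt is re-raised.
    """
    delay = first_delay
    slept = 0.0
    for attempt in range(1, max_attempts + 1):
        try:
            return op(), slept
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(delay)      # the process does nothing visible during this time
            slept += delay
            delay *= 2
```

With these assumed defaults, an operation that fails four times in a row would sleep 0.5 + 1 + 2 + 4 = 7.5 seconds before the final attempt, so logging the accumulated sleep time is one way to tell this case apart from a slow operation.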
 
  On 16-May-2015 8:01 AM, Richard Haselgrove wrote:
 
     Here is the message log file for a GPUGrid task finish. The 12-second delay
     appears again between 14:26:35 and 14:26:47 - that's after the slot directory
     has been cleared, and the exiting task has changed state from 'running' to
     'uploading'. Two new tasks have been assigned to the GPU, but their (small)
     startup files have not yet been linked to their respective slot directories.
 
     I also attach directory listings for the slot and GPUGrid project folders at
     various stages of the cleanup: the slot held 34 files totalling 44,186,727
     bytes, which doesn't sound excessive: the largest file deletion (94,783,960
     bytes) occurred several minutes later, when that file finished uploading.
 
     I'll enable similar logging and watch what happens when the next GPUGrid task
     starts up, but from memory, the disruption to BOINC is less severe at startup.
 
 
 
     On Tuesday, 12 May 2015, 23:29, David Anderson da...@ssl.berkeley.edu wrote:
 
 
 
         BTW: the client isn't completely single-threaded;
         it uses a separate thread to do CPU throttling.
         It would be feasible to also use separate threads
         for serving GUI RPC connections,
         which would allow client to remain responsive even while
         e.g. copying thousands of files to a slot dir.
         -- David
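
A rough sketch of the idea above: requests are answered on a separate thread, so they still get a reply while the main thread is busy with slow work such as copying files. It does not reflect how the BOINC client implements GUI RPC; `start_responder`, the queues, and the `ack:` reply format are hypothetical stand-ins.

```python
import queue
import threading


def start_responder(requests, replies, stop):
    """Serve requests on a worker thread until the stop event is set.

    requests and replies are queue.Queue objects and stop is a
    threading.Event. Each request is answered with "ack:<request>".
    """
    def serve():
        while not stop.is_set():
            try:
                req = requests.get(timeout=0.05)
            except queue.Empty:
                continue          # nothing pending; check the stop flag again
            replies.put(f"ack:{req}")

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    return worker
```

The main thread can then run a long copy loop and set `stop` when it shuts down. Any state shared between the two threads would need locking, which this sketch leaves out.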
 
         On 12-May-2015 2:40 AM, Seke Rob wrote:
          Reminds me of the Clean Energy Project, Phase 2 and why we have
          app_config and max_concurrent and a default control of allowing 1
          'In Progress' on a host. This project sets up in slot copying near
          6700 files [symlinking proposed long ago as is done on several other
          WCG projects for the static files]. If more than one CEP2 is started
          the machine feels at times like a snail, responsiveness of the BOINC
          manager is poor, many a time the less powerful systems incurring
          error zero status exits or total fail. On an 8 core observed it could
          take over an hour before actual computing commenced [CPU time
          logged]. Boot cycle requires manually starting of tasks one by one.
          Kevin Reed few years ago raised a ticket for staggered starting,
          where the models can reach several GB and bigger in the coming. At
          any rate, as much as these 6700 files are copied, 

Re: [boinc_dev] [boinc_alpha] BOINC re-using slot directories without ensuring they're empty

2015-06-10 Thread Rom Walton
Here is a private drop:
http://boinc.berkeley.edu/dl/boinc.100615.x64.zip

- Rom

-----Original Message-----
From: boinc_dev [mailto:boinc_dev-boun...@ssl.berkeley.edu] On Behalf Of 
Richard Haselgrove
Sent: Wednesday, June 10, 2015 3:34 AM
To: David Anderson; Jacob Klein; Seke Rob
Cc: BOINC Development
Subject: Re: [boinc_dev] [boinc_alpha] BOINC re-using slot directories without 
ensuring they're empty

Rom, if you could build a private drop, I'll report what the log says. 


 On Wednesday, 10 June 2015, 4:28, David Anderson da...@ssl.berkeley.edu 
wrote:
   
 

  I added a log message that may help a bit.
 I'd like to track this down, even though it's minor.
 -- David
 
 On 19-May-2015 12:15 PM, Richard Haselgrove wrote:
  
  OK, the delay happened again, and I captured a procmon log. 
  Copy of the BOINC log attached (period of interest is 19:35:30 to 19:35:41): 
also a simple extract of ProcMon for the same period. It has to be said, 
boinc.exe was doing surprisingly little. 
  I have kept the full ~200 MB native ProcMon log, which can be re-filtered to 
look for anything else of interest, if you can suggest some likely targets. 
 
 
   On Monday, 18 May 2015, 20:57, David Anderson da...@ssl.berkeley.edu 
wrote:
   
 
 
 That looks like what's needed.
 Richard, if you can repro the inter-job delay, you could try using Process
 Monitor to capture as much as possible from the client during that period.
 -- David
 
 On 18-May-2015 11:12 AM, Jacob Klein wrote:
  Process Monitor can be used to watch the things a process does (you have to
  set up correct filters, etc.)... but I'm not sure if that includes sleeps.
  But if the process is waiting on a file or something, though, it should be
  able to tell you.
  Worth looking into.
 
  https://technet.microsoft.com/en-us/library/bb896645.aspx
 
  Regards,
  Jacob
 
 
 
  Date: Mon, 18 May 2015 10:41:16 -0700
  From: da...@ssl.berkeley.edu
  To: r.haselgr...@btopenworld.com; onec...@hotmail.com; jacob_w_kl...@msn.com
  CC: boinc_dev@ssl.berkeley.edu
  Subject: Re: [boinc_dev] [boinc_alpha] BOINC re-using slot directories without ensuring they're empty
 
  I looked at this and couldn't figure out the source of the 12-sec delay.
  In general, delays could happen because
  1) the client does something that takes a long time (like copying a 5 GB file)
  2) the client sleeps (i.e. calls boinc_sleep()).
     It does this in a few situations,
     like backing off and retrying a file system operation.
  But there's no indication that either of these is happening here.
 
  Does Windows have a way of logging the system calls that a process makes
  (like strace on Unix)?
  If so that might reveal what the client is doing during those 12 seconds.
 
  -- David
 
  On 16-May-2015 8:01 AM, Richard Haselgrove wrote:
 
     Here is the message log file for a GPUGrid task finish. The 12-second delay
     appears again between 14:26:35 and 14:26:47 - that's after the slot directory
     has been cleared, and the exiting task has changed state from 'running' to
     'uploading'. Two new tasks have been assigned to the GPU, but their (small)
     startup files have not yet been linked to their respective slot directories.
 
     I also attach directory listings for the slot and GPUGrid project folders at
     various stages of the cleanup: the slot held 34 files totalling 44,186,727
     bytes, which doesn't sound excessive: the largest file deletion (94,783,960
     bytes) occurred several minutes later, when that file finished uploading.
 
     I'll enable similar logging and watch what happens when the next GPUGrid task
     starts up, but from memory, the disruption to BOINC is less severe at startup.
 
 
 
     On Tuesday, 12 May 2015, 23:29, David Anderson da...@ssl.berkeley.edu wrote:
 
 
 
         BTW: the client isn't completely single-threaded;
         it uses a separate thread to do CPU throttling.
         It would be feasible to also use separate threads
         for serving GUI RPC connections,
         which would allow client to remain responsive even while
         e.g. copying thousands of files to a slot dir.
         -- David
 
         On 12-May-2015 2:40 AM, Seke Rob wrote:
          Reminds me of the Clean Energy Project, Phase 2 and why we have
          app_config and max_concurrent and a default control of allowing 1
          'In Progress' on a host. This project sets up in slot copying near
          6700 files [symlinking proposed long ago as is done on several other
          WCG projects for the static files]. If more than one CEP2 is started
          the machine feels at times like a snail, responsiveness of the BOINC
          manager is poor, many a time the less powerful systems incurring
          error 

Re: [boinc_dev] I: Question about changes to boinc/7.4.7+dfsg-1exp1

2015-06-10 Thread David Anderson

Fixed.
-- David

On 03-Apr-2015 1:40 AM, Gianfranco Costamagna wrote:

Hi Boinc developers,

Michael Tautschnig has discovered a bug in the zip code with a really nice 
code checker tool and reported it as Debian bug 747964:

https://bugs.debian.org/747964

Can you please apply the attached patch from him?

Many thanks,

Gianfranco




On Thursday, 2 April 2015, 10:41, Michael Tautschnig m...@debian.org wrote:
Hi,

Many thanks for getting back so quickly.

On Thu, Apr 02, 2015 at  8:09:57 +, Gianfranco Costamagna wrote:
[...]

Yes, IIRC upstream told me the code was actually not used by boinc, it was a 
bundled zip library, but we should use a subset of it,
and not the line above.

But this is upstream, not me :)

http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2014-May/020956.html

I don't remember the exact mail, I just found the thread above...

Do you still have the build failure?

I might add the patch again if needed!


I am attaching a patch that more or less has the effect of your proposal in that
thread. Yet if I understand the response in

http://lists.ssl.berkeley.edu/pipermail/boinc_dev/2014-May/020958.html

correctly, one should rather change the call so as not to do any results
checking?

Anyway, the attached patch makes things compile in a consistent manner. It's
not more broken than the existing code :-)

Best,

Michael


___
boinc_dev mailing list
boinc_dev@ssl.berkeley.edu
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.

