Re: [boinc_dev] BOINC 7.0.64 weirdness?
Hello,

I've seen this on 7.1.1 as well. The machine was attached to a mix of CPU-only and GPU projects. It ran normally for a long time, but when I later switched to the BOINC Manager I found it looping: it skipped the CPU-only projects and cycled through the 4 GPU projects, asking each for 0 seconds of CPU and 0 seconds of GPU work. I stopped and restarted BOINC to get it to stop looping. It only happened that one time.

David Ball

Hi!

FWIW, we get reports of this as well on Einstein@Home; this one is for BOINC 7.0.62:
http://einstein.phys.uwm.edu/forum_thread.php?id=10134&nowrap=true#124926

> On the server side it appears to be a request for 0 seconds of CPU and 0 seconds of GPU work

Confirmed. We see this in the scheduler log:

2013-06-03 10:46:02.4045 [PID=30856] Request: [USER#x] [HOST#6119565] [IP xxx.xxx.xxx.38] client 7.0.62
2013-06-03 10:46:02.4187 [PID=30856] [send] effective_ncpus 1 max_jobs_on_host_cpu 99 max_jobs_on_host 99
2013-06-03 10:46:02.4187 [PID=30856] [send] effective_ngpus 1 max_jobs_on_host_gpu 99
2013-06-03 10:46:02.4187 [PID=30856] [send] Not using matchmaker scheduling; Not using EDF sim
2013-06-03 10:46:02.4187 [PID=30856] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-06-03 10:46:02.4187 [PID=30856] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
2013-06-03 10:46:02.4187 [PID=30856] [send] work_req_seconds: 0.00 secs
2013-06-03 10:46:02.4187 [PID=30856] [send] available disk 23.89 GB, work_buf_min 0
2013-06-03 10:46:02.4187 [PID=30856] [send] active_frac 0.99 on_frac 0.44 DCF 1.226839
2013-06-03 10:46:02.4222 [PID=30856] Sending reply to [HOST#6119565]: 0 results, delay req 60.00
2013-06-03 10:46:02.4225 [PID=30856] Scheduler ran 0.024 seconds

The polling interval seems to be once per minute.

Cheers
HBE

-
Heinz-Bernd Eggenstein
Max Planck Institute for Gravitational Physics
Callinstrasse 38
D-30167 Hannover, Germany
Tel.: +49-511-762-19466 (Room 037)

From: Eric J Korpela korp...@ssl.berkeley.edu
To: boinc_dev@ssl.berkeley.edu
Date: 06/03/2013 07:17 PM
Subject: [boinc_dev] BOINC 7.0.64 weirdness?
Sent by: boinc_dev boinc_dev-boun...@ssl.berkeley.edu

Some BOINC v7 clients are getting into a weird state where they contact the server every few minutes to request no work. I haven't been able to reproduce it, but people have reported that it goes away when they select "read config file", even if they don't have a config file. Here's a not very detailed log that someone sent me. On the server side it appears to be a request for 0 seconds of CPU and 0 seconds of GPU work (essentially the same as requesting an update when no work is required).

5/31/2013 10:30:23 AM | SETI@home | Starting task 23jn12ab.24163.21032.12.11.50_1 using setiathome_enhanced version 609 (cuda23) in slot 2
5/31/2013 10:30:25 AM | SETI@home | Started upload of 23oc12ac.25717.18472.12.11.50_0_0
5/31/2013 10:30:28 AM | SETI@home | Finished upload of 23oc12ac.25717.18472.12.11.50_0_0
5/31/2013 11:31:14 AM | SETI@home | Computation for task 23jn12ab.24163.21032.12.11.50_1 finished
5/31/2013 11:31:14 AM | SETI@home | Starting task 23oc12ac.25646.13973.15.12.45_0 using setiathome_v7 version 700 (cuda32) in slot 2
5/31/2013 11:31:18 AM | SETI@home | Started upload of 23jn12ab.24163.21032.12.11.50_1_0
5/31/2013 11:31:25 AM | SETI@home | Finished upload of 23jn12ab.24163.21032.12.11.50_1_0
5/31/2013 11:31:28 AM | SETI@home | Sending scheduler request: To fetch work.
5/31/2013 11:31:28 AM | SETI@home | Reporting 2 completed tasks
5/31/2013 11:31:28 AM | SETI@home | Not requesting tasks
5/31/2013 11:31:30 AM | SETI@home | Scheduler request completed
5/31/2013 11:33:26 AM | SETI@home | Computation for task 26mr10ab.24819.17826.5.11.138_0 finished
5/31/2013 11:33:26 AM | SETI@home | Starting task 23oc12ac.25646.13973.15.12.40_0 using setiathome_v7 version 700 in slot 1
5/31/2013 11:33:29 AM | SETI@home | Started upload of 26mr10ab.24819.17826.5.11.138_0_0
5/31/2013 11:33:32 AM | SETI@home | Finished upload of 26mr10ab.24819.17826.5.11.138_0_0
5/31/2013 11:36:35 AM | SETI@home | Sending scheduler request: To fetch work.
5/31/2013 11:36:35 AM | SETI@home | Reporting 1 completed tasks
5/31/2013 11:36:35 AM | SETI@home | Not requesting tasks
5/31/2013 11:36:37 AM | SETI@home | Scheduler request completed
5/31/2013 1:28:09 PM | SETI@home | Sending scheduler request: To fetch work.
5/31/2013 1:28:09 PM | SETI@home | Not requesting tasks
5/31/2013 1:28:11 PM | SETI@home | Scheduler request completed
5/31/2013 1:33:15 PM | SETI@home | Sending scheduler request: To fetch work.
5/31/2013 1:33:15 PM | SETI@home | Not requesting tasks
5/31/2013 1:33:19 PM | SETI@home | Scheduler request completed
5/31/2013 1:38:23 PM | SETI@home | Sending
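For illustration, the pattern in both logs (a scheduler contact that asks for 0.00 sec on every resource and has nothing to report) is exactly the case a client could cheaply suppress. A minimal C++ sketch, with invented field and function names rather than BOINC's actual work-fetch structures, of such a pre-flight check:

    #include <cstdio>

    // Hypothetical, simplified view of a scheduler request; the real fields
    // live in the BOINC client's work-fetch / scheduler-RPC code.
    struct ResourceRequest {
        double req_seconds;    // seconds of work requested for this resource
        double req_instances;  // idle device instances to fill
    };

    struct SchedulerRequest {
        ResourceRequest cpu;
        ResourceRequest gpu;
        int results_to_report;  // finished tasks waiting to be reported
    };

    // The logs above show requests with "req 0.00 sec, 0.00 instances" and
    // "0 results" going out once a minute; a request like that cannot
    // accomplish anything, so it could simply be skipped.
    bool worth_contacting_server(const SchedulerRequest& r) {
        bool wants_work =
            r.cpu.req_seconds > 0 || r.cpu.req_instances > 0 ||
            r.gpu.req_seconds > 0 || r.gpu.req_instances > 0;
        return wants_work || r.results_to_report > 0;
    }

    int main() {
        SchedulerRequest empty{{0.0, 0.0}, {0.0, 0.0}, 0};
        if (!worth_contacting_server(empty)) {
            std::printf("suppressing empty scheduler request\n");
        }
        return 0;
    }

Whatever the underlying cause of the looping state, a guard of this shape would at least stop the once-a-minute empty RPCs described above.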
[boinc_dev] Problems with 7.1.1 work fetch on projects set to No new tasks
Yesterday, I upgraded to 7.1.1 and had to set the Constellation project to "no new tasks". I would have suspended it, but its work units take about 10 hours and it only had about 2 hours to go on the one work unit it had. Sometime late yesterday or overnight, it finished that work unit. However, even with "no new tasks" set, it fetched another work unit at 11:11:41. I've included the relevant section of stdoutdae.txt below, with enough before and after the work fetch so you can see that it definitely had "no new tasks" set.

FYI, the reason I set Constellation to "no new tasks" was that after the upgrade to 7.1.1 it started executing Constellation work units as NCI again, which it had not done in 7.0.64, although it had this same problem in some earlier 7.0 versions. Constellation is not NCI, so this caused an extra work unit to be executed on my C2D E6420, which slowed down other work units and anything else running on the system. I aborted the new Constellation work unit it fetched at 11:11:41 and have suspended Constellation now that it has no work units.

Here is the section of stdoutdae.txt where it fetched a work unit while set to "no new tasks":

21-May-2013 11:11:34 [---] [work_fetch] --- start work fetch state ---
21-May-2013 11:11:34 [---] [work_fetch] target work buffer: 83808.00 + 11232.00 sec
21-May-2013 11:11:34 [---] [work_fetch] --- project states ---
21-May-2013 11:11:34 [The Lattice Project] [work_fetch] REC 0.000 prio 0.00 can't req work: suspended via Manager
21-May-2013 11:11:34 [superlinkattechnion] [work_fetch] REC 0.000 prio 0.00 can't req work: suspended via Manager
21-May-2013 11:11:34 [MindModeling@Beta] [work_fetch] REC 0.000 prio 0.00 can't req work: suspended via Manager
21-May-2013 11:11:34 [Constellation] [work_fetch] REC 24.318 prio -0.00 can't req work: no new tasks requested via Manager
21-May-2013 11:11:34 [Docking] [work_fetch] REC 70.270 prio -0.053371 can req work
21-May-2013 11:11:34 [malariacontrol.net] [work_fetch] REC 35.233 prio -0.053520 can req work
21-May-2013 11:11:34 [rosetta@home] [work_fetch] REC 28.369 prio -0.055149 can req work
21-May-2013 11:11:34 [correlizer] [work_fetch] REC 27.663 prio -0.055464 can req work
21-May-2013 11:11:34 [eon2] [work_fetch] REC 14.738 prio -0.055967 can req work
21-May-2013 11:11:34 [World Community Grid] [work_fetch] REC 275.469 prio -0.059589 can req work
21-May-2013 11:11:34 [NumberFields@home] [work_fetch] REC 15.186 prio -0.060085 can req work
21-May-2013 11:11:34 [SZTAKI Desktop Grid] [work_fetch] REC 15.104 prio -0.063915 can req work
21-May-2013 11:11:34 [boincsimap] [work_fetch] REC 32.650 prio -0.070539 can req work
21-May-2013 11:11:34 [ibercivis] [work_fetch] REC 14.749 prio -0.074577 can req work
21-May-2013 11:11:34 [fightmalaria@home] [work_fetch] REC 10.813 prio -0.082123 can req work
21-May-2013 11:11:34 [Asteroids@home] [work_fetch] REC 16.048 prio -0.093301 can req work
21-May-2013 11:11:34 [Milkyway@Home] [work_fetch] REC 11.449 prio -0.180597 can req work
21-May-2013 11:11:34 [NFS@Home] [work_fetch] REC 28.673 prio -0.217775 can req work
21-May-2013 11:11:34 [LHC@home 1.0] [work_fetch] REC 35.037 prio -0.266106 can req work
21-May-2013 11:11:34 [Poem@Home] [work_fetch] REC 4056.596 prio -3.851261 can req work
21-May-2013 11:11:34 [SETI@home] [work_fetch] REC 2544.520 prio -5.105209 can req work
21-May-2013 11:11:34 [Einstein@Home] [work_fetch] REC 4200.380 prio -8.210770 can req work
21-May-2013 11:11:34 [PrimeGrid] [work_fetch] REC 1403.029 prio -10.714000 can req work
21-May-2013 11:11:34 [---] [work_fetch] --- state for CPU ---
21-May-2013 11:11:34 [---] [work_fetch] shortfall 9834.29 nidle 0.00 saturated 85205.71 busy 0.00
21-May-2013 11:11:34 [The Lattice Project] [work_fetch] fetch share 0.000
21-May-2013 11:11:34 [superlinkattechnion] [work_fetch] fetch share 0.000
21-May-2013 11:11:34 [MindModeling@Beta] [work_fetch] fetch share 0.000
21-May-2013 11:11:34 [Constellation] [work_fetch] fetch share 0.000
21-May-2013 11:11:34 [Docking] [work_fetch] fetch share 0.124
21-May-2013 11:11:34 [malariacontrol.net] [work_fetch] fetch share 0.062
21-May-2013 11:11:34 [rosetta@home] [work_fetch] fetch share 0.050
21-May-2013 11:11:34 [correlizer] [work_fetch] fetch share 0.050
21-May-2013 11:11:34 [eon2] [work_fetch] fetch share 0.025
21-May-2013 11:11:34 [World Community Grid] [work_fetch] fetch share 0.497
21-May-2013 11:11:34 [NumberFields@home] [work_fetch] fetch share 0.025
21-May-2013 11:11:34 [SZTAKI Desktop Grid] [work_fetch] fetch share 0.025
21-May-2013 11:11:34 [boincsimap] [work_fetch] fetch share 0.050
21-May-2013 11:11:34 [ibercivis] [work_fetch] fetch share 0.025
21-May-2013 11:11:34 [fightmalaria@home] [work_fetch] fetch share 0.012
21-May-2013 11:11:34 [Asteroids@home] [work_fetch] fetch share 0.025
21-May-2013 11:11:34 [Milkyway@Home] [work_fetch] fetch share 0.006
21-May-2013 11:11:34 [NFS@Home] [work_fetch] fetch share 0.012
21-May-2013
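For reference, the per-project reasons printed above ("suspended via Manager", "no new tasks requested via Manager") suggest a simple eligibility filter that runs before any priorities are compared. A rough C++ sketch of that filter, using invented field names rather than the BOINC client's real PROJECT members, just to make the expected behavior explicit (a project set to No New Tasks should never reach the request stage, which is what makes the 11:11:41 fetch surprising):

    #include <cstdio>
    #include <string>

    // Invented, simplified project state; the corresponding flags live in
    // the BOINC client's PROJECT class.
    struct Project {
        std::string name;
        bool suspended_via_gui;       // "suspended via Manager"
        bool dont_request_more_work;  // "no new tasks requested via Manager"
        double scheduler_backoff;     // seconds of RPC backoff remaining, 0 if none
    };

    // Mirrors the reasons the work_fetch log prints: a project that is
    // suspended, set to No New Tasks, or backed off should never be asked
    // for work, regardless of its priority.
    const char* work_fetch_block_reason(const Project& p) {
        if (p.suspended_via_gui)      return "suspended via Manager";
        if (p.dont_request_more_work) return "no new tasks requested via Manager";
        if (p.scheduler_backoff > 0)  return "scheduler RPC backoff";
        return nullptr;  // eligible: "can req work"
    }

    int main() {
        Project constellation{"Constellation", false, true, 0.0};
        const char* reason = work_fetch_block_reason(constellation);
        std::printf("%s: %s\n", constellation.name.c_str(),
                    reason ? reason : "can req work");
        return 0;
    }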
[boinc_dev] 7.1.1 also getting new work for suspended project
I suspended Constellation, since "no new tasks" wasn't preventing BOINC from requesting new work units from it, and just noticed that it's still requesting work units from Constellation even while suspended. Here's the section of stdoutdae.txt where it got another WU:

5/21/2013 2:29:52 PM | | [work_fetch] --- start work fetch state ---
5/21/2013 2:29:52 PM | | [work_fetch] target work buffer: 83808.00 + 11232.00 sec
5/21/2013 2:29:52 PM | | [work_fetch] --- project states ---
5/21/2013 2:29:52 PM | The Lattice Project | [work_fetch] REC 0.000 prio 0.00 can't req work: suspended via Manager
5/21/2013 2:29:52 PM | superlinkattechnion | [work_fetch] REC 0.000 prio 0.00 can't req work: suspended via Manager
5/21/2013 2:29:52 PM | MindModeling@Beta | [work_fetch] REC 0.000 prio 0.00 can't req work: suspended via Manager
5/21/2013 2:29:52 PM | Constellation | [work_fetch] REC 24.280 prio 0.00 can't req work: suspended via Manager
5/21/2013 2:29:52 PM | rosetta@home | [work_fetch] REC 28.099 prio -0.065670 can req work
5/21/2013 2:29:52 PM | correlizer | [work_fetch] REC 27.875 prio -0.066564 can req work
5/21/2013 2:29:52 PM | eon2 | [work_fetch] REC 14.597 prio -0.066898 can req work
5/21/2013 2:29:52 PM | World Community Grid | [work_fetch] REC 276.058 prio -0.067595 can req work
5/21/2013 2:29:52 PM | Docking | [work_fetch] REC 71.034 prio -0.069294 can req work
5/21/2013 2:29:52 PM | NumberFields@home | [work_fetch] REC 15.042 prio -0.071349 can req work
5/21/2013 2:29:52 PM | SZTAKI Desktop Grid | [work_fetch] REC 14.960 prio -0.075118 can req work
5/21/2013 2:29:52 PM | malariacontrol.net | [work_fetch] REC 34.898 prio -0.077509 can req work
5/21/2013 2:29:52 PM | boincsimap | [work_fetch] REC 32.339 prio -0.082647 can req work
5/21/2013 2:29:52 PM | ibercivis | [work_fetch] REC 15.922 prio -0.088790 can req work
5/21/2013 2:29:52 PM | fightmalaria@home | [work_fetch] REC 10.710 prio -0.098163 can req work
5/21/2013 2:29:52 PM | Asteroids@home | [work_fetch] REC 15.895 prio -0.105204 can req work
5/21/2013 2:29:52 PM | Milkyway@Home | [work_fetch] REC 11.340 prio -0.215259 can req work
5/21/2013 2:29:52 PM | NFS@Home | [work_fetch] REC 28.400 prio -0.260310 can req work
5/21/2013 2:29:52 PM | LHC@home 1.0 | [work_fetch] REC 34.703 prio -0.318082 can req work
5/21/2013 2:29:52 PM | Poem@Home | [work_fetch] REC 4017.988 prio -4.603479 can req work
5/21/2013 2:29:52 PM | SETI@home | [work_fetch] REC 2618.643 prio -6.264621 can't req work: scheduler RPC backoff (backoff: 297.68 sec)
5/21/2013 2:29:52 PM | Einstein@Home | [work_fetch] REC 4160.403 prio -9.768530 can req work
5/21/2013 2:29:52 PM | PrimeGrid | [work_fetch] REC 1389.675 prio -12.795318 can req work
5/21/2013 2:29:52 PM | | [work_fetch] --- state for CPU ---
5/21/2013 2:29:52 PM | | [work_fetch] shortfall 3133.46 nidle 0.00 saturated 91906.54 busy 0.00
5/21/2013 2:29:52 PM | The Lattice Project | [work_fetch] fetch share 0.000
5/21/2013 2:29:52 PM | superlinkattechnion | [work_fetch] fetch share 0.000
5/21/2013 2:29:52 PM | MindModeling@Beta | [work_fetch] fetch share 0.000
5/21/2013 2:29:52 PM | Constellation | [work_fetch] fetch share 0.000
5/21/2013 2:29:52 PM | rosetta@home | [work_fetch] fetch share 0.050
5/21/2013 2:29:52 PM | correlizer | [work_fetch] fetch share 0.050
5/21/2013 2:29:52 PM | eon2 | [work_fetch] fetch share 0.025
5/21/2013 2:29:52 PM | World Community Grid | [work_fetch] fetch share 0.497
5/21/2013 2:29:52 PM | Docking | [work_fetch] fetch share 0.124
5/21/2013 2:29:52 PM | NumberFields@home | [work_fetch] fetch share 0.025
5/21/2013 2:29:52 PM | SZTAKI Desktop Grid | [work_fetch] fetch share 0.025
5/21/2013 2:29:52 PM | malariacontrol.net | [work_fetch] fetch share 0.062
5/21/2013 2:29:52 PM | boincsimap | [work_fetch] fetch share 0.050
5/21/2013 2:29:52 PM | ibercivis | [work_fetch] fetch share 0.025
5/21/2013 2:29:52 PM | fightmalaria@home | [work_fetch] fetch share 0.012
5/21/2013 2:29:52 PM | Asteroids@home | [work_fetch] fetch share 0.025
5/21/2013 2:29:52 PM | Milkyway@Home | [work_fetch] fetch share 0.006
5/21/2013 2:29:52 PM | NFS@Home | [work_fetch] fetch share 0.012
5/21/2013 2:29:52 PM | LHC@home 1.0 | [work_fetch] fetch share 0.012
5/21/2013 2:29:52 PM | Poem@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
5/21/2013 2:29:52 PM | SETI@home | [work_fetch] fetch share 0.000 (blocked by prefs)
5/21/2013 2:29:52 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
5/21/2013 2:29:52 PM | PrimeGrid | [work_fetch] fetch share 0.000 (blocked by prefs)
5/21/2013 2:29:52 PM | | [work_fetch] --- state for NVIDIA ---
5/21/2013 2:29:52 PM | | [work_fetch] shortfall 5949.70 nidle 0.00 saturated 89090.30 busy 0.00
5/21/2013 2:29:52 PM | The Lattice Project | [work_fetch] fetch share 0.000 (no apps)
5/21/2013 2:29:52 PM | superlinkattechnion | [work_fetch] fetch share 0.000 (blocked by configuration file)
5/21/2013 2:29:52 PM | MindModeling@Beta | [work_fetch]
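A side note on the "fetch share" numbers in these dumps: they behave like each eligible project's resource share divided by the total share of all projects currently allowed to fetch for that resource, with blocked projects pinned at 0.000. A small C++ sketch of that normalization, with made-up share values (so the printed numbers will not match the log exactly):

    #include <cstdio>
    #include <vector>

    struct Proj {
        const char* name;
        double resource_share;
        bool can_fetch;  // false if suspended, NNT, backed off, or blocked by prefs
    };

    int main() {
        // Made-up resource shares; only the mechanism matters here.
        std::vector<Proj> projects = {
            {"World Community Grid", 400, true},
            {"Docking",              100, true},
            {"malariacontrol.net",    50, true},
            {"Constellation",         40, false},  // suspended: excluded from the total
        };

        double total = 0;
        for (const auto& p : projects)
            if (p.can_fetch) total += p.resource_share;

        for (const auto& p : projects) {
            double share = (p.can_fetch && total > 0) ? p.resource_share / total : 0.0;
            std::printf("%-22s fetch share %.3f\n", p.name, share);
        }
        return 0;
    }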
Re: [boinc_dev] [boinc_alpha] 7.1.1 also getting new work for suspended project
David,

> For both your "No New Tasks" and "Project Suspended" scenarios, where BOINC still fetched work: did you manually click the Update button in both of those scenarios?

No, IIRC, although when I aborted the WU and updated the project to turn it in, I believe it got another WU then.

In fact, I had loaded 7.1.1 on a different quad yesterday (one that doesn't use a GPU), and when I flipped over to look at it (they're on the same KVM, so they share the screen and keyboard but each has its own mouse), it had gotten work for 2 projects that were suspended on it. Einstein and PrimeGrid were the projects, IIRC. I unsuspended them since they didn't have large shares anyway.

On the C2D machine, which has a GPU, I finally had to remove the Constellation project. It was still getting work while it was suspended, and I had set its resource share to 0.001.

BTW, on the 3 test machines (2 Vista32 and 1 Vista64) I downloaded BOINC 7.1.1 on each machine; I didn't move it between machines because I didn't have shares set up for that. Each machine downloaded its own separate copy of 7.1.1, so even if one machine somehow got a bad copy, the other machines shouldn't have had the same problem. The 3rd test machine (the only one running 64-bit Vista) didn't have any projects on it that were suspended or set to "no new tasks", so it didn't show the work fetch problem, although it does seem to have tried to make sure it had at least one job from each project (see next paragraph).

7.1.1 is behaving differently on work fetch. 7.0.64 seemed to just grab all the work it needed from the current highest-priority project. It seems to me like 7.1.1 is trying to get at least one work unit from each project it is attached to. ISTR David Anderson saying something about the new work fetch simulator on the boinc alpha website doing that for some reason, so I guess 7.1.1 has the same logic in it.

David Ball
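To make David's observation concrete, here is a hypothetical C++ sketch (not BOINC's actual work-fetch code, and the field names are invented) of the "at least one task per attached project" behavior he describes: a project whose share-weighted request rounds to zero still gets a nominal request if it has no tasks on the host.

    #include <cstdio>

    // Hypothetical field names, invented for this sketch.
    struct ProjState {
        double fetch_share;  // 0.0 .. 1.0, as in the dumps above
        int tasks_on_host;   // tasks currently queued or running for this project
    };

    // If the share-weighted request comes out to nothing but the project has
    // no tasks at all, ask for a token amount anyway, so every attached
    // project ends up with at least one job -- the behavior described above.
    double seconds_to_request(const ProjState& p, double shortfall_secs) {
        double req = p.fetch_share * shortfall_secs;
        if (req <= 0.0 && p.tasks_on_host == 0) {
            req = 1.0;  // nominal "send me at least one task" request
        }
        return req;
    }

    int main() {
        ProjState empty_project{0.0, 0};  // zero share, no tasks on host
        std::printf("request %.1f s\n", seconds_to_request(empty_project, 9834.29));
        return 0;
    }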
Re: [boinc_dev] GPU Plan Classes
Have a little bit of additional info. Apparently the HD 7600M series and below are slightly updated versions of Turks and Caicos. It appears the HD 7400M series is an updated version of Caicos called Seymour (40nm, VLIW5, 160 stream processors). The HD 7500M and 7600M series are an updated version of Turks called Thames (40nm, VLIW5, 480 stream processors). The 7500M series has a 64-bit memory interface and the 7600M series has a 128-bit memory interface.

References:
http://news.softpedia.com/news/AMD-Clarifies-Radeon-HD-7000M-Notebook-Strategy-246594.shtml
http://semiaccurate.com/2011/12/07/amd-launches-three-low-end-hd7000m-gpus/

Regards,
David Ball

As of BOINC 7.0.65, the BOINC client assumed that any CAL-capable GPU was also OpenCL-capable, and so reported it to the server that way. I have since checked in code that corrects this assumption and should report correctly to the server. Because of the limited information available from CAL and OpenCL, it is not possible to do this perfectly in situations where one computer has a mix of GPUs, some of which are OpenCL-capable and some of which are not, but I think my new code should work in _almost_ every case. I did a bunch of internet research, and I believe Jon Sonntag's approach similarly should work in _almost_ every case. Unfortunately, some AMD target values include GPUs that are OpenCL-capable and ones that are not.

Here is what I wrote earlier to David and Rom:

The information on AMD's web site is hard to find and very incomplete and ambiguous. They seem to be pushing only their version 2.8 SDK, which supports OpenCL 1.2, and give a list of compatible GPUs, but it is unclear whether this means that:
[1] their latest software does not support older GPUs at all, or
[2] it supports CAL but not OpenCL on their older GPUs, or
[3] the newer GPUs are needed for newer OpenCL features, but older GPUs are still supported for the features which were present in older versions of OpenCL.
I suspect it means [2], based on what Jord has written me.

Because of all the confusion, and because the only information CAL gives us about GPU model numbers is the CALtargetEnum value, I compiled a list as best I could mapping CALtargetEnum to GPU model and OpenCL capability. I got my information from the tables in http://en.wikipedia.org/wiki/Comparison_of_ATI_graphics_processing_units. This correlates model numbers and other info with the engineering code names listed in cal_boinc.h and the switch / case statement in COPROC_ATI::get() in gpu_amd.cpp. I also found a newer version of cal.h with a more recent listing of CALtargetEnum values at http://gpuocelot.googlecode.com/svn/trunk/ocelot/ocelot/cal/include/cal.h. Finally, I made the somewhat questionable assumption that if a table section makes no mention of OpenCL, those GPU models have no OpenCL support.
Here is my list, which I am sure is not perfect.

Based on http://en.wikipedia.org/wiki/Comparison_of_ATI_graphics_processing_units
(HD 83xx - 84xx models include DirectX 11, OpenGL 4.2 and OpenCL 1.1 [19])

case CAL_TARGET_600,  /** R600 GPU ISA */  ATI Radeon HD 2900 (RV600)
  Radeon HD 2900 GT                        Nov 6, 2007        OpenCL: NO
  Radeon HD 2900 Pro                       Sep 25, 2007       OpenCL: NO

case CAL_TARGET_610,  /** RV610 GPU ISA */  ATI Radeon HD 2300/2400/3200/4200 (RV610)
  Radeon HD 2350                           Jun 28, 2007       OpenCL: NO
  Radeon HD 2400 Pro                       Jun 28, 2007       OpenCL: NO
  Radeon HD 2400 XT                        Jun 28, 2007       OpenCL: NO
  Mobility Radeon HD 2400                  May 14, 2007       OpenCL: NO
  Mobility Radeon HD 2400 XT               May 14, 2007       OpenCL: NO
  Radeon 3000 Graphics (760G Chipset)      2009               OpenCL: NO
  Radeon 3100 Graphics (780V Chipset)      Jan 23, 2008       OpenCL: NO
  Radeon HD 3200 Graphics (780G Chipset)   Jan 23, 2008       OpenCL: NO

case CAL_TARGET_630,  /** RV630 GPU ISA */  ATI Radeon HD 2600 (RV630)
  Radeon HD 2600 Pro                       Jun 28, 2007       OpenCL: NO
  Radeon HD 2600 XT                        Jun 28, 2007       OpenCL: NO
  Mobility Radeon HD 2600                  May 14, 2007       OpenCL: NO
  Mobility Radeon HD 2600 XT               May 14, 2007       OpenCL: NO
  Mobility Radeon HD 2700                  December 12, 2007  OpenCL: NO
  Radeon HD 3650 (RV635)                   Jan 23, 2008       OpenCL: NO
  All-In-Wonder HD 3650 (RV635)            Jun 28, 2008       OpenCL: NO
  Mobility Radeon HD 3650                  January 7, 2008    OpenCL: NO
  Mobility Radeon HD 3670                  January 7, 2008    OpenCL: NO

case CAL_TARGET_670,  /** RV670 GPU ISA */  ATI Radeon HD 3800 (RV670)
  FireStream 9170                          November 8, 2007   OpenCL 1.0 *
  Radeon HD 3850                           Nov 19, 2007       OpenCL: NO
  Radeon HD 3870                           Nov 19, 2007
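For what it's worth, a mapping like the one above would presumably end up as a lookup alongside the switch/case in COPROC_ATI::get() in gpu_amd.cpp. A C++ sketch of that idea, restricted to the four CAL targets quoted so far; the capability values are provisional and, as noted, some targets mix OpenCL-capable and non-capable boards:

    #include <cstdio>

    // Subset of CALtargetEnum, limited to the targets quoted above; the full
    // list is in cal_boinc.h (and the newer cal.h linked above).
    enum CALtargetEnum {
        CAL_TARGET_600,  // R600:  Radeon HD 2900 family
        CAL_TARGET_610,  // RV610: Radeon HD 2300/2400/3200/4200
        CAL_TARGET_630,  // RV630: Radeon HD 2600 family
        CAL_TARGET_670,  // RV670: Radeon HD 3800 family
    };

    // Provisional mapping in the spirit of the list above; it can only ever
    // be a best guess when a target covers both kinds of boards.
    bool likely_opencl_capable(CALtargetEnum target) {
        switch (target) {
            case CAL_TARGET_600:
            case CAL_TARGET_610:
            case CAL_TARGET_630:
                return false;
            case CAL_TARGET_670:
                return true;   // mixed family; FireStream 9170 lists OpenCL 1.0
            default:
                return true;   // assumption: later targets support OpenCL
        }
    }

    int main() {
        std::printf("RV670 likely OpenCL-capable: %d\n",
                    likely_opencl_capable(CAL_TARGET_670));
        return 0;
    }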
Re: [boinc_dev] The reason for a local DCF.
We need some simple way for a project admin to tell the server that the estimate for a batch of jobs should be adjusted. The server should update the estimate in the unsent jobs in that batch, and somehow pass that info along to clients that already have jobs in that batch the next time they contact the server.

One of the projects using the new version of the server (version 701 IIRC), which sends <dont_use_dcf/> as part of the sched reply, had its estimates suddenly drop to a tenth of what they should have been. The project admin said he was changing the estimated time of WUs and one 0 was missing for that batch of work units. These WUs continued to be sent out for several days before another batch started and the time estimates went back to reality.

David Ball

I can't speak specifically for TrainWreck@home, but I think you'll find that if it's running generic BOINC server code that's less than three years old (and if it's telling the client to turn off DCF, I think it must be), then the project server does *NOT* calculate its own DCF. I invite you to review what happened to DCF in sched_send.cpp (server code), in
http://boinc.berkeley.edu/trac/changeset/1d765245ed6ea666a46b2b5878371c4183accbeb/boinc-v2/sched/sched_send.cpp

From: McLeod, John john.mcl...@sap.com
To: Richard Haselgrove r.haselgr...@btopenworld.com; boinc_dev@ssl.berkeley.edu
Sent: Wednesday, 3 April 2013, 15:43
Subject: Re: [boinc_dev] The reason for a local DCF.

Currently the server calculates its own DCF, and when asked for 43200 seconds of work it would inflate the fpops number to account for the difference. This would mean that the work that is received would have a roughly correct value for time before being inflated again by the client's DCF calculation.

No, this is a startup issue, but it can happen any time:
1) A new project is joined.
2) A new application is pushed down.
3) A new dataset that has a greatly different run time than expected is pushed down.

A possible way out: if a project has "do not use DCF" set, modify the meaning of this somewhat. Instead of ignoring the DCF entirely, add a DCF modifier to each task of a project which is 1/DCF at the time the task is accepted (this counteracts the fact that the DCF is calculated twice, once at the server and once at the client). Each time the DCF is used to calculate the remaining time to run, multiply by this value. When the DCF for the project is recalculated, recalculate as normal, ignoring this modifier. This will eventually have the DCF stabilize near 1, allow the server to calculate what the fpops ought to be, and keep the client responsive to massive miscalculations in the initial state.

From: Richard Haselgrove r.haselgr...@btopenworld.com
Sent: Wednesday, April 03, 2013 10:22 AM
To: McLeod, John; boinc_dev@ssl.berkeley.edu
Subject: Re: [boinc_dev] The reason for a local DCF.

Fully agreed. But remember that you have to follow the logic and also re-instate the DCF code on that project's server.

Say you set work fetch limits of 0.5 days minimum and 0.5 days additional - or a target work buffer: 43200.00 + 43200.00 sec. Once TrainWreck@home (eventually) becomes the highest priority project and your client issues a work request, it will request 43200 seconds of work. The *server*, which currently ignores DCF in its calculations, will still use the 1 hr 17 min estimate - 4620 seconds. The server will assign 10 jobs to fill the request.
Once those 10 jobs arrive at the client, they will be re-estimated by the client using DCF, which by then will be about 19.27. And your client will announce that it has received 10 days 7 hours of new work. And no doubt panic.

[I am assuming that TrainWreck@home's previous batch of work for this application was correctly estimated, and that John has volunteered for TrainWreck@home for long enough to have an established and stable APR for HitTheBuffers v1.01]

From: McLeod, John john.mcl...@sap.com
To: boinc_dev@ssl.berkeley.edu
Sent: Wednesday, 3 April 2013, 13:40
Subject: [boinc_dev] The reason for a local DCF.

I am currently watching a train wreck that would not be happening if DCF was turned on for a particular project. The initial estimate is a wall time of one hour seventeen minutes. The actual wall time is twenty-four hours 44 minutes. The problem is that work fetch and the scheduler do not know that the problem exists for tasks #2 through #20, and are downloading work from other projects, not realizing that the saturated time is 20 days and not 20 hours.
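Putting numbers on Richard's example and John's proposal side by side (field names are invented for illustration; this is not the client's actual estimation code): 10 jobs at the server's 4620-second estimate, re-inflated by a client DCF of 19.27, comes to roughly 10 days 7 hours, while a per-task modifier of 1/DCF taken at acceptance cancels that inflation.

    #include <cstdio>

    // Invented field names; a sketch of John's proposed per-task modifier.
    struct Task {
        double server_est_secs;  // server-side runtime estimate (1 hr 17 min = 4620 s)
        double dcf_modifier;     // proposal: 1/DCF at the time the task was accepted
    };

    double estimated_remaining(const Task& t, double project_dcf, bool use_modifier) {
        double est = t.server_est_secs * project_dcf;  // what the client does today
        if (use_modifier) est *= t.dcf_modifier;       // proposed correction
        return est;
    }

    int main() {
        const double server_estimate = 4620.0;  // 1 hr 17 min
        const double dcf = 19.27;               // client DCF after the first long-running task

        // Today: 10 jobs sent to fill a 43200 s request, re-estimated on arrival.
        double total = 10 * server_estimate * dcf;
        std::printf("client-side estimate for 10 tasks: %.0f s (~%.1f days)\n",
                    total, total / 86400.0);

        // With the per-task modifier taken at acceptance, the inflation cancels.
        Task t{server_estimate, 1.0 / dcf};
        std::printf("per-task estimate with modifier: %.0f s\n",
                    estimated_remaining(t, dcf, true));
        return 0;
    }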
Re: [boinc_dev] Recommend not detect 8400 GS video
OK, forget about it then. I only use that system for crunching, so I just told it not to recognize the card. It's a 256MB card; I thought it was on the motherboard at first and was concerned about losing the whole system. In researching temps for it, I did find people talking about screen blanking or thermal cutoff at 110 C. Now that I know it's an actual card, I'll just replace it with something more modern that has good temperature control and uses very little power. Going from 80nm to 28nm is a big step. Apparently these 8400 GS chips are quite variable, and this machine has an overheating chip in it or a bad cooling solution. I also saw mention of some versions of the driver causing the chip to go into thermal runaway.

I might even replace the system. It's the original Core 2 Quad and isn't upgradable to even a later C2Q. A current Ivy Bridge 22nm Intel i3 could probably match it for about a third the power of a Q6600. Some or all of the Ivy Bridge parts can run OpenCL, and Haswell is due out later this year, which is supposed to have up to a 5x improvement on the graphics, depending on model, and includes the AVX2 extensions with larger FP units on the CPU cores IIRC (could be Broadwell that has those). APUs are eating much of the discrete graphics card market. Now that ST_E (sp??) has got FD-SOI working and GlobalFoundries has licensed it, expect some improvements from AMD APUs too. Now, if they can just solve the power supply problem for EUV lithography, they'll be set for 10nm and below. It's scary how they power EUV: think very high-powered lasers vaporizing a stream of metal droplets and missing the droplet a lot of the time.

Anyway, since it seems that the chip failing will not take out the motherboard, and most people's cards are better than mine or have already been replaced, don't worry about it. Sorry to have bothered you about it.

David Ball

I would be opposed to BOINC telling me I can't use my 8400 GS because it might overheat. If it is decided that it is a good idea and they should be banned, then BOINC should not allow crunching on laptops, cell phones, etc. because they also tend to overheat. Or, we let users decide whether to crunch or not and allow them to tune the GPU apps such that they can run at whatever temperatures they are comfortable with.

Jon Sonntag
[boinc_dev] Recommend not detect 8400 GS video
While testing an HP m9047c (completely stock hardware - never overclocked) for boinc alpha, I upgraded some drivers, and somehow 7.0.56 of the boinc client started detecting that it could run OpenCL jobs on the machine, which has an NVIDIA GeForce 8400 GS (256MB, driver 314.07) GPU that had previously gone undetected until a series of driver updates. I was surprised that, with so little memory, Seti assigned it 2 AstroPulse v6 v6.04 (opencl_nvidia_100) tasks.

Seti machine: 4719778
http://setiathome.berkeley.edu/show_host_detail.php?hostid=4719778

Fortunately, through just blind good luck, I was on the machine when the huge Seti download finally finished and watched to see how it did. It was working OK in the boinc manager, but I decided to see what was happening with GPU-Z. It was reaching over 90% GPU utilization and about 48% memory bandwidth utilization. However, after watching the temperature for the GPU chip climb through 107 degrees C, I suspended GPU processing and set the no_gpus flag in cc_config.xml. I aborted the running job, and the second job aborted with a status of 201 (0xc9) EXIT_MISSING_COPROC. I restarted the boinc client.

Old nVidia chips are known in the trade press as having problems at high temps because of a mismatch in thermal expansion properties inside the package, resulting in breakage. I know this was mentioned for the 65nm and 55nm chips in a 2008 article, but I don't know about these chips, which are 80 nm IIRC. You can read some reprints of the "bumpgate" articles starting at the address below.

http://semiaccurate.com/2010/07/11/why-nvidias-chips-are-defective/

If it was me, I'd refuse to let the boinc client recognize these chips as usable GPUs.

David Ball
Re: [boinc_dev] Preferences Override
Hello,

I think max_cpus is deprecated and has been replaced by max_ncpus_pct, which is the percentage of the total cores to use. If you have 4 cores and you only want to use 1 core, try modifying global_prefs_override.xml to include

<max_cpus>1</max_cpus>
<max_ncpus_pct>25.0</max_ncpus_pct>

and leave the rest of the file intact. Then reload it.

BTW, it seems to round down. On my dual core machine I can set it to 99.0 percent and it will only use 1 CPU core. On your 4-core machine, anything from 25.0 to 49.0 should use just 1 core.

Hope this helps,
David Ball

On 3/6/2013 3:10 PM, Jöbstl, Emanuel wrote:

Hello David and John,

thank you for your fast replies. I investigated the configuration files located in C:\ProgramData\Boinc, and also verified that read_global_prefs_override is set. In my config files (global_prefs and global_prefs_override), max_cpus is set to 1, but still there are four tasks running, using all the processor cores. What could cause this issue? I attached the configuration files and the output of boinccmd.exe --get_tasks.

with best regards,
Emi

From: boinc_dev [boinc_dev-boun...@ssl.berkeley.edu] on behalf of David Anderson [da...@ssl.berkeley.edu]
Sent: Tuesday, 5 March 2013 22:01
To: boinc_dev@ssl.berkeley.edu
Subject: Re: [boinc_dev] Preferences Override

Here's how things are supposed to work:

global_prefs.xml contains preferences from the project server.
global_prefs_override.xml contains preferences set locally. It's written by the set_global_prefs_override() GUI RPC. As the name implies, values specified here override those in global_prefs.xml.

Note: after calling set_global_prefs_override(), you must call read_global_prefs_override() to have the new preferences take effect.

-- David

On 05-Mar-2013 12:02 PM, Jöbstl, Emanuel wrote:

Hello Boinc Devs,

Again, I need some help:

1) I noticed that my local Boinc preferences are being overwritten with the preferences from the project server. Is there any way to avoid this, or do I have to change the preferences on the server too? I am setting the preferences by doing a set_global_prefs_override GUI RPC call.

2) Is there any way to detect that a task has been finished on the client side (using GUI RPC)?

with best regards and thanks,
Emanuel
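A footnote on David Ball's rounding observation at the top of this thread (99.0% of 2 cores gives 1 core; 25.0 to 49.0% of 4 cores gives 1 core): it is consistent with a simple floor of total_cores * max_ncpus_pct / 100. A C++ sketch of that apparent behavior, not a quote of the client's actual code (the keep-at-least-one-core clamp is an assumption):

    #include <cmath>
    #include <cstdio>

    // Apparent behavior described above: usable cores = floor of the
    // percentage of total cores.  The "never go below 1" clamp is an
    // assumption, not something stated in the thread.
    int usable_cores(int total_cores, double max_ncpus_pct) {
        int n = static_cast<int>(std::floor(total_cores * max_ncpus_pct / 100.0));
        return n < 1 ? 1 : n;
    }

    int main() {
        std::printf("%d\n", usable_cores(2, 99.0));  // 1  (dual core at 99%)
        std::printf("%d\n", usable_cores(4, 25.0));  // 1
        std::printf("%d\n", usable_cores(4, 49.0));  // 1
        std::printf("%d\n", usable_cores(4, 50.0));  // 2
        return 0;
    }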