So to run effectively I would need more memory, because the job wants two shares? ... Yes, with a larger node it works. What would be a reasonable memory size for a DUCC node?
2014-11-14 9:38 GMT-06:00 Lou DeGenaro <lou.degen...@gmail.com>:

> Simon,
>
> Congratulations! You found a bug in DUCC's Web Server. It was incorrectly
> rounding up when reporting the number of shares for a machine. This issue
> is addressed by Jira 4104 <https://issues.apache.org/jira/browse/UIMA-4104>.
>
> Lou.
>
> On Fri, Nov 14, 2014 at 7:49 AM, Jim Challenger <chall...@gmail.com> wrote:
>
>> Simon,
>> It looks like the problem is the amount of RAM on your machine. It's
>> going to be hard to get any meaningful work running on less than
>> about 8G.
>>
>> Here's what to do to get the test job to run on your 4G machine:
>> 1. In the resources folder, edit ducc.properties and change this:
>>        ducc.jd.host.memory.size=2GB
>>    to this:
>>        ducc.jd.host.memory.size=1GB
>>
>>    This is the amount of RAM that DUCC reserves for itself to manage
>>    its "head" processes.
>>
>> 2. In the examples/simple folder, edit 1.job and change this:
>>        process_memory_size 2
>>    to this:
>>        process_memory_size 1
>>
>>    This is the amount of memory in GB that the sample 1.job is
>>    requesting.
>>
>> 3. Stop ducc and restart it so the ducc processes pick up the new
>>    ducc.jd.host.memory.size value from ducc.properties.
>>
>> 4. Rerun 1.job and all should be well.
>>
>> Here are the gory details from the RM log, if you're interested. In
>> the RM log, I see these lines.
>>
>>   13 Nov 2014 22:04:14,909 INFO RM.NodePool - queryMachines N/A
>>   Name                        Order Active Shares Unused Shares Memory (MB) Jobs
>>   --------------------        ----- ------------- ------------- ----------- ------ ...
>>   .us-west-2.compute.internal     3             2             1        3955 7 [1]
>>
>> This says you have 3G of **usable-by-ducc** RAM, of which 2G are used by
>> the reservation/job "7", and that you have 1GB free. The reason you have
>> only 3GB **usable** is that usually the hardware/opsys will reserve a small
>> part of the installed RAM for itself, so the reported RAM is a tad
>> smaller. To avoid overcommitting the system, we use the reported value,
>> not the installed value. Most or all of the jobs here will easily
>> overwhelm even the largest machines if we don't do this.
>>
>> Next, these lines show the actual schedule the RM is trying to build.
>>
>>   Dormant:
>>   ID          JobName    User   Class     Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
>>   J_________8 Test_job_1 ducc   normal         0     2       0   2      2     15       15     true         8
>>
>>   Reserved:
>>   ID          JobName    User   Class     Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
>>   R_________7 Job_Driver System JobDriver      1     2       2   0      2      0        0        0         1
>>
>> This confirms that the DUCC reservation "7" occupies 2G, and that job "8"
>> is requesting 2G but is "dormant", i.e. waiting for resources. Since there
>> is only 3G available on this machine, job 8 will wait.
>>
>> Best,
>> Jim
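
For reference, the share arithmetic Jim describes can be sketched in a few lines of Python. This is only an illustration, not DUCC's scheduler code: the function names are made up for this example, and the 1 GB share size and the round-down of reported memory are assumptions inferred from the RM log line quoted above (3955 MB reported, 3 shares in total).

```python
MB_PER_GB = 1024


def usable_shares(reported_mb, share_gb=1):
    """Whole shares that fit in the memory the node reports (rounded down).

    share_gb=1 is an assumption inferred from the quoted RM log.
    """
    return reported_mb // (share_gb * MB_PER_GB)


def shares_needed(memory_gb, share_gb=1):
    """Shares a request occupies (rounded up to whole shares)."""
    return -(-memory_gb // share_gb)


def can_schedule(reported_mb, jd_reserved_gb, job_gb, share_gb=1):
    """True if the job fits in what is left after the Job Driver reservation."""
    free = usable_shares(reported_mb, share_gb) - shares_needed(jd_reserved_gb, share_gb)
    return shares_needed(job_gb, share_gb) <= free


# Values from the thread: 3955 MB reported by the 4G machine.
print(usable_shares(3955))       # 3
print(can_schedule(3955, 2, 2))  # False: original settings, job 8 stays dormant
print(can_schedule(3955, 1, 1))  # True: after lowering both settings to 1 GB
```

With the original settings the 2 GB reservation leaves one free share, so a 2 GB job cannot be placed; lowering both ducc.jd.host.memory.size and process_memory_size to 1 GB leaves two free shares for a one-share job.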