Hi,
We have set up vpbmaster to use a machine pool to distribute the work of building a
large terrain db. This works fairly well, and we get the expected
results after about 40 hours of work. However, after a few thousand of the 32000
tasks have been completed, it appears that the master can't provide tasks fast
enough to the workers. The number of running tasks drops from ~100 to 3-4 for
a long time. We have ~100 processes configured across 6 machines
(40-40-8-8-8-8), and when few tasks are running the other machines have almost
no load.
The attached htop snapshot from the master shows the situation.
After some investigation it appears (and I'm just speculating) that the main
vpbmaster process writes an enormous amount of data to the
"terrainname.ive.0.added" file in the output folder.
This file gets an ever-increasing number of task names written to it, at a rate of
600 MB in a few seconds.
The lines look like this:
/mnt/master/vpb/PlanetSAT150m_Mexico15m_vpbmaster/output/PlanetSAT150m_Mexico15m_vpbmaster_subtile_L3_X1_Y2/PlanetSAT150m_Mexico15m_vpbmaster_subtile_L8_X56_Y77/PlanetSAT150m_Mexico15m_vpbmaster_L12_X905_Y1246_subtile.ive
It appears to add 600 MB worth of these lines every few seconds, which
saturates the disk I/O and keeps one of the processes at 100%.
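To check whether the same task names are being written over and over, something like the following could be run against a copy of the ".added" file. It is only a minimal sketch: the function names count_lines and summarize are made up for illustration, and it assumes the file is plain text with one task name per line, as in the sample line above. The max_lines limit is there because the file can reach hundreds of MB.

```python
from collections import Counter


def count_lines(path, max_lines=1_000_000):
    """Count occurrences of each line in the first max_lines lines of path."""
    counts = Counter()
    with open(path, "r", errors="replace") as handle:
        for index, line in enumerate(handle):
            if index >= max_lines:
                break
            # Strip only the newline so paths are compared exactly.
            counts[line.rstrip("\n")] += 1
    return counts


def summarize(counts):
    """Return (total lines, unique lines, repeated lines)."""
    total = sum(counts.values())
    unique = len(counts)
    return total, unique, total - unique
```

Calling summarize(count_lines(path)) on the file gives the total, unique and repeated line counts, and count_lines(path).most_common(10) lists the task names that show up most often. A high repeat count would suggest the master is rewriting the same entries rather than adding new ones.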
At some point this line is added:
PlanetSAT150m_Mexico15m_vpbmaster.ive.0.added: file truncated
The file is then reset to 0 MB and the writing starts again.
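The grow-then-truncate cycle could be measured by sampling the file size at a fixed interval. The sketch below is an assumption about how one might do that, not anything vpbmaster provides: sample_sizes and growth_events are hypothetical helper names, and a shrinking size between two samples is simply treated as a truncation.

```python
import os
import time


def sample_sizes(path, samples, interval_s, sleep=time.sleep):
    """Read the size of path in bytes, samples times, interval_s seconds apart."""
    sizes = []
    for i in range(samples):
        sizes.append(os.path.getsize(path))
        if i < samples - 1:
            sleep(interval_s)
    return sizes


def growth_events(sizes, interval_s):
    """Return (bytes_per_second, truncated) for each pair of consecutive samples."""
    events = []
    for previous, current in zip(sizes, sizes[1:]):
        if current < previous:
            # The file shrank between samples, so it was truncated or recreated.
            events.append((0.0, True))
        else:
            events.append(((current - previous) / interval_s, False))
    return events
```

For example, growth_events(sample_sizes(path, 30, 2.0), 2.0) covers about a minute of activity and shows both the write rate and how often the file is reset.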
If I cancel the vpbmaster run and resubmit the tasks, it starts with normal
efficiency, but after a few thousand tasks this behaviour starts again.
Has anyone seen this behaviour before?
...
Thank you!
Cheers,
Knut
--
Read this topic online here:
http://forum.openscenegraph.org/viewtopic.php?p=68495#68495
Attachments:
http://forum.openscenegraph.org//files/htop_265.jpg