Warren-
Thanks for your prompt response!
Given the fundamental issues you mentioned, I think I will change my
script so that it loads files only when they are needed and deletes the
associated objects when they are no longer being displayed. Initially,
I rejected this solution as less efficient, but apparently the specific
situation with pymol actually makes it more efficient!
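As a rough sketch of the load-on-demand idea (the `LazyLoader` class and the stubbed `cmd` module here are hypothetical, just to show the bookkeeping; inside PyMOL the real `cmd.load`/`cmd.delete` calls would be used instead):

```python
from os.path import splitext

# Stub standing in for "from pymol import cmd", so the bookkeeping
# below can run outside PyMOL. cmd.load / cmd.delete are the real
# PyMOL API names; the stub just records which objects exist.
class _StubCmd:
    def __init__(self):
        self.loaded = []
    def load(self, path):
        self.loaded.append(splitext(path)[0])
    def delete(self, name):
        self.loaded.remove(name)

cmd = _StubCmd()

class LazyLoader:
    """Keep only the currently displayed structures loaded."""
    def __init__(self):
        self.visible = set()
    def show(self, pdb_path):
        name = splitext(pdb_path)[0]
        if name not in self.visible:
            cmd.load(pdb_path)      # load only when first displayed
            self.visible.add(name)
    def hide(self, pdb_path):
        name = splitext(pdb_path)[0]
        if name in self.visible:
            cmd.delete(name)        # free the object once it is hidden
            self.visible.remove(name)

loader = LazyLoader()
loader.show('1abc.pdb')
loader.show('2def.pdb')
loader.hide('1abc.pdb')
print(cmd.loaded)   # only the structure still on display remains
```

This keeps the object count (and therefore the per-load cost) bounded by how many structures are displayed at once, rather than by the full set of files.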
Although the current version of my script uses a lot of outside
information to determine which files are loaded, how they are colored,
and how they are aligned (and so I haven't included it), the following
illustrates what I'm talking about:
#!/usr/bin/env python
from glob import glob
from time import time

if __name__ == 'pymol':
    from pymol import cmd
    t1 = time()
    for pdb in glob('*.pdb'):
        print pdb
        cmd.load(pdb)
    t2 = time()
    print t2 - t1
If (from pymol) I cd to a directory that has 50 pdb files and run this
script, it takes about 105 sec to complete. If I include simple
alignment and color commands, as follows:
#!/usr/bin/env python
from glob import glob
from time import time

if __name__ == 'pymol':
    from pymol import cmd
    t1 = time()
    objects = []
    for pdb in glob('*.pdb'):
        print pdb
        cmd.load(pdb)
        objects.append(pdb[:-4])
        cmd.fit(objects[-1] + ' and name ca', objects[0] + ' and name ca')
        cmd.color('wheat', objects[-1] + ' and elem c')
    t2 = time()
    print t2 - t1
it still takes about the same amount of time as before. This is only one data
point (50 structures), since I didn't want to repeat the benchmarks for
larger sets of structures, but it suggests that the limiting step is the
actual loading of the pdb files, not the subsequent aligning/coloring steps.
Thanks again for letting me know which direction I should go. I'll let
you know if I get any insight into the origin of the original issue.
-Ben
On Sep 7, 2004, at 11:51 AM, Warren DeLano wrote:
Ben,
Thanks for the great benchmarks! PyMOL is definitely showing non-linear
behavior when it comes to loading a lot of objects... I don't know exactly
why this is, but I can tell you that I didn't originally envision (and thus
optimize PyMOL for) loading so many objects.
As it currently stands, there are a number of places where PyMOL does things
using lists when it should be using hashes, and there are many tasks (such
as selecting atoms) that depend linearly (or worse) on the total number of
atoms and coordinate sets present in the system. All of these issues will be
addressed in time, but it may take considerable work to correct them.
Unfortunately, these are more than just bugs -- they are limitations in the
original design. Such limitations are now the bane of my existence; my
dreams are filled with questions of "How do we fix or improve the software
without breaking existing PyMOL usage?" Remodeling an airplane full of
passengers while you're flying it is much more challenging than when it is
empty and on the ground. : )
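As a toy illustration of the lists-vs-hashes point (plain Python, not PyMOL code): a membership test against a list scans every element, while a hash-based set answers in roughly constant time, so list-backed bookkeeping degrades as objects accumulate.

```python
import timeit

# Build 5000 object names and look up the worst case (the last one)
# in a list versus a set. The list lookup is O(n); the set is O(1).
names = ['obj%04d' % i for i in range(5000)]
as_list, as_set = list(names), set(names)

t_list = timeit.timeit(lambda: 'obj4999' in as_list, number=2000)
t_set = timeit.timeit(lambda: 'obj4999' in as_set, number=2000)
print(t_list > t_set)   # the list lookup is measurably slower
```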
My current advice is to find creative ways of limiting the total number of
atoms and objects loaded into PyMOL at one time. One way to do this is to
create subsets which just contain those atoms you'd like to see. Another
approach is to run multiple PyMOL instances simultaneously.
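One way the subset idea might look in practice (a hypothetical pre-filter, not part of PyMOL): strip a pdb file down to its alpha carbons before loading, so each object carries far fewer atoms. Column positions follow the PDB format, where the atom name sits in columns 13-16.

```python
# Keep only CA atoms from ATOM/HETATM records; pass other records
# (TER, HEADER, ...) through unchanged.
def write_ca_subset(lines, out):
    for line in lines:
        if line.startswith(('ATOM', 'HETATM')):
            if line[12:16].strip() != 'CA':
                continue            # drop everything but alpha carbons
        out.append(line)
    return out

pdb = [
    'ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00',
    'ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00',
    'ATOM      3  C   ALA A   1      12.759   7.095  -4.974  1.00  0.00',
    'TER',
]
subset = write_ca_subset(pdb, [])
print(len(subset))   # 2: the CA atom plus the TER record
```

The reduced files could then be loaded in place of the originals whenever a backbone-only view is enough.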
Cheers,
Warren
PS. It would be great if you could send us one of your more challenging
example scripts to use as a test-case for improvement -- and if you do
spot
simple bottlenecks in the code, such information could be very helpful.
--
mailto:war...@delsci.com
Warren L. DeLano, Ph.D.
Principal Scientist
DeLano Scientific LLC
Voice (650)-346-1154
Fax (650)-593-4020
-----Original Message-----
From: pymol-users-ad...@lists.sourceforge.net
[mailto:pymol-users-ad...@lists.sourceforge.net] On Behalf Of Ben Allen
Sent: Tuesday, September 07, 2004 10:32 AM
To: pymol-users@lists.sourceforge.net
Subject: [PyMOL] long loading times as the number of existing objects increases
I have a situation in which I need to load a large number of
separate pdb files into a single pymol session. In this
case, the number is ~150, but it could potentially be more.
However, the amount of time required to load a file appears
to be strongly dependent on the number of files already
loaded. For example:
# of structures loaded    time to load all structures (seconds)
 5                          0.82
10                          2.49
20                         11.05
30                         29.85
40                         62.48
50                        115.25
60                        189.79
70                        302.67
80                        432.82
90                        589.23
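A quick sanity check on these numbers (plain Python, fitting a least-squares slope to the table above on a log-log scale) suggests the growth is clearly super-linear, somewhere between quadratic and cubic:

```python
from math import log

# Timings from the table: structures loaded vs. total load time.
n = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90]
t = [0.82, 2.49, 11.05, 29.85, 62.48, 115.25, 189.79, 302.67, 432.82, 589.23]

# Least-squares slope of log(t) vs log(n); the slope estimates the
# scaling exponent (1 = linear, 2 = quadratic, 3 = cubic).
x = [log(v) for v in n]
y = [log(v) for v in t]
mx, my = sum(x) / len(x), sum(y) / len(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) \
    / sum((a - mx) ** 2 for a in x)
print(round(slope, 2))   # well above 1, i.e. clearly super-linear
```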
Unfortunately, this means that loading 150 structures takes over an hour. I
observe this behavior whether I am loading the structures all at once using
a python script or one at a time. In both cases I am using the cmd.load()
API function, but the built-in load command gives similar results. The
structures I am loading are nearly identical: each has 263 residues (in a
single chain), and each individual pdb file is about 215 KB.
I am running this on a dual 2.0 GHz G5 system with 1.5 GB of memory. The
long loading times are consistent between the two versions of pymol I have
installed: the OS X/X11 hybrid version 0.97 and MacPyMOL version 0.95.
During the long loading times, there is plenty of memory
available, but the processor load stays at 50% (i.e. one
processor on my machine is fully loaded throughout).
My gut feeling is that this situation should not be, but I
don't yet understand the structure of the code well enough to
debug it. Can anyone shed light on this issue?
Thanks in advance,
Ben Allen
_______________________________________________
PyMOL-users mailing list
PyMOL-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pymol-users