Warren-
Thanks for your prompt response!
Given the fundamental issues you mentioned, I think I will change my script so that it loads files only when they are needed and deletes the associated objects once they are no longer being displayed. Initially, I rejected this approach as less efficient, but apparently the specific situation with PyMOL actually makes it more efficient!
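For reference, the bookkeeping for that load-on-demand scheme could look something like the sketch below. The `OnDemandLoader` class, the `max_loaded` limit, and the callback names are all hypothetical (mine, not PyMOL's); in a live session the callbacks would be `cmd.load` and `cmd.delete`, but they are abstracted here so the logic can run outside PyMOL:

```python
# Hypothetical sketch: keep at most max_loaded objects in the session,
# loading on demand and deleting the least recently shown object first.
# load_fn / delete_fn stand in for pymol.cmd.load / pymol.cmd.delete.
class OnDemandLoader:
    def __init__(self, load_fn, delete_fn, max_loaded=5):
        self.load_fn = load_fn
        self.delete_fn = delete_fn
        self.max_loaded = max_loaded
        self.loaded = []  # object names, least recently shown first

    def show(self, name):
        if name in self.loaded:
            self.loaded.remove(name)        # already loaded; refresh its position
        else:
            if len(self.loaded) >= self.max_loaded:
                oldest = self.loaded.pop(0)
                self.delete_fn(oldest)      # e.g. cmd.delete(oldest)
            self.load_fn(name)              # e.g. cmd.load(name + '.pdb', name)
        self.loaded.append(name)

# Demo with recording callbacks instead of a live PyMOL session:
loads, deletes = [], []
loader = OnDemandLoader(loads.append, deletes.append, max_loaded=2)
for name in ['a', 'b', 'c', 'a']:
    loader.show(name)
print(loads, deletes)  # 'a' is reloaded after being evicted to make room for 'c'
```

The point of the eviction order is that the total number of loaded objects stays bounded, so load time no longer depends on how many structures the whole set contains.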

Although the current version of my script uses a lot of outside information to determine which files are loaded, how they are colored, and how they are aligned (and so I haven't included it), the following illustrates what I'm talking about:

#!/usr/bin/env python

from glob import glob
from time import time

if __name__ == 'pymol':
   from pymol import cmd
   t1 = time()
   for pdb in glob('*.pdb'):
      print(pdb)
      cmd.load(pdb)
   t2 = time()
   print(t2 - t1)  # total load time in seconds

If (from within PyMOL) I cd to a directory containing 50 pdb files and run this script, it takes about 105 seconds to complete. If I include simple alignment and coloring commands, as follows:

#!/usr/bin/env python

from glob import glob
from time import time

if __name__ == 'pymol':
   from pymol import cmd
   t1 = time()
   objects = []
   for pdb in glob('*.pdb'):
      print(pdb)
      cmd.load(pdb)
      objects.append(pdb[:-4])  # strip '.pdb' to get the object name
      cmd.fit(objects[-1] + ' and name ca', objects[0] + ' and name ca')
      cmd.color('wheat', objects[-1] + ' and elem c')
   t2 = time()
   print(t2 - t1)  # total time in seconds

, it still takes the same amount of time as before. This is only one data point (50 structures), since I didn't want to repeat the benchmarks for larger sets of structures, but it seems to indicate that the limiting step is the actual loading of the pdb files, not the subsequent aligning/coloring steps.

Thanks again for letting me know which direction I should go. I'll let you know if I get any insight into the origin of the original issue.

-Ben

On Sep 7, 2004, at 11:51 AM, Warren DeLano wrote:

Ben,

Thanks for the great benchmarks! PyMOL is definitely showing non-linear behavior when it comes to loading a lot of objects. I don't know exactly why, but I can tell you that I didn't originally envision (and thus optimize PyMOL for) loading so many objects.

As it currently stands, there are a number of places where PyMOL uses lists when it should be using hashes, and there are many tasks (such as selecting atoms) whose cost is linear (or worse) in the total number of atoms and coordinate sets present in the system. All of these issues will be addressed in time, but it may take considerable work to correct them. Unfortunately, these are more than just bugs -- they are limitations in the original design. Such limitations are now the bane of my existence; my dreams are filled with questions like "How do we fix or improve the software without breaking existing PyMOL usage?" Remodeling an airplane full of passengers while you're flying it is much more challenging than when it is empty and on the ground. : )
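As a toy illustration of the list-versus-hash point (not PyMOL code; the names here are made up): if an object is looked up by name in a list once per load, each lookup scans the whole list, so total load time grows quadratically, whereas a set or dict lookup stays near constant:

```python
import timeit

# Build n object names; list membership is O(n), set membership is ~O(1)
names_list = ['obj%d' % i for i in range(20000)]
names_set = set(names_list)

# Time a worst-case lookup (the last name) against both containers
t_list = timeit.timeit("'obj19999' in names_list", globals=globals(), number=200)
t_set = timeit.timeit("'obj19999' in names_set", globals=globals(), number=200)
print('list: %.4fs  set: %.4fs' % (t_list, t_set))
```

On any machine the list lookup should come out orders of magnitude slower, which is why doing one such lookup per loaded object produces superlinear total load times.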

My current advice is to find creative ways of limiting the total number of atoms and objects loaded into PyMOL at one time. One way to do this is to create subsets that contain just those atoms you'd like to see. Another approach is to run multiple PyMOL instances simultaneously.
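One way to build such subsets is to trim the PDB files before loading them. The helper below is a hypothetical sketch (the function name and demo records are mine, not from PyMOL): it keeps only the CA ATOM records, so each loaded object carries one atom per residue instead of the full chain:

```python
def write_ca_subset(in_lines):
    """Keep only ATOM records whose atom-name field (columns 13-16) is CA."""
    return [line for line in in_lines
            if line.startswith('ATOM') and line[12:16].strip() == 'CA']

# Tiny demo on two fabricated ATOM records:
sample = [
    "ATOM      1  N   ALA A   1      11.104  13.207   2.100  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      12.560  13.252   2.314  1.00  0.00           C",
]
ca_only = write_ca_subset(sample)
print(len(ca_only))  # 1
```

Writing the trimmed lines to a new file and loading that instead cuts the per-object atom count by roughly a factor of eight for a typical protein.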

Cheers,
Warren

PS. It would be great if you could send us one of your more challenging
example scripts to use as a test-case for improvement -- and if you do spot
simple bottlenecks in the code, such information could be very helpful.

--
mailto:war...@delsci.com
Warren L. DeLano, Ph.D.
Principal Scientist
DeLano Scientific LLC
Voice (650)-346-1154
Fax   (650)-593-4020


-----Original Message-----
From: pymol-users-ad...@lists.sourceforge.net
[mailto:pymol-users-ad...@lists.sourceforge.net] On Behalf Of
Ben Allen
Sent: Tuesday, September 07, 2004 10:32 AM
To: pymol-users@lists.sourceforge.net
Subject: [PyMOL] long loading times as the number of existing
objects increases

I have a situation in which I need to load a large number of
separate pdb files into a single pymol session.  In this
case, the number is ~150, but it could potentially be more.
However, the amount of time required to load a file appears
to be strongly dependent on the number of files already
loaded.  For example:

# of structures loaded  time to load all structures (seconds)
5       0.82
10      2.49
20      11.05
30      29.85
40      62.48
50      115.25
60      189.79
70      302.67
80      432.82
90      589.23
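[For reference, a quick least-squares fit of these numbers on a log-log scale -- an editorial addition, not part of the original report -- suggests the total load time grows roughly as n^2.3 in the number of structures:]

```python
import math

# Benchmark data from the table above
counts = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90]
times = [0.82, 2.49, 11.05, 29.85, 62.48,
         115.25, 189.79, 302.67, 432.82, 589.23]

# Fit log(t) = a*log(n) + b; the slope a is the apparent exponent
xs = [math.log(n) for n in counts]
ys = [math.log(t) for t in times]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print('apparent exponent: %.2f' % slope)
```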

Unfortunately, this means that loading 150 structures takes
over an hour.  I observe this behavior whether I am loading
the structures all at once using a python script, or one at a
time.  In both cases, I am using the cmd.load() api function,
but the built-in load command gives similar results.  The
structures I am loading are (nearly) identical:
each has 263 residues (in a single chain); each individual
pdb file is about 215KB.

I am running this on a dual 2.0 GHz G5 system with 1.5 GB
memory.  The long loading times are consistent between the
two versions of pymol I have installed: OSX/X11 hybrid
version 0.97 and MacPyMOL version 0.95.
During the long loading times, there is plenty of memory
available, but the processor load stays at 50% (i.e. one
processor on my machine is fully loaded throughout).

My gut feeling is that this situation should not be, but I
don't yet understand the structure of the code well enough to
debug it.  Can anyone shed light on this issue?

Thanks in advance,
Ben Allen



