Warren-
Thanks for your prompt response!
Given the fundamental issues you mentioned, I think I will change my script so that it loads files only when they are needed and deletes the associated objects once they are no longer being displayed. Initially, I rejected this approach as less efficient, but apparently the specific situation with PyMOL actually makes it more efficient!
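For reference, the bookkeeping for that load-on-demand scheme could look something like the sketch below. The `OnDemandLoader` class, the `max_loaded` limit, and the callback names are all hypothetical (mine, not PyMOL's); in a live session the callbacks would be `cmd.load` and `cmd.delete`, but they are abstracted here so the logic can run outside PyMOL:

```python
# Hypothetical sketch: keep at most max_loaded objects in the session,
# loading on demand and deleting the least recently shown object first.
# load_fn / delete_fn stand in for pymol.cmd.load / pymol.cmd.delete.
class OnDemandLoader:
    def __init__(self, load_fn, delete_fn, max_loaded=5):
        self.load_fn = load_fn
        self.delete_fn = delete_fn
        self.max_loaded = max_loaded
        self.loaded = []  # object names, least recently shown first

    def show(self, name):
        if name in self.loaded:
            self.loaded.remove(name)        # already loaded; refresh its position
        else:
            if len(self.loaded) >= self.max_loaded:
                oldest = self.loaded.pop(0)
                self.delete_fn(oldest)      # e.g. cmd.delete(oldest)
            self.load_fn(name)              # e.g. cmd.load(name + '.pdb', name)
        self.loaded.append(name)

# Demo with recording callbacks instead of a live PyMOL session:
loads, deletes = [], []
loader = OnDemandLoader(loads.append, deletes.append, max_loaded=2)
for name in ['a', 'b', 'c', 'a']:
    loader.show(name)
print(loads, deletes)  # 'a' is reloaded after being evicted to make room for 'c'
```

The point of the eviction order is that the total number of loaded objects stays bounded, so load time no longer depends on how many structures the whole set contains.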

Although the current version of my script uses a lot of outside information to determine which files are loaded, how they are colored, and how they are aligned (and so I haven't included it), the following illustrates what I'm talking about:

#!/usr/bin/env python

from glob import glob
from time import time

if __name__ == 'pymol':
   from pymol import cmd
   t1 = time()
   for pdb in glob('*.pdb'):
      print(pdb)
      cmd.load(pdb)
   t2 = time()
   print(t2 - t1)  # total load time in seconds

If (from within PyMOL) I cd to a directory containing 50 pdb files and run this script, it takes about 105 seconds to complete. If I include simple alignment and coloring commands, as follows:

#!/usr/bin/env python

from glob import glob
from time import time

if __name__ == 'pymol':
   from pymol import cmd
   t1 = time()
   objects = []
   for pdb in glob('*.pdb'):
      print(pdb)
      cmd.load(pdb)
      objects.append(pdb[:-4])  # strip '.pdb' to get the object name
      cmd.fit(objects[-1] + ' and name ca', objects[0] + ' and name ca')
      cmd.color('wheat', objects[-1] + ' and elem c')
   t2 = time()
   print(t2 - t1)  # total time in seconds

, it still takes the same amount of time as before. This is only one data point (50 structures), since I didn't want to repeat the benchmarks for larger sets of structures, but it seems to indicate that the limiting step is the actual loading of the pdb files, not the subsequent aligning/coloring steps.

Thanks again for letting me know which direction I should go. I'll let you know if I get any insight into the origin of the original issue.

-Ben

On Sep 7, 2004, at 11:51 AM, Warren DeLano wrote:

Ben,

Thanks for the great benchmarks! PyMOL is definitely showing non-linear behavior when it comes to loading a lot of objects. I don't know exactly why, but I can tell you that I didn't originally envision (and thus optimize PyMOL for) loading so many objects.

As it currently stands, there are a number of places where PyMOL uses lists when it should be using hashes, and there are many tasks (such as selecting atoms) whose cost is linear (or worse) in the total number of atoms and coordinate sets present in the system. All of these issues will be addressed in time, but it may take considerable work to correct them. Unfortunately, these are more than just bugs -- they are limitations in the original design. Such limitations are now the bane of my existence; my dreams are filled with questions like "How do we fix or improve the software without breaking existing PyMOL usage?" Remodeling an airplane full of passengers while you're flying it is much more challenging than when it is empty and on the ground. : )
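As a toy illustration of the list-versus-hash point (not PyMOL code; the names here are made up): if an object is looked up by name in a list once per load, each lookup scans the whole list, so total load time grows quadratically, whereas a set or dict lookup stays near constant:

```python
import timeit

# Build n object names; list membership is O(n), set membership is ~O(1)
names_list = ['obj%d' % i for i in range(20000)]
names_set = set(names_list)

# Time a worst-case lookup (the last name) against both containers
t_list = timeit.timeit("'obj19999' in names_list", globals=globals(), number=200)
t_set = timeit.timeit("'obj19999' in names_set", globals=globals(), number=200)
print('list: %.4fs  set: %.4fs' % (t_list, t_set))
```

On any machine the list lookup should come out orders of magnitude slower, which is why doing one such lookup per loaded object produces superlinear total load times.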

My current advice is to find creative ways of limiting the total number of atoms and objects loaded into PyMOL at one time. One way to do this is to create subsets that contain just those atoms you'd like to see. Another approach is to run multiple PyMOL instances simultaneously.
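One way to build such subsets is to trim the PDB files before loading them. The helper below is a hypothetical sketch (the function name and demo records are mine, not from PyMOL): it keeps only the CA ATOM records, so each loaded object carries one atom per residue instead of the full chain:

```python
def write_ca_subset(in_lines):
    """Keep only ATOM records whose atom-name field (columns 13-16) is CA."""
    return [line for line in in_lines
            if line.startswith('ATOM') and line[12:16].strip() == 'CA']

# Tiny demo on two fabricated ATOM records:
sample = [
    "ATOM      1  N   ALA A   1      11.104  13.207   2.100  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      12.560  13.252   2.314  1.00  0.00           C",
]
ca_only = write_ca_subset(sample)
print(len(ca_only))  # 1
```

Writing the trimmed lines to a new file and loading that instead cuts the per-object atom count by roughly a factor of eight for a typical protein.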

Cheers,
Warren

PS. It would be great if you could send us one of your more challenging
example scripts to use as a test-case for improvement -- and if you do spot
simple bottlenecks in the code, such information could be very helpful.

--
mailto:war...@delsci.com
Warren L. DeLano, Ph.D.
Principal Scientist
DeLano Scientific LLC
Voice (650)-346-1154
Fax   (650)-593-4020


-----Original Message-----
From: pymol-users-ad...@lists.sourceforge.net
[mailto:pymol-users-ad...@lists.sourceforge.net] On Behalf Of
Ben Allen
Sent: Tuesday, September 07, 2004 10:32 AM
To: pymol-users@lists.sourceforge.net
Subject: [PyMOL] long loading times as the number of existing
objects increases

I have a situation in which I need to load a large number of
separate pdb files into a single pymol session.  In this
case, the number is ~150, but it could potentially be more.
However, the amount of time required to load a file appears
to be strongly dependent on the number of files already
loaded.  For example:

# of structures loaded  time to load all structures (seconds)
5       0.82
10      2.49
20      11.05
30      29.85
40      62.48
50      115.25
60      189.79
70      302.67
80      432.82
90      589.23
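[For reference, a quick least-squares fit of these numbers on a log-log scale -- an editorial addition, not part of the original report -- suggests the total load time grows roughly as n^2.3 in the number of structures:]

```python
import math

# Benchmark data from the table above
counts = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90]
times = [0.82, 2.49, 11.05, 29.85, 62.48,
         115.25, 189.79, 302.67, 432.82, 589.23]

# Fit log(t) = a*log(n) + b; the slope a is the apparent exponent
xs = [math.log(n) for n in counts]
ys = [math.log(t) for t in times]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print('apparent exponent: %.2f' % slope)
```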

Unfortunately, this means that loading 150 structures takes
over an hour.  I observe this behavior whether I am loading
the structures all at once using a python script, or one at a
time.  In both cases, I am using the cmd.load() api function,
but the built-in load command gives similar results.  The
structures I am loading are (nearly) identical:
each has 263 residues (in a single chain); each individual
pdb file is about 215KB.

I am running this on a dual 2.0 GHz G5 system with 1.5 GB
memory.  The long loading times are consistent between the
two versions of pymol I have installed: OSX/X11 hybrid
version 0.97 and MacPyMOL version 0.95.
During the long loading times, there is plenty of memory
available, but the processor load stays at 50% (i.e. one
processor on my machine is fully loaded throughout).

My gut feeling is that this situation should not be, but I
don't yet understand the structure of the code well enough to
debug it.  Can anyone shed light on this issue?

Thanks in advance,
Ben Allen



