In relation to this, I just noticed something very important: If you "set auto_zoom, off" before loading a series of structures, you'll boost PyMOL's loading performance dramatically: by 10X at least...perhaps much more. Just zoom once manually after everything is loaded in.
cmd.set("auto_zoom", "off")

# now load your structures
...

# now zoom
cmd.zoom()

This change makes it possible to load hundreds of structures containing
hundreds of thousands of atoms in a reasonable amount of time. For
example, on my dual 1 GB G5 with Shark-optimized G5 beta code, I loaded
800 PDB structures containing a total of 1.4 million atoms in just over
426 seconds -- apparently the situation isn't nearly as bad as I'd
feared.

Cheers,
Warren

--
mailto:war...@delsci.com
Warren L. DeLano, Ph.D.
Principal Scientist
DeLano Scientific LLC
Voice (650)-346-1154
Fax (650)-593-4020

> -----Original Message-----
> From: Ben Allen [mailto:benal...@caltech.edu]
> Sent: Tuesday, September 07, 2004 2:48 PM
> To: Warren DeLano
> Cc: pymol-users@lists.sourceforge.net
> Subject: Re: [PyMOL] long loading times as the number of
> existing objects increases
>
> Warren-
> Thanks for your prompt response!
> Given the fundamental issues you mentioned, I think I will
> change my script so that it loads files only when they are
> needed and deletes the associated objects when they are no
> longer being displayed. Initially, I rejected this solution
> as less efficient, but apparently the specific situation with
> PyMOL actually makes it more efficient!
>
> Although the current version of my script uses a lot of
> outside information to determine which files are loaded, how
> they are colored, and how they are aligned (and so I haven't
> included it), the following illustrates what I'm talking about:
>
> #!/usr/bin/env python
>
> from glob import glob
> from time import time
>
> if __name__ == 'pymol':
>     from pymol import cmd
>     t1 = time()
>     for pdb in glob('*.pdb'):
>         print pdb
>         cmd.load(pdb)
>     t2 = time()
>     print t2 - t1
>
> If (from PyMOL) I cd to a directory that has 50 pdb files and
> run this script, it takes about 105 sec to complete.
> If I include simple alignment and color commands, as follows:
>
> #!/usr/bin/env python
>
> from glob import glob
> from time import time
>
> if __name__ == 'pymol':
>     from pymol import cmd
>     t1 = time()
>     objects = []
>     for pdb in glob('*.pdb'):
>         print pdb
>         cmd.load(pdb)
>         objects.append(pdb[:-4])
>         cmd.fit(objects[-1] + ' and name ca', objects[0] + ' and name ca')
>         cmd.color('wheat', objects[-1] + ' and elem c')
>     t2 = time()
>     print t2 - t1
>
> it still takes the same amount of time as before. This is only
> one data point (50 structures), because I didn't want to
> repeat the benchmarks for larger sets of structures, but it
> seems to indicate that the limiting step is the actual
> loading of the pdb files, and not the subsequent
> aligning/coloring steps.
>
> Thanks again for letting me know which direction I should go.
> I'll let you know if I get any insight into the origin of the
> original issue.
>
> -Ben
>
> On Sep 7, 2004, at 11:51 AM, Warren DeLano wrote:
>
> > Ben,
> >
> > Thanks for the great benchmarks! PyMOL is definitely showing
> > non-linear behavior when it comes to loading a lot of objects...
> > I don't know exactly why this is, but I can tell you that I didn't
> > originally envision (and thus didn't optimize PyMOL for) loading
> > so many objects.
> >
> > As it currently stands, there are a number of places where PyMOL
> > does things using lists when it should be using hashes, and there
> > are many tasks (such as selecting atoms) that are linearly
> > dependent (or worse) on the total number of atoms and coordinate
> > sets present in the system. All of these issues will be addressed
> > in time, but it may take considerable work to correct them.
> > Unfortunately, these are more than just bugs -- they are
> > limitations in the original design. Such limitations are now the
> > bane of my existence; my dreams are filled with questions of "How
> > do we fix or improve the software without breaking existing PyMOL
> > usage?"
> > Remodeling an airplane full of passengers while you're flying it
> > is much more challenging than when it is empty and on the
> > ground. : )
> >
> > My current advice is to find creative ways of limiting the total
> > number of atoms and objects loaded into PyMOL at one time. One way
> > to do this is to create subsets which contain just those atoms
> > you'd like to see. Another approach is to run multiple PyMOL
> > instances simultaneously.
> >
> > Cheers,
> > Warren
> >
> > PS. It would be great if you could send us one of your more
> > challenging example scripts to use as a test case for improvement
> > -- and if you do spot simple bottlenecks in the code, such
> > information could be very helpful.
> >
> > --
> > mailto:war...@delsci.com
> > Warren L. DeLano, Ph.D.
> > Principal Scientist
> > DeLano Scientific LLC
> > Voice (650)-346-1154
> > Fax (650)-593-4020
> >
> > > -----Original Message-----
> > > From: pymol-users-ad...@lists.sourceforge.net
> > > [mailto:pymol-users-ad...@lists.sourceforge.net] On Behalf Of
> > > Ben Allen
> > > Sent: Tuesday, September 07, 2004 10:32 AM
> > > To: pymol-users@lists.sourceforge.net
> > > Subject: [PyMOL] long loading times as the number of existing
> > > objects increases
> > >
> > > I have a situation in which I need to load a large number of
> > > separate pdb files into a single PyMOL session. In this case,
> > > the number is ~150, but it could potentially be more. However,
> > > the amount of time required to load a file appears to be
> > > strongly dependent on the number of files already loaded. For
> > > example:
> > >
> > > # of structures loaded    time to load all structures (seconds)
> > >           5                           0.82
> > >          10                           2.49
> > >          20                          11.05
> > >          30                          29.85
> > >          40                          62.48
> > >          50                         115.25
> > >          60                         189.79
> > >          70                         302.67
> > >          80                         432.82
> > >          90                         589.23
> > >
> > > Unfortunately, this means that loading 150 structures takes
> > > over an hour. I observe this behavior whether I am loading the
> > > structures all at once using a Python script, or one at a time.
> > > In both cases, I am using the cmd.load() API function, but the
> > > built-in load command gives similar results. The structures I
> > > am loading are (nearly) identical: each has 263 residues (in a
> > > single chain); each individual pdb file is about 215 KB.
> > >
> > > I am running this on a dual 2.0 GHz G5 system with 1.5 GB of
> > > memory. The long loading times are consistent between the two
> > > versions of PyMOL I have installed: OSX/X11 hybrid version 0.97
> > > and MacPyMOL version 0.95. During the long loading times, there
> > > is plenty of memory available, but the processor load stays at
> > > 50% (i.e., one processor on my machine is fully loaded
> > > throughout).
> > >
> > > My gut feeling is that this situation should not be, but I
> > > don't yet understand the structure of the code well enough to
> > > debug it. Can anyone shed light on this issue?
> > >
> > > Thanks in advance,
> > > Ben Allen
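[As a back-of-the-envelope check, not part of the original thread: a least-squares power-law fit of Ben's timing table quantifies the "non-linear behavior" Warren mentions. Fitting t = C * n**k on log-transformed data gives an exponent k of roughly 2.3, i.e. load time grows much faster than linearly in the number of structures.]

```python
import math

# Ben's benchmark table: (structures loaded, total load time in seconds)
data = [(5, 0.82), (10, 2.49), (20, 11.05), (30, 29.85), (40, 62.48),
        (50, 115.25), (60, 189.79), (70, 302.67), (80, 432.82), (90, 589.23)]

# Least-squares fit of log(t) against log(n); the slope k is the
# exponent of the power law t ~ C * n**k. k near 1 would be linear.
xs = [math.log(n) for n, _ in data]
ys = [math.log(t) for _, t in data]
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
k = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
     / sum((x - xbar) ** 2 for x in xs))
print(round(k, 2))  # exponent of the fit, about 2.3 -- strongly superlinear
```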
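[Ben's plan -- load a structure only when it needs to be displayed and delete its object when it no longer does -- can be sketched as a small eviction cache. This sketch is not from the thread: ObjectPool and its method names are illustrative, and the loader/deleter callbacks stand in for PyMOL's cmd.load and cmd.delete so the code runs outside a PyMOL session.]

```python
from collections import OrderedDict

class ObjectPool:
    """Keep at most `capacity` structures loaded, evicting the least
    recently shown object before loading a new one. Loader and deleter
    are injected; in a real session you would pass cmd.load and
    cmd.delete so that PyMOL never holds more than `capacity` objects."""

    def __init__(self, capacity, loader, deleter):
        self.capacity = capacity
        self.loader = loader
        self.deleter = deleter
        self.loaded = OrderedDict()  # object name -> pdb path, oldest first

    def show(self, name, path):
        """Ensure `name` is loaded, deleting the oldest object if full."""
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently shown
            return
        if len(self.loaded) >= self.capacity:
            oldest, _ = self.loaded.popitem(last=False)
            self.deleter(oldest)
        self.loader(path, name)
        self.loaded[name] = path

# Stub callbacks record calls instead of touching PyMOL.
calls = []
pool = ObjectPool(capacity=2,
                  loader=lambda path, name: calls.append(("load", name)),
                  deleter=lambda name: calls.append(("delete", name)))
pool.show("1abc", "1abc.pdb")
pool.show("2xyz", "2xyz.pdb")
pool.show("3def", "3def.pdb")  # pool is full, so 1abc is deleted first
print(calls)
# -> [('load', '1abc'), ('load', '2xyz'), ('delete', '1abc'), ('load', '3def')]
```

The key point, per the thread, is that deleting objects keeps the total object count low, which sidesteps the superlinear cost of loading into a session that already holds many objects.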