I want to show some numbers from a fragmentation scheme that is compatible 
with my own, which means generating all the leaves from the hierarchy and then 
post-processing to merge these fragments. This isn't a problem on some of the 
more drug-like data sets, but with ChEMBL it is causing me some stress.

Best,
Nick

Nicholas C. Firth | PhD Student | Cancer Therapeutics
The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey 
| SM2 5NG
T 020 8722 4033 | E [email protected] | W www.icr.ac.uk | Twitter @ICRnews
Facebook www.facebook.com/theinstituteofcancerresearch
Making the discoveries that defeat cancer


On 11 Jun 2014, at 09:26, Greg Landrum <[email protected]> wrote:

The RECAP code currently generates a hierarchy tree for the molecule. The size 
of this tree scales very non-linearly with the number of fragments. That 
molecule has a huge number of fragments.
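
As a rough illustration of that scaling, here is a toy model in plain Python. It is not the RDKit implementation: it assumes a linear chain in which every cleavable bond can be cut independently, and the function names `unshared_tree_size` and `distinct_segments` are hypothetical. It compares the number of distinct contiguous pieces with the number of nodes in a hierarchy that is expanded without sharing identical subtrees.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def unshared_tree_size(n_bonds: int) -> int:
    """Count nodes in a toy hierarchy for a linear chain with n_bonds cuts.

    Assumption (not RDKit's algorithm): each node has one pair of children
    per cleavable bond, and identical subtrees are never shared.
    """
    if n_bonds == 0:
        return 1  # a piece with nothing left to cut is a leaf
    total = 1  # the node itself
    for cut in range(n_bonds):
        left, right = cut, n_bonds - 1 - cut  # bonds remaining on each side
        total += unshared_tree_size(left) + unshared_tree_size(right)
    return total


def distinct_segments(n_bonds: int) -> int:
    """Count distinct contiguous pieces of a chain of n_bonds + 1 units."""
    units = n_bonds + 1
    return units * (units + 1) // 2


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 30):
        print(n, distinct_segments(n), unshared_tree_size(n))
```

In this model the number of distinct pieces grows quadratically, while the unshared hierarchy grows as 3**n, so anything that walks the hierarchy path by path becomes impractical well before a chain reaches peptide length.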

I don't think the RECAP code will work for you as written.

What are you trying to get out of the analysis? There may be another approach 
that will work.

-greg





On Wed, Jun 11, 2014 at 3:25 AM, Nicholas Firth <[email protected]> wrote:


I think I have found part of the problem. I tried it on a single processor last 
night and it didn't get past the second molecule; the script hangs on this 
molecule:

>>> from rdkit import Chem
>>> from rdkit.Chem import Recap
>>> mol = Chem.MolFromSmiles('CC[C@H](C)[C@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](NC(=O)[C@@H](N)CCSC)[C@@H](C)O)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)N[C@@H](Cc1c[nH]cn1)C(=O)N[C@@H](CC(=O)N)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)NCC(=O)N[C@@H](CCC(=O)N)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N2CCC[C@H]2C(=O)N3CCC[C@H]3C(=O)NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N')
>>> hierarch = Recap.RecapDecompose(mol)
>>> ks = hierarch.GetLeaves().keys()

I imagined it would be slow for this molecule, but 8 hours might be an issue 
rather than a feature!
Best,
Nick

On 10 Jun 2014, at 20:53, Dimitri Maziuk <[email protected]> wrote:

On 06/10/2014 01:48 PM, Nicholas Firth wrote:

I still have plenty of CPUs and memory available though, so this
seems odd. Some of the processes have done nothing and the others seem
to have frozen at different times.

Yeah. Parallel processing is often not quite that straightforward.

For instance, since you say they're writing to files, how's your disk i/o?

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu/




The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company 
Limited by Guarantee, Registered in England under Company No. 534147 with its 
Registered Office at 123 Old Brompton Road, London SW7 3RP.

This e-mail message is confidential and for use by the addressee only. If the 
message is received by anyone other than the addressee, please return the 
message to the sender by replying to it and then delete the message from your 
computer and network.

------------------------------------------------------------------------------
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



