Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Stiefl, Nikolaus
Even better ☺ From: Greg Landrum Date: Friday 29 June 2018 at 18:04 To: GMCProfile Cc: "rdkit-discuss@lists.sourceforge.net" Subject: Re: [Rdkit-discuss] elimination of small fragments How about just GetLargestFragment()? On Fri, 29 Jun 2018 at 16:45, Stiefl, Nikolaus mailto:ni
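Greg's `GetLargestFragment()` is a suggestion in this message, and this sketch does not assume a function by that name exists. Later RDKit releases include a largest-fragment helper in the `rdMolStandardize` module. This is a minimal sketch, assuming an RDKit version that provides `rdMolStandardize.LargestFragmentChooser`; check its documentation for exactly how it ranks fragments and breaks ties.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Assumption: an RDKit release that ships rdMolStandardize; it may not be
# available in the RDKit version used at the time of this 2018 thread.
chooser = rdMolStandardize.LargestFragmentChooser()

mol = Chem.MolFromSmiles("CCC.CC")
largest = chooser.choose(mol)  # returns a new molecule holding one fragment
print(Chem.MolToSmiles(largest))  # CCC
```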

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Greg Landrum
ik > From: Alfredo Quevedo > Date: Friday 29 June 2018 at 12:06 > To: Andrew Dalke > Cc: Stephen Roughley via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> > Subject: Re: [Rdkit-discuss] elimination of small fragments

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Stiefl, Nikolaus
rdmolops.GetMolFrags(mol, asMols = True, largestFragmentOnly = True)? Just a thought … Cheers Nik From: Alfredo Quevedo Date: Friday 29 June 2018 at 12:06 To: Andrew Dalke Cc: Stephen Roughley via Rdkit-discuss Subject: Re: [Rdkit-discuss] elimination of small fragments thank you very much

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Chris Earnshaw
Ed, as always, there are no 'one size fits all' solutions; it all depends on what you need to do. I was processing tens of millions of screening compounds into a database and used a desalter/desolvator written using the RDKit C++ API. That was quite quick enough for my needs; I never tried it wit

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Ed Griffen
Chris, I absolutely agree with your points: processing the molecules with RDKit is much more robust, but it depends on how many you've got to process. If you're doing millions to billions, the overhead can become a problem, and doing it in two steps (lexical, then graph) can be the pra
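Ed's two-step idea can be sketched as follows, assuming the cheap lexical step is used only as a pre-filter: a SMILES string with no '.' describes a single connected component, so it can be passed through without parsing, and only multi-component records go to RDKit for a graph-based comparison. The helper name is hypothetical, and comparing by heavy-atom count is an assumption about what "largest" should mean.

```python
from rdkit import Chem

def largest_fragment_two_step(smiles):
    """Hypothetical helper: lexical pre-filter, then an RDKit graph step."""
    # Step 1 (lexical): without a '.', the SMILES is one connected component,
    # so there is nothing to strip and no need to parse it.
    # Note: the input string is returned unchanged, not canonicalized.
    if "." not in smiles:
        return smiles
    # Step 2 (graph): parse and compare the real fragments by heavy-atom count.
    # Parsing also handles cases such as "C1.C1", where a ring-closure bond
    # joins the two dot-separated parts into a single fragment.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # unparsable record; handle it separately
    frags = Chem.GetMolFrags(mol, asMols=True)
    largest = max(frags, key=lambda m: m.GetNumHeavyAtoms())
    return Chem.MolToSmiles(largest)

for smi in ["CCO", "CCC.CC", "CCO.[Na+]"]:
    print(smi, "->", largest_fragment_two_step(smi))
```

Only the second step pays the cost of building a molecule, which is the point of Ed's split when most records have a single component.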

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Chris Earnshaw
I'd say that using RDKit to calculate the number of heavy atoms is significantly more robust than a purely lexical approach, and it's easy to implement. It's also dangerous to just discard the smallest fragment. Years ago I worked on a project where the active molecule had only 11 heavy atoms an
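Chris's two points (count heavy atoms with RDKit rather than characters, and don't blindly discard the smaller fragment) can be combined in a minimal sketch that lists each fragment with its heavy-atom count and flags records where the runner-up fragment is still large enough to deserve a look. The helper names and the `min_second` cutoff are hypothetical placeholders, not values from the thread.

```python
from rdkit import Chem

def fragment_heavy_atoms(smiles):
    """Return (canonical fragment SMILES, heavy-atom count) pairs, largest first."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse {smiles!r}")
    pairs = [(Chem.MolToSmiles(f), f.GetNumHeavyAtoms())
             for f in Chem.GetMolFrags(mol, asMols=True)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

def needs_review(smiles, min_second=8):
    """Flag a record whose second-largest fragment has >= min_second heavy atoms.

    min_second is a hypothetical cutoff: a fragment of that size may be the
    active component rather than a counterion or solvent.
    """
    counts = [n for _, n in fragment_heavy_atoms(smiles)]
    return len(counts) > 1 and counts[1] >= min_second

print(fragment_heavy_atoms("CCC.CC"))  # [('CCC', 3), ('CC', 2)]
print(needs_review("CCC.CC"))          # False
```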

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Andrew Dalke
On Jun 29, 2018, at 02:43, 藤秀義 wrote: > Although not strictly based on the number of atoms, but on the length of > SMILES string, the simplest way is using Python built-in functions as follows: > > smiles = 'CCC.CC' > fragment = max(smiles.split('.'), key=len) > print (fragment) The mmpdb packa

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Alfredo Quevedo
Thank you, Ed, for suggesting this alternative. Regards, Alfredo. Sent from BlueMail. On 29 June 2018 at 05:56, Ed Griffen wrote: >Using the string length to find the number of atoms in a molecule is OK >- but you need to take account of the additional characters in SMILES >t

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Alfredo Quevedo
Thank you very much, Andrew, for this detailed explanation. Regards, Alfredo. Sent from BlueMail. On 29 June 2018 at 07:02, Andrew Dalke wrote: >On Jun 28, 2018, at 22:08, Paolo Tosco >wrote: >> if you wish to keep only the largest disconnected fragment you may >try the follo

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Andrew Dalke
On Jun 28, 2018, at 22:08, Paolo Tosco wrote: > if you wish to keep only the largest disconnected fragment you may try the > following: > > mols = list(rdmolops.GetMolFrags(mol, asMols = True)) > if (mols): > mols.sort(reverse = True, key = lambda m: m.GetNumAtoms()) > mol = mols[0] A s

Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Ed Griffen
Using the string length to find the number of atoms in a molecule is OK, but you need to take account of the additional characters in SMILES that are not just atoms, for example: two-letter elements (like silicon, chlorine, etc.), brackets, ring closures, charges, and explicit hydrogens. It’s simple
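Ed's list can be turned into a purely lexical heavy-atom counter. This is a minimal sketch, assuming well-formed SMILES: it counts organic-subset atoms (trying the two-letter `Br` and `Cl` before one-letter symbols) and bracket atoms, skips bracketed hydrogens, and ignores ring-closure digits, bonds, branches and charges. It is not a SMILES parser; for example, it does not notice a ring closure that bridges a '.', and it ignores `*` atoms. The function names are hypothetical.

```python
import re

# One token per atom: bracket atoms first, then the organic subset, with the
# two-letter symbols Br and Cl tried before the one-letter ones.
ATOM_RE = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# A bracket atom is hydrogen if, after an optional isotope, the element is H:
# this matches [H], [2H] and [H+], but not [Hg] or [He].
HYDROGEN_RE = re.compile(r"\[\d*H(?![a-z])")

def heavy_atom_count(fragment):
    """Count heavy atoms in a dot-free SMILES fragment without parsing it."""
    return sum(1 for tok in ATOM_RE.findall(fragment)
               if not HYDROGEN_RE.match(tok))

def largest_fragment_lexical(smiles):
    """Pick the dot-separated fragment with the most heavy atoms."""
    return max(smiles.split("."), key=heavy_atom_count)

print(heavy_atom_count("c1ccccc1Cl"))         # 7
print(largest_fragment_lexical("CCO.[Na+]"))  # CCO
```

Unlike plain string length, `Cl` counts as one atom and `[Na+]` as one atom rather than five characters.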

Re: [Rdkit-discuss] elimination of small fragments

2018-06-28 Alfredo Quevedo
Thank you, Hideyoshi, for your feedback. Regards, Alfredo. Sent from BlueMail. On 28 June 2018 at 21:43, "藤秀義" wrote: >Dear Alfredo, > >Although not strictly based on the number of atoms, but on the length >of >SMILES string, the simplest way is using Python built-in function

Re: [Rdkit-discuss] elimination of small fragments

2018-06-28 藤秀義
Dear Alfredo,
Although it is not strictly based on the number of atoms but on the length of the SMILES string, the simplest way is to use Python built-in functions as follows:
    smiles = 'CCC.CC'
    fragment = max(smiles.split('.'), key=len)
    print(fragment)
Best regards, Hideyoshi
> thank you Paolo for this
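Hideyoshi's one-liner is fast and needs no parsing, but because it compares character counts, a short organic fragment can lose to a counterion written with brackets and a charge. A minimal illustration; the input SMILES is a made-up example:

```python
smiles = "CCO.[Na+]"  # ethanol plus a sodium counterion (made-up example)
fragment = max(smiles.split("."), key=len)
print(fragment)  # [Na+]: 5 characters beat the 3 of CCO, despite 1 heavy atom vs 3
```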

Re: [Rdkit-discuss] elimination of small fragments

2018-06-28 Alfredo Quevedo
Thank you, Paolo, for this help. I will study the code and try it. Best regards, Alfredo. Sent from BlueMail. On 28 June 2018 at 17:08, Paolo Tosco wrote: >Dear Alfredo, > >if you wish to keep only the largest disconnected fragment you may try >the following: > >mols = list(

Re: [Rdkit-discuss] elimination of small fragments

2018-06-28 Paolo Tosco
Dear Alfredo,
if you wish to keep only the largest disconnected fragment, you may try the following:
    mols = list(rdmolops.GetMolFrags(mol, asMols=True))
    if mols:
        mols.sort(reverse=True, key=lambda m: m.GetNumAtoms())
        mol = mols[0]
Hope that helps, cheers
p.
On 06/28/18 19:38,
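Paolo's snippet assumes that `mol` and `rdmolops` are already defined. A self-contained version is sketched here, alongside the equivalent single pass with `max()`, which avoids sorting the whole fragment list. Both rank fragments by `GetNumAtoms()`; for a molecule parsed from SMILES, where hydrogens are normally implicit, that is essentially the heavy-atom count. On a tie, both keep the first fragment in input order.

```python
from rdkit import Chem
from rdkit.Chem import rdmolops

mol = Chem.MolFromSmiles("CCC.CC")

# Paolo's approach: split into fragment molecules and sort largest-first.
mols = list(rdmolops.GetMolFrags(mol, asMols=True))
mols.sort(reverse=True, key=lambda m: m.GetNumAtoms())
largest_sorted = mols[0] if mols else None  # None if mol had no atoms

# Same result in one pass; max() returns the first of any tied fragments.
largest_max = max(rdmolops.GetMolFrags(mol, asMols=True),
                  key=lambda m: m.GetNumAtoms(), default=None)

print(Chem.MolToSmiles(largest_sorted), Chem.MolToSmiles(largest_max))  # CCC CCC
```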

[Rdkit-discuss] elimination of small fragments

2018-06-28 Alfredo Quevedo
Good afternoon,
I would like to filter out small fragments from a list of molecules using the following strategy:
    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import SaltRemover
    remover = SaltRemover.SaltRemover()
    mol = Chem.MolFromSmiles('CCC.CC')
    res = remover.StripMol(mol)
    p
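The snippet in this question shows why `SaltRemover` alone does not do the job: by default, `StripMol` only removes fragments that match the salt definitions shipped with RDKit (common counterions), so the ethane in `CCC.CC` stays. A minimal size-based filter is sketched here instead, assuming "small" means "fewer than N heavy atoms"; the cutoff of 3 and the helper name are hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import SaltRemover

mol = Chem.MolFromSmiles("CCC.CC")

# The default remover only deletes fragments that match its built-in salt
# definitions, so neither alkane fragment is removed here.
remover = SaltRemover.SaltRemover()
res = remover.StripMol(mol)
print(res.GetNumAtoms())  # 5: nothing was stripped

def drop_small_fragments(mol, min_heavy_atoms=3):
    """Hypothetical helper: keep fragments with >= min_heavy_atoms heavy atoms."""
    keep = [f for f in Chem.GetMolFrags(mol, asMols=True)
            if f.GetNumHeavyAtoms() >= min_heavy_atoms]
    if not keep:
        return None  # every fragment was below the cutoff
    # Recombine the surviving fragments into a single molecule.
    return Chem.MolFromSmiles(".".join(Chem.MolToSmiles(f) for f in keep))

print(Chem.MolToSmiles(drop_small_fragments(mol)))  # CCC
```

A threshold keeps every sizeable component instead of only the single largest one, which avoids silently dropping a second fragment that might matter.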