I'd say that using RDKit to calculate the number of heavy atoms is
significantly more robust than a purely lexical approach - and it's easy to
implement.
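
As a rough illustration of the difference, here is a minimal sketch that
compares the character filter from Ed's message below with RDKit's own heavy
atom count. The helper names and the organotin test molecule are my own
additions for illustration, not something from the thread.

```python
from rdkit import Chem

# Exclusion list copied from Ed's quoted message; a raw string keeps the
# backslash literal.
NON_ATOM_CHARS = r"()[]1234567890#:;,.?%-=+\/Hherlabdgfikmputvy@"


def lexical_heavy_atoms(smiles):
    """Count the characters that are not on the exclusion list."""
    return sum(1 for char in smiles if char not in NON_ATOM_CHARS)


def rdkit_heavy_atoms(smiles):
    """Parse the SMILES with RDKit and count non-hydrogen atoms."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return mol.GetNumHeavyAtoms()


sulfoxide = "C[S@@+]([O-])c1ccc(cc1)[Si](C)(C)C"  # Ed's example
organotin = "CCCC[Sn](CCCC)(CCCC)Cl"              # tributyltin chloride

for smi in (sulfoxide, organotin):
    print(smi, lexical_heavy_atoms(smi), rdkit_heavy_atoms(smi))
```

Both methods give 13 for Ed's example, but for the tin compound the filter
reports 15 while RDKit reports 14, because the 'n' of 'Sn' has to stay
countable for aromatic nitrogen.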
It's also dangerous to just discard the smallest fragment. Years ago I
worked on a project where the active molecule had only 11 heavy atoms and
the counterion (dicyclohexylamine) had 13 - so relying on atom counts will
sometimes throw the baby out with the bathwater. It's much safer
(but also a lot more work) to build a desalter/desolvater that explicitly
removes just the fragments you really want to remove.
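
To make that concrete, here is a minimal sketch of such an explicit stripper,
assuming it is enough to match whole fragments by canonical SMILES. The
unwanted list, the function names and the 11-atom acid are hypothetical
stand-ins (the acid is not the molecule from the project mentioned above).

```python
from rdkit import Chem

# Fragments to strip, written as SMILES. This list is illustrative only;
# a real one would be curated for the data set at hand.
UNWANTED_SMILES = ["C1CCCCC1NC1CCCCC1", "O", "[Na+]", "[Cl-]"]
UNWANTED = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in UNWANTED_SMILES}


def fragments(smiles):
    """Return the disconnected fragments of a SMILES as RDKit molecules."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.GetMolFrags(mol, asMols=True)


def largest_fragment(smiles):
    """Keep only the fragment with the most heavy atoms (the risky approach)."""
    biggest = max(fragments(smiles), key=lambda m: m.GetNumHeavyAtoms())
    return Chem.MolToSmiles(biggest)


def strip_known_fragments(smiles):
    """Drop only fragments whose canonical SMILES is on the unwanted list."""
    kept = [Chem.MolToSmiles(frag) for frag in fragments(smiles)]
    return ".".join(s for s in kept if s not in UNWANTED)


# Hypothetical salt: 4-ethylbenzoic acid (11 heavy atoms) with
# dicyclohexylamine (13 heavy atoms) as the counterion.
salt = "CCc1ccc(cc1)C(=O)O.C1CCCCC1NC1CCCCC1"
print(largest_fragment(salt))       # the amine, i.e. the counterion
print(strip_known_fragments(salt))  # the acid
```

Matching here is exact on canonical SMILES, so a charged form of a counterion
(for example the protonated amine) would need its own entry in the list.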
Best regards,
Chris
On 29 June 2018 at 09:56, Ed Griffen <ed.grif...@medchemica.com> wrote:
> Using the string length to find the number of atoms in a molecule is OK -
> but you need to take account of the additional characters in SMILES that
> are not just atoms, for example:
>
> two-letter elements - like silicon, chlorine etc.
> brackets, ring closures, charges, explicit hydrogens
>
> It’s simple to do:
>
> Here’s a worked example:
>
> >>> SMILES = 'C[S@@+]([O-])c1ccc(cc1)[Si](C)(C)C'
> >>> print(len(SMILES))
> 34
> >>> heavies = [char for char in SMILES if char not in
> ...            r'''()[]1234567890#:;,.?%-=+\/Hherlabdgfikmputvy@''']
> >>> print(len(heavies))
> 13
>
> Obviously you do this after splitting the SMILES on the '.' fragment
> separator.
>
> Best regards,
>
> Ed
>
> Dr Ed Griffen,
> Technical Director,
> mobile +44 7762 121593
> office +44 1625 238843
> ed.grif...@medchemica.com
> www.medchemica.com
> skype: ed.griffen
> Twitter: @MedChemica
>
> On 29 Jun 2018, at 06:37, Alfredo Quevedo <maquevedo....@gmail.com> wrote:
>
> Thank you, Hideyoshi, for your feedback.
> Regards,
> Alfredo
>
> On 28 June 2018 at 21:43, "藤秀義" <hideyoshif...@gmail.com> wrote:
>>
>> Dear Alfredo,
>>
>> The simplest way uses Python built-in functions, although it ranks
>> fragments by the length of their SMILES strings rather than strictly by
>> the number of atoms:
>>
>> smiles = 'CCC.CC'
>> fragment = max(smiles.split('.'), key=len)
>> print(fragment)
>>
>> Best regards,
>>
>> Hideyoshi
>>
>>
>> Thank you, Paolo, for this help. I will study the code and try it.
>>>
>>> best regards
>>>
>>> Alfredo
>>>
>>> On 28 June 2018 at 17:08, Paolo Tosco <
>>> paolo.tosco.m...@gmail.com> wrote:
>>>
>>> Dear Alfredo,
>>>
>>> if you wish to keep only the largest disconnected fragment you may try
>>> the following:
>>>
>>> from rdkit.Chem import rdmolops
>>>
>>> mols = list(rdmolops.GetMolFrags(mol, asMols=True))
>>> if mols:
>>>     mols.sort(reverse=True, key=lambda m: m.GetNumAtoms())
>>>     mol = mols[0]
>>>
>>> Hope that helps, cheers
>>> p.
>>>
>>> On 06/28/18 19:38, Alfredo Quevedo wrote:
>>>
>>> Good afternoon,
>>>
>>> I would like to filter out small fragments from a list of molecules
>>> using the below strategy:
>>>
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> from rdkit.Chem import SaltRemover
>>>
>>> remover=SaltRemover.SaltRemover()
>>> mol=Chem.MolFromSmiles('CCC.CC')
>>> res=remover.StripMol(mol)
>>> print(res.GetNumAtoms())
>>>
>>>
>>> I am getting 5 atoms as output, so the 'CC' is not being stripped (the
>>> script works OK for salts). Is there any way of filtering out small
>>> non-salt fragments?
>>>
>>> thank you very much in advance,
>>>
>>> regards,
>>>
>>> Alfredo
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>>
>>>
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org <http://slashdot.org/>!
>>> http://sdm.link/slashdot
>>>
>>> ------------------------------
>>>
>>>
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>>
>>>