[Rdkit-discuss] Extracting SMILES from text

Alexis Parenty Fri, 02 Dec 2016 01:13:27 -0800

Dear all,


I am looking for a way to extract SMILES strings scattered throughout many
text documents (thousands of documents, each several pages long).

At the moment, I am thinking of scanning each word of the text, trying to
make a mol object from it with Chem.MolFromSmiles(), and storing the word
if it returns a mol object that is not None.
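Below is a minimal sketch of that word-by-word scan. It assumes words are separated by whitespace, and that quotes or sentence punctuation (`. , ; :`) stuck to either end of a word should be stripped before parsing. Both are heuristics, not something stated in the question. The function name `scan_words` and the `parse` argument are hypothetical. The parser is passed in so the sketch runs without RDKit; in practice it would be `Chem.MolFromSmiles`.

```python
from typing import Callable, List, Optional

# Characters stripped from both ends of a token before parsing (heuristic).
EDGE_PUNCTUATION = "\"'.,;:"


def scan_words(text: str, parse: Callable[[str], Optional[object]]) -> List[str]:
    """Return the whitespace-separated words of `text` that `parse` accepts."""
    found = []
    for token in text.split():
        # Remove quotes/punctuation glued to the word, e.g. '"CCO",' -> 'CCO'.
        word = token.strip(EDGE_PUNCTUATION)
        # Skip empty leftovers such as a lone "," token.
        if word and parse(word) is not None:
            found.append(word)
    return found
```

Short ordinary words made only of element symbols, such as `CO` or `I`, are valid SMILES, so this kind of scan will also pick them up.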

Can anyone think of a better/quicker way?


Would it be worth storing every word that does not return a mol object
from Chem.MolFromSmiles() in a set, and excluding those words from
subsequent searches?


Something along these lines:


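from rdkit import Chem

excluded_set = set()  # words already known not to be valid SMILES
smiles_list = []

# `text` holds the document contents as one string; split it into words
for each_word in text.split():
    if each_word not in excluded_set:
        each_word_mol = Chem.MolFromSmiles(each_word)
        if each_word_mol is not None:
            smiles_list.append(each_word)
        else:
            excluded_set.add(each_word)  # store the word itself, not the None mol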


Wouldn't searching that growing set actually take more time than blindly
trying to make a mol object from every word?
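On the cost question: membership tests on a Python `set` or `dict` are hash lookups with average O(1) cost, however large the collection grows. Checking a cache should therefore normally be much cheaper than calling the SMILES parser again. The sketch below builds on that and caches both outcomes in a dict, so each distinct word is parsed at most once. The name `extract_smiles_cached` and the injected `parse` argument are hypothetical; with RDKit you would pass `Chem.MolFromSmiles`.

```python
from typing import Callable, Dict, List, Optional


def extract_smiles_cached(words: List[str],
                          parse: Callable[[str], Optional[object]]) -> List[str]:
    """Keep words that `parse` accepts, calling `parse` once per distinct word."""
    cache: Dict[str, bool] = {}  # word -> True if it parsed, False otherwise
    found = []
    for word in words:
        if word not in cache:  # average O(1) hash lookup
            cache[word] = parse(word) is not None
        if cache[word]:
            found.append(word)  # every occurrence is kept, as in the loop above
    return found
```

A dict of results, rather than a set of failures only, also remembers the words that did parse. Repeated occurrences of a valid SMILES are therefore not re-parsed either.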



Any suggestions?


Many thanks and regards,


Alexis


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
