Dear all,
I am looking for a way to extract SMILES strings scattered across many text
documents (thousands of documents of several pages each).
At the moment, I am thinking of scanning each word in the text and trying to
make a mol object from it using Chem.MolFromSmiles(), then storing the word
if the call returns a mol object that is not None.
Can anyone think of a better/quicker way?
Would it be worth storing in a set any word that does not return a mol
object from Chem.MolFromSmiles() and excluding it from subsequent searches?
Something along these lines:
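from rdkit import Chem

excluded_set = set()  # words already known not to parse as SMILES
smiles_list = []
# text is assumed to be one document held as a single string
for each_word in text.split():
    if each_word not in excluded_set:
        each_word_mol = Chem.MolFromSmiles(each_word)
        if each_word_mol is not None:
            smiles_list.append(each_word)
        else:
            # store the word itself, not the None result
            excluded_set.add(each_word)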
Would searching that growing set not actually take more time than
blindly trying to make a mol object for every word?
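On the lookup-cost question, here is a minimal sketch (plain Python, no RDKit needed) that times membership tests against a set and against a tuple holding the same rejected words. The names time_membership, n_words and n_lookups are made up for illustration, and the "words" are synthetic placeholders rather than real document text. As far as I understand, set lookups are hash-based and should stay roughly constant as the collection grows, whereas a tuple is scanned element by element.

```python
import timeit


def time_membership(n_words=20000, n_lookups=50):
    """Time `word in collection` for a set and a tuple holding the same words."""
    # Hypothetical stand-ins for words that failed to parse as SMILES.
    rejected = ["word%d" % i for i in range(n_words)]
    as_set = set(rejected)
    as_tuple = tuple(rejected)
    # A word that is absent forces the tuple to be scanned to the end.
    probe = "not_in_collection"

    set_time = timeit.timeit(lambda: probe in as_set, number=n_lookups)
    tuple_time = timeit.timeit(lambda: probe in as_tuple, number=n_lookups)
    return {"set": set_time, "tuple": tuple_time}


if __name__ == "__main__":
    timings = time_membership()
    print("set:   %.6f s" % timings["set"])
    print("tuple: %.6f s" % timings["tuple"])
```

This only measures the lookup side; it would still need to be compared against the cost of one Chem.MolFromSmiles() call on a typical non-SMILES word to answer the question properly.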
Any suggestions?
Many thanks and regards,
Alexis
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss