Hello,

I am working on the problem of comparing SMILES strings based on
alignment. In my research, I came across the following problem:

Let's say there are two SMILES strings,

SMILES 1: CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
SMILES 2: NC(C(CCCCCCCCCCC)O)CO

and we want to see how similar those two SMILES strings are. If they
are similar, is there any fragment (or substructure) that is common?

First, I removed redundancy by converting the above SMILES to unique SMILES
using Open Babel's canonical SMILES algorithm:

SMILES 1 (Unique): CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
SMILES 2 (Unique): CCCCCCCCCCCC(C(CO)N)O

If the above unique SMILES are aligned, I get the following:

CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC-)--O)-CO----
-----------------------CCCCCCCCCCCC-(C--(CO)N)O
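
For reference, an alignment of this kind can be sketched with a plain
character-level global alignment (Needleman-Wunsch style). The function
name global_align and the scoring values (match=1, mismatch=-1, gap=-1)
are illustrative assumptions, not necessarily the settings that produced
the alignment above, so the gap placement may differ.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Character-level global alignment of two strings (illustrative sketch).

    Note: '-' is used as the gap symbol, which is only safe when the
    SMILES strings themselves contain no explicit single-bond '-'.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


unique_1 = "CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO"
unique_2 = "CCCCCCCCCCCC(C(CO)N)O"
aligned_1, aligned_2 = global_align(unique_1, unique_2)
print(aligned_1)
print(aligned_2)
```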

However, as you may notice, SMILES 2 is nothing but the trailing half of
SMILES 1. I will add a few spaces before SMILES 2 to illustrate this:

SMILES 1: CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
SMILES 2:                   NC(C(CCCCCCCCCCC)O)CO
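
The overlap illustrated above can be checked on the original
(non-canonical) strings with a longest-common-substring search. This is
a minimal sketch using Python's standard difflib module; it compares
characters only and knows nothing about chemistry, so it works here
only because both strings happen to write the shared fragment the same
way.

```python
from difflib import SequenceMatcher

smiles_1 = "CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO"
smiles_2 = "NC(C(CCCCCCCCCCC)O)CO"

# autojunk=False disables difflib's heuristic that ignores very frequent
# characters (it only applies to long sequences, but being explicit is safer).
matcher = SequenceMatcher(None, smiles_1, smiles_2, autojunk=False)
block = matcher.find_longest_match(0, len(smiles_1), 0, len(smiles_2))

# Longest run of characters shared by both strings.
common = smiles_1[block.a:block.a + block.size]
print("common substring:", common)
print("offset in SMILES 1:", block.a)
print("covers all of SMILES 2:", common == smiles_2)
```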

The common fragment "NC(C(CCCCCCCCCCC)O)CO" does not find its place in
the first alignment because the canonicalization algorithm writes the
atoms of SMILES 2 in a different order.

I would like to know if there is a way to generate SMILES
(programmatically) such that, for any given pair of SMILES strings,
common fragments line up again after alignment.

In other words, I am looking for a SMILES generation algorithm that
always returns "NC(C(CCCCCCCCCCC)O)CO" instead of "CCCCCCCCCCCC(C(CO)N)O"
in the above example.

Any suggestions on how to do this, or pointers to previous work, would be
gratefully acknowledged.

TIA
Varthy


_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss