I'm not sure that measuring similarity based on an alignment of SMILES
strings is a good idea. The usual approach is to align the structures
themselves by finding the MCS (maximum common substructure), which gives
you the common fragment. Is there some reason you don't want to do this?

Igor Filippov has contributed some MCS code at
http://openbabel.svn.sf.net/viewvc/openbabel/contributed/trunk/c%2B%2B/mcs-cliquer/?view=tar
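
To illustrate why string-level comparison is fragile, here is a minimal
sketch that uses only Python's standard difflib module (it is not part of
Open Babel and does not compute an MCS). The helper name
longest_common_substring and the variable names are made up for this
example; the three SMILES strings are the ones from the quoted message
below. It compares the longest contiguous block shared with SMILES 1 for
the two different spellings of SMILES 2.

```python
from difflib import SequenceMatcher


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous run of characters shared by a and b."""
    # autojunk=False disables difflib's "popular character" heuristic,
    # which could otherwise discard frequent characters in long strings.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


smiles1 = "CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO"
smiles2_as_written = "NC(C(CCCCCCCCCCC)O)CO"   # atom order as originally typed
smiles2_canonical = "CCCCCCCCCCCC(C(CO)N)O"    # same molecule, canonical atom order

full = longest_common_substring(smiles1, smiles2_as_written)
partial = longest_common_substring(smiles1, smiles2_canonical)

# The as-written form is recovered whole; the canonical form is not.
print(full == smiles2_as_written)
print(len(partial) < len(smiles2_canonical))
```

Both spellings of SMILES 2 describe the same molecule, yet only one of them
appears whole inside SMILES 1. The overlap depends on atom ordering rather
than on the structure, which is why a graph-based method such as MCS is the
more robust choice.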

- Noel

On 12 October 2011 15:53, Chakravarthy Marella <chakravar...@ncbs.res.in> wrote:
> Hello,
>
> I am working on the problem of comparing SMILES strings based on
> alignment. In my research, I came across the following problem:
>
> Let's say there are two SMILES strings:
>
> SMILES 1: CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
> SMILES 2: NC(C(CCCCCCCCCCC)O)CO
>
> and we want to see how similar those two SMILES strings are. If they
> are similar, is there any fragment (or substructure) that is common?
>
> First, I removed redundancy by converting the above SMILES to unique SMILES
> using OB's canonical SMILES algorithm:
>
> SMILES 1 (Unique): CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
> SMILES 2 (Unique): CCCCCCCCCCCC(C(CO)N)O
>
> If the above SMILES are aligned, I get the following:
>
> CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC-)--O)-CO----
> -----------------------CCCCCCCCCCCC-(C--(CO)N)O
>
> However, as you may notice, SMILES 2 is nothing but the tail end of
> SMILES 1. Adding a few spaces before SMILES 2 illustrates this:
>
> SMILES 1: CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
> SMILES 2:                   NC(C(CCCCCCCCCCC)O)CO
>
> The common fragment "NC(C(CCCCCCCCCCC)O)CO" does not find its place in
> the first alignment because the underlying canonicalization algorithm
> writes the atoms of SMILES 2 in a different order.
>
> I would like to know if there is a way to generate SMILES
> (programmatically) such that, for any given pair of SMILES strings,
> common fragments fall back into place after alignment.
>
> In other words, I am looking for a SMILES generation algorithm that
> always returns "NC(C(CCCCCCCCCCC)O)CO" instead of "CCCCCCCCCCCC(C(CO)N)O"
> in the above example.
>
> Any suggestions on how to do this, or pointers to previous work, would be
> gratefully acknowledged.
>
> TIA
> Varthy
>
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure contains a
> definitive record of customers, application performance, security
> threats, fraudulent activity and more. Splunk takes this data and makes
> sense of it. Business sense. IT sense. Common sense.
> http://p.sf.net/sfu/splunk-d2d-oct
> _______________________________________________
> OpenBabel-discuss mailing list
> OpenBabel-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
>
