I'm not sure that measuring similarity based on an alignment of SMILES strings is a good idea, since the same molecule can be written as many different SMILES strings. The usual way is to align the structures themselves by finding the MCS (maximal common substructure). This would give you the common fragment. Is there some reason you don't want to do this?
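To illustrate the concern, here is a minimal sketch that uses only Python's standard `difflib` module (it is not Open Babel and it is not an MCS implementation). It takes the longest shared substring as a crude stand-in for a string alignment and applies it to the SMILES from the question below. The helper name `longest_common_substring` and the variable names are made up for this example.

```python
from difflib import SequenceMatcher


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous run of characters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


# SMILES 2 exactly as written in the question.
fragment_as_written = "NC(C(CCCCCCCCCCC)O)CO"
# The same molecule after canonicalisation, as reported in the question.
fragment_canonical = "CCCCCCCCCCCC(C(CO)N)O"
# SMILES 1 is the acyl chain followed by SMILES 2 as written.
full = "CCCCCCCCCCCCCC(=O)" + fragment_as_written

# Compare SMILES 1 against both spellings of the same fragment.
for label, frag in [("as written", fragment_as_written),
                    ("canonical", fragment_canonical)]:
    shared = longest_common_substring(full, frag)
    print(f"{label:>10}: shares {len(shared)} of {len(frag)} characters: {shared}")
```

The as-written form is recovered in full, while the canonical form of the very same fragment shares only a shorter run that does not even include the nitrogen. The string comparison depends on the order in which the atoms happen to be written; an MCS search works on the molecular graph and so does not.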
Igor Filippov has contributed some MCS code at
http://openbabel.svn.sf.net/viewvc/openbabel/contributed/trunk/c%2B%2B/mcs-cliquer/?view=tar

- Noel

On 12 October 2011 15:53, Chakravarthy Marella <chakravar...@ncbs.res.in> wrote:
> Hello,
>
> I am working on the problem of comparing SMILES strings based on
> alignment. In my research, I came across the following problem:
>
> Lets say there are two SMILES strings,
>
> SMILES 1: CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
> SMILES 2: NC(C(CCCCCCCCCCC)O)CO
>
> and we want to see how similar are those two smiles strings are ? If they
> are similar, is there any fragment (or sub-structure) that is common ?
>
> First, I removed redundancy by converting above SMILES to unique SMILES,
> by OB's canonical SMILES algorithm
>
> SMILES 1 (Unique): CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
> SMILES 2 (Unique): CCCCCCCCCCCC(C(CO)N)O
>
> If above SMILES are aligned, I get the following
>
> CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC-)--O)-CO----
> -----------------------CCCCCCCCCCCC-(C--(CO)N)O
>
> However, if you can notice, SMILES 2 is nothing but, one half of SMILES 1.
> I will just add few empty spaces before SMILES 2 to illustrate this,
>
> SMILES 1: CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
> SMILES 2:                   NC(C(CCCCCCCCCCC)O)CO
>
> The reason why common fragment of "NC(C(CCCCCCCCCCC)O)CO" is not finding
> it's place in first alignment is because of underlying canonical
> algorithm.
>
> I would like to know if there a way to generate SMILES
> (programmatically), such that, for any given pair of SMILES strings,
> common fragments find back their place after alignment ?
>
> In other words, I am looking for a SMILES generator algorithm, which
> always returns "NC(C(CCCCCCCCCCC)O)CO" instead of "CCCCCCCCCCCC(C(CO)N)O"
> in above example ?
>
> Any suggestions on how to do this ?
> or pointers to previous work would be
> gratefully acknowledged
>
> TIA
> Varthy
>
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure contains a
> definitive record of customers, application performance, security
> threats, fraudulent activity and more. Splunk takes this data and makes
> sense of it. Business sense. IT sense. Common sense.
> http://p.sf.net/sfu/splunk-d2d-oct
> _______________________________________________
> OpenBabel-discuss mailing list
> OpenBabel-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/openbabel-discuss