Hello, I am working on the problem of comparing SMILES strings based on alignment. In my research, I came across the following problem:
Let's say there are two SMILES strings:

  SMILES 1: CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
  SMILES 2: NC(C(CCCCCCCCCCC)O)CO

We want to know how similar these two SMILES strings are and, if they are similar, whether they share a common fragment (or substructure).

First, I removed redundancy by converting the above SMILES to unique SMILES using Open Babel's canonical SMILES algorithm:

  SMILES 1 (Unique): CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
  SMILES 2 (Unique): CCCCCCCCCCCC(C(CO)N)O

If the unique SMILES are aligned, I get the following:

  CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC-)--O)-CO----
  -----------------------CCCCCCCCCCCC-(C--(CO)N)O

However, as you may notice, SMILES 2 is nothing but one half of SMILES 1. I will add a few spaces before SMILES 2 to illustrate this:

  SMILES 1: CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO
  SMILES 2:                   NC(C(CCCCCCCCCCC)O)CO

The common fragment "NC(C(CCCCCCCCCCC)O)CO" does not find its place in the first alignment because the canonicalization algorithm writes the atoms of SMILES 2 in a different order.

I would like to know if there is a way to generate SMILES programmatically such that, for any given pair of SMILES strings, common fragments line up after alignment. In other words, I am looking for a SMILES generation algorithm that always returns "NC(C(CCCCCCCCCCC)O)CO" instead of "CCCCCCCCCCCC(C(CO)N)O" in the above example.

Any suggestions on how to do this, or pointers to previous work, would be gratefully acknowledged.

Thanks in advance,
Varthy
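To make the observation above concrete, here is a minimal sketch that measures how much of SMILES 2 survives as a contiguous substring of SMILES 1, once in the form it was originally written and once in its canonical form. It uses only Python's standard `difflib` module and compares plain characters; the helper name `longest_common_substring` is made up for this illustration and is not part of Open Babel.

```python
from difflib import SequenceMatcher

# The three strings quoted in the post above.
SMILES_1 = "CCCCCCCCCCCCCC(=O)NC(C(CCCCCCCCCCC)O)CO"
SMILES_2_INPUT = "NC(C(CCCCCCCCCCC)O)CO"
SMILES_2_CANONICAL = "CCCCCCCCCCCC(C(CO)N)O"


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of characters that occurs contiguously in both strings.

    This is a purely textual comparison (hypothetical helper for illustration);
    it knows nothing about atoms, bonds, or ring closures.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


if __name__ == "__main__":
    for label, other in (("as written", SMILES_2_INPUT),
                         ("canonical", SMILES_2_CANONICAL)):
        shared = longest_common_substring(SMILES_1, other)
        print(f"SMILES 2 ({label}): shares {len(shared)} of {len(other)} "
              f"characters with SMILES 1: {shared!r}")
```

With the form as written, the whole of SMILES 2 is recovered as the shared run; with the canonical form, only a shorter run is found. Note that this measures string overlap only: a shared run of characters is not guaranteed to be a chemically valid fragment.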
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss