Hi George,

> This is probably not going to solve the problem at hand but it may be useful
> to you or others in the future:
> ChEMBLdb maintains a molecular hierarchy table where you can retrieve the
> parent (=desalted - using Pipeline Pilot) structures for each molecular
> entity.
> You may try something like this:
>
> select distinct cs.molregno, cs.molfile, cs.canonical_smiles
> from compound_structures cs, molecule_hierarchy mh
> where cs.molregno = mh.parent_molregno

I confess pure ignorance here. I've worked with databases, but they are far from the things I know well. Reading the ERD is not simple for me, I don't have MySQL or Oracle installed on my machines, and I don't know how to browse through the schema and tables the way I've seen people more proficient with databases do. So while I have an idea of what you are talking about, it's not something I can easily put into place.

But as you say, it's not the problem, because RDKit's failure exception occurs even with the original, unprocessed (not desalted) record.

Since you're here -- how come ChEMBL doesn't put an identifier on the first line of the SD record? Nearly all of them are blank; the exceptions are a dozen with mostly useless titles like:

  Acetic acid 6-(1-phenyl-ethyl)-6-aza-bicyclo[3.2.1]oct-3-yl ester
  4-(4-Fluoro-phenyl)-2-methylsulfanyl-thiophene-3-carbonitrile
  6-amino-9-(5-{[(1,2,3,3-tetrahydroxy-1,2,3-trioxidotriphosphanyl)oxy]methyl}tetr
  2-Methyl-2,3-dihydro-benzofuran-7-carboxylic acid 8-methyl-8-aza-bicyclo[3.2.1]o
  (S)-N-((S)-1,6-diamino-1-oxohexan-2-yl)-1-((S)-5-guanidino-2-((2S,3S)-2-((S)-5-g
  Acetic acid 6-(1-phenyl-ethyl)-6-aza-bicyclo[3.2.1]oct-3-yl ester

I end up doing a

  mol.SetProp("_Name", mol.GetProp("chembl_id"))

so that my output SMILES have an identifier tied to them, and that seems like a needless extra step.

                                Andrew
                                da...@dalkescientific.com

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
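
For anyone who wants the same renaming without going through a toolkit, here is a minimal sketch that works on the SD file text directly. It is not the RDKit call quoted above but a text-level stand-in for it: it copies the value of the `chembl_id` data item (the tag name used in the message) into the title line of each record whose title is blank. The function name `title_records_from_tag` and the sample ID `CHEMBL0000` are made up for illustration, and the parsing assumes plain V2000-style records terminated by `$$$$`.

```python
def title_records_from_tag(sdf_text, tag="chembl_id"):
    """Fill blank SD record titles from a data item (hypothetical helper)."""
    wanted = "<%s>" % tag
    out_lines = []
    record = []
    for line in sdf_text.splitlines():
        record.append(line)
        if line.strip() != "$$$$":
            continue
        # A full record has been collected. Data items come after "M  END",
        # so only look there to avoid matching anything in the mol block.
        start = record.index("M  END") + 1 if "M  END" in record else 0
        value = ""
        for i in range(start, len(record) - 1):
            # A data header looks like "> <chembl_id>"; its value is the
            # line that follows.
            if record[i].startswith(">") and wanted in record[i]:
                value = record[i + 1].strip()
                break
        # Only overwrite titles that are blank; existing titles are kept.
        if value and not record[0].strip():
            record[0] = value
        out_lines.extend(record)
        record = []
    # Keep any trailing partial record unchanged.
    out_lines.extend(record)
    return "\n".join(out_lines) + ("\n" if out_lines else "")


# Assumed sample record: zero atoms, blank title, made-up identifier.
SAMPLE = (
    "\n"
    "  hypothetical\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <chembl_id>\n"
    "CHEMBL0000\n"
    "\n"
    "$$$$\n"
)

if __name__ == "__main__":
    print(title_records_from_tag(SAMPLE), end="")
```

Records that already carry a title are left alone, so the handful of named entries would keep their names; drop the blank-title check if the identifier should always win.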