Hi Markus,

On Fri, May 23, 2008 at 10:49 AM, markus <[email protected]> wrote:
> Hi there,
>
> some questions about File input:
> 1. I want to read in large multiconformer sdf file.
> This could be done by checking the canonical isomeric smiles.
> However I wonder if that could be (or already is) implemented in some
> Supplier
> as I suspect my Python Code to be slow.

There's currently no multiconformer SD file reader. Given a format that
people could agree on, it would not be hard to implement one, but first
there has to be an agreed-upon structure for the file.

Matching molecules using canonical SMILES would be very, very inefficient.
It would be much better to use a property to identify the conformers
(either the molecule name field or an SD property).

As an aside: SDF is a very inefficient storage mechanism for multiconformer
data: there's a lot of duplicated information. The RDKit supports a format
called TPL (it's an old MSI/Biosym format) that's a lot more efficient.

> 2. How about Proteins / binding Pockets? As RDKit explicitly is designed
> for
> small Mols, I wonder if anyone has already loaded a Protein?
> I found a post by Adrian Schreyer (*[Rdkit-discuss] Atom-level SMARTS
> matching*)
> mentioning SIFTs, This implies the perception of Pocket-residues.
> Was that done / Can that be done by the RDKit?

I'll say more about this in my reply to Adrian's reply.

-greg
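P.S. A minimal sketch of the name-based grouping suggested for question 1, assuming every conformer of a molecule carries the same title line (the first line of its SD record). It uses only the Python standard library and treats records as plain text; the function name `group_sd_records_by_name` is hypothetical and is not part of the RDKit.

```python
from collections import OrderedDict


def group_sd_records_by_name(sd_text):
    """Group raw SD records by their title line.

    Assumption: all conformers of one molecule share the same title
    (first line of the record). Records are kept as plain text and are
    not parsed or validated as molecules.
    """
    groups = OrderedDict()
    record = []

    def flush(lines):
        # Store a finished record under its (stripped) title line,
        # skipping chunks that contain only blank lines.
        if any(line.strip() for line in lines):
            name = lines[0].strip()
            groups.setdefault(name, []).append("\n".join(lines))

    for line in sd_text.splitlines():
        if line.strip() == "$$$$":  # SD record delimiter
            flush(record)
            record = []
        else:
            record.append(line)

    flush(record)  # tolerate a final record with no trailing delimiter
    return groups
```

Each value in the returned mapping is the list of record texts for one name, in file order, so the conformers of a molecule can be handed to whatever parser is in use as a single batch. Grouping on an SD property instead of the title would need the data fields to be parsed, which this sketch does not attempt.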

