Hi Andrew,

This is probably not going to solve the problem at hand but it may be
useful to you or others in the future:
ChEMBLdb maintains a molecule hierarchy table (molecule_hierarchy) from
which you can retrieve the parent structure (i.e. the desalted form,
generated with Pipeline Pilot) for each molecular entity.
You could try something like this:

select distinct cs.molregno, cs.molfile, cs.canonical_smiles
from compound_structures cs, molecule_hierarchy mh
where cs.molregno = mh.parent_molregno

This will give you all the *unique* desalted structures in ChEMBL.
If you also want to keep track of the molregnos of the salt forms for
each parent structure, try (MySQL-specific):

select cs.molregno, group_concat(mh.molregno), cs.molfile,
cs.canonical_smiles
from compound_structures cs, molecule_hierarchy mh
where cs.molregno = mh.parent_molregno
group by cs.molregno
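
For anyone who wants to try the two queries without a ChEMBL install, here
is a minimal sketch in Python using the standard sqlite3 module. The table
and column names follow the queries above, but the rows are invented toy
data, and it assumes that a parent compound lists itself as its own parent
in molecule_hierarchy. SQLite also provides group_concat, so the second
query runs unchanged here; the order of the concatenated molregnos is not
guaranteed, so the sketch sorts them after splitting.

```python
import sqlite3

# Toy stand-ins for the two ChEMBL tables. All rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table compound_structures (
        molregno integer primary key,
        molfile text,
        canonical_smiles text
    );
    create table molecule_hierarchy (
        molregno integer primary key,
        parent_molregno integer
    );
    -- Assumption: 1 and 4 are parents (each is its own parent);
    -- 2 and 3 are salt forms whose parent is 1.
    insert into compound_structures values
        (1, 'molfile-1', 'CC(=O)O'),
        (2, 'molfile-2', 'CC(=O)[O-].[Na+]'),
        (3, 'molfile-3', 'CC(=O)[O-].[K+]'),
        (4, 'molfile-4', 'c1ccccc1');
    insert into molecule_hierarchy values
        (1, 1), (2, 1), (3, 1), (4, 4);
""")

# First query: the unique parent (desalted) structures.
parents = conn.execute("""
    select distinct cs.molregno, cs.canonical_smiles
    from compound_structures cs, molecule_hierarchy mh
    where cs.molregno = mh.parent_molregno
    order by cs.molregno
""").fetchall()

# Second query: each parent together with the molregnos that map to it.
rows = conn.execute("""
    select cs.molregno, group_concat(mh.molregno), cs.canonical_smiles
    from compound_structures cs, molecule_hierarchy mh
    where cs.molregno = mh.parent_molregno
    group by cs.molregno
""").fetchall()

# group_concat returns a comma-separated string; split and sort it.
members = {
    parent: sorted(int(m) for m in concatenated.split(","))
    for parent, concatenated, _smiles in rows
}

print(parents)
print(members)
```

Note that with this toy data the parent's own molregno shows up in its
group, since it appears in molecule_hierarchy as its own parent.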

I hope it helps.

Best regards,

George Papadatos
EMBL-EBI


On 30 April 2012 21:32, Andrew Dalke <da...@dalkescientific.com> wrote:

> I'm desalting the ChEMBL data set and generating the corresponding
> de-salted SD and SMILES files. I found a problem in the conversion step,
> and found that the problem has nothing to do with the de-salting.
>
> My code failed with CHEMBL1269997, which is record ~750,200 out of
> 1,142,974. (In other words, it took a while to get to this point.) Here's a
> reproducible:
>
> >>> from rdkit import Chem
> >>> writer = Chem.SDWriter("/dev/stdout")
> >>> for mol in Chem.ForwardSDMolSupplier("CHEMBL1269997.sdf"):
> ...   writer.write(mol)
> ...
> [22:11:05]
>
> ****
> Invariant Violation
>
> Violation occurred on line 388 in file
> /tmp/homebrew-rdkit-HEAD-Ebdo/Code/GraphMol/FileParsers/MolFileStereochem.cpp
> Failed Expression: pick >= 0
> ****
>
> Traceback (most recent call last):
>  File "<stdin>", line 2, in <module>
> RuntimeError: Invariant Violation
> >>> Chem.MolToSmiles(mol)
> 'OCC1=CC2OC(CC(C)C)(CC(C)C)C3C4CCCC56C(OC(C)(C)O5)C1(O)C46C23'
> >>> Chem.MolToSmiles(mol, isomericSmiles=True)
> 'OCC1=C[C@@H]2OC(CC(C)C)(CC(C)C)[C@@H]3[C@H]4CCC[C@@]56[C@@H](OC(C)(C)O5)[C@]1(O)[C@]46[C@H]23'
> >>>
>
> You can see that the molecule was read in, is not None, and can be used to
> generate a SMILES.
>
> The CHEMBL1269997.sdf is attached.
>
> This error was previously reported in the thread JP started, titled
> "Invariant violation...", dated July 6, 2011. Greg replied:
>
> > Wow that is certainly an error I never expected to see. From the code,
> > I guess the molecule has a stereocenter that is surrounded by other
> > stereocenters and something extremely unfortunate is happening with
> > the way decisions are being made about which bonds to wedge. As Eddie
> > requested in an earlier message, it would be helpful to have the input
> > that produced the error so that it can be added to the test cases (and
> > so that I can be sure the problem is fixed once I figure out how to).
>
> but I see no posting of a failing structure. I hope the attached structure
> helps resolve this problem.
>
>
>
>                                Andrew
>                                da...@dalkescientific.com
>
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>