Re: [Rdkit-discuss] molecule standardization in cartridge search

Jan Holst Jensen Sat, 26 Sep 2015 04:27:07 -0700

Hi Tim,

Soren (cc'ed) wrote to me and asked about molvs. Thanks to Soren for reminding me that the original question was about standardization more than calling Python code from Postgres :-).

http://molvs.readthedocs.org/en/latest/

Take a look at molvs - it's got lots of functionality that you will need. We also use molvs as the backbone for much of our standardization.


Cheers
-- Jan
________________________

Hi Tim,

A simple getting-started example is:

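   CREATE FUNCTION smiles2molfile(smiles text) RETURNS text
      LANGUAGE plpythonu AS $$
   from rdkit import Chem

   # MolFromSmiles returns None when the SMILES cannot be parsed
   mol = Chem.MolFromSmiles(smiles)
   if mol is None:
       return None  # PL/Python maps None to SQL NULL
   return Chem.MolToMolBlock(mol)
   $$;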


and you can then run

    select smiles2molfile('CC');

and get back a molfile.

For more advanced usage it is worth taking a look at the rdchord project that TJ has sent links to.


Cheers
-- Jan

On 2015-09-25 15:54, Tim Dudgeon wrote:

Jan,

thanks for that. I'll give it a try.
Are there any examples of writing RDKit functions and procedures for
postgres in python?
I see these general postgres docs:
http://www.postgresql.org/docs/9.4/static/plpython.html
but wondered if there are any RDKit specific examples anywhere?

Tim

On 25/09/2015 08:30, Jan Holst Jensen wrote:

On 2015-09-24 16:22, Tim Dudgeon wrote:

I'm trying to get to grips with using the RDKit cartridge, and so far
it's going well.
One thing I'm concerned about is molecule standardization, along the
lines of the ChemAxon Standardizer that allows substructure searches to
be done in a way that is largely independent of the quirks of structure
representation. The classic example would be how nitro groups are
represented, so that it doesn't matter which nitro representation is in
the query or target structures, because both are converted to a
canonical form.

My initial thoughts are that this would be done by:
1. loading the "raw" structures into a source column that would never be
changed
2. defining a function that performed the necessary transform to
generate the canonical form of a molecule.
3. generating a "canonical" structure column that was the result of
passing the raw structures through that function
4. building the SSS index on that canonical column
5. executing queries using that function to canonicalize the query
structure
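
The five steps above can be sketched in plain Python with RDKit, outside the database. This is only a minimal in-memory illustration, not the cartridge's own mechanism: the names standardize(), raw, canonical and search() are hypothetical, the dictionaries stand in for table columns, and step 4 (the SSS index) is not emulated. Note that Chem.MolFromSmiles already sanitizes the molecule and strips explicit hydrogens by default, so standardize() is mostly a placeholder where further cleanup (for example with molvs) could be plugged in.

```python
from rdkit import Chem


def standardize(mol):
    """Hypothetical cleanup hook: return a cleaned-up molecule, or None."""
    if mol is None:
        return None
    # RemoveHs drops any remaining explicit hydrogens and re-sanitizes.
    # Additional normalization rules would be added here.
    return Chem.RemoveHs(mol)


# Step 1: the "raw" structures, never modified.
# Entries 1 and 2 are the same nitro compound drawn in two ways.
raw = {1: "CN(=O)=O", 2: "C[N+](=O)[O-]", 3: "CCO"}

# Steps 2 and 3: derive the "canonical" column through standardize().
canonical = {key: standardize(Chem.MolFromSmiles(smi)) for key, smi in raw.items()}


def search(query_smiles):
    """Step 5: standardize the query the same way, then substructure-match."""
    query = standardize(Chem.MolFromSmiles(query_smiles))
    if query is None:
        return []  # unparsable query
    return sorted(
        key
        for key, mol in canonical.items()
        if mol is not None and mol.HasSubstructMatch(query)
    )
```

Because both the stored structures and the query pass through the same function, search("CN(=O)=O") and search("C[N+](=O)[O-]") should find the same rows. Inside Postgres the same function body would be wrapped in a Python stored procedure and applied both when filling the canonical column and when preparing the query.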

The problem I'm finding is that there do not seem to be postgres
functions defined for doing molecular transforms (essentially a reaction
transform) and doing things like removing explicit hydrogens. At least
not in the functions listed on this page:
http://rdkit.org/docs/Cartridge.html#functions

Am I missing something here, or might I be barking up completely the
wrong tree?

Tim

Hi Tim,

We have about the same situation and we're adding standardization
(beyond what RDKit implicitly does when it sanitizes the molecule)
through Python stored procedures. You will need to build and maintain
a normal Python-enabled RDKit installation in parallel to the
cartridge. The Python stored procedures can access the normal RDKit
installation and then run whatever Python code is necessary to do
additional molecule cleanup.

You will need to tweak your Postgres environment so the Python stored
procedures can load RDKit. This is what I have defined in an
environment file on CentOS:

RDBASE=/opt/rdkit
LD_LIBRARY_PATH=/opt/rdkit/lib
PYTHONPATH=/opt/rdkit

On Ubuntu this would go into /etc/postgresql/9.x/main/environment (in
a slightly different format where the values have to be single-quoted).
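
To illustrate the single-quoted variant, here is a small Python sketch that renders the three settings above into that style. The exact layout (NAME = 'value', one per line) is my assumption based on the description above, and render_environment() is a hypothetical helper, so check the comments in your own environment file before copying anything over.

```python
def render_environment(variables):
    """Render settings as NAME = 'value' lines (assumed Ubuntu-style format)."""
    lines = []
    for name, value in variables.items():
        if "'" in value:
            # Keep the sketch simple: no escaping of embedded quotes.
            raise ValueError("single quotes in values are not handled here")
        lines.append("{} = '{}'".format(name, value))
    return "\n".join(lines) + "\n"


# The same three settings as in the CentOS environment file above.
settings = {
    "RDBASE": "/opt/rdkit",
    "LD_LIBRARY_PATH": "/opt/rdkit/lib",
    "PYTHONPATH": "/opt/rdkit",
}

text = render_environment(settings)
print(text, end="")
```

Postgres has to be restarted after the file is changed so the server processes pick up the new variables.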

Cheers
-- Jan, Biochemfusion

