Re: [Rdkit-discuss] Using SQLAlchemy with the RDKit database cartridge

Riccardo Vianello Mon, 04 Jul 2011 06:15:31 -0700

Hi Adrian,

On Mon, Jul 4, 2011 at 11:22 AM, Adrian Schreyer <[email protected]> wrote:
> Hi Riccardo,
>
> are you planning on supporting other cartridges/database dialects? If
> you only want to include RDKit/PostgreSQL then your implementation
> might be much more than what you actually need :)


Yes, at present I'm focusing almost exclusively on RDKit/PostgreSQL,
but the longer-term goal includes supporting additional backends. I'm
particularly interested in the RDKit cartridge, and I plan to
implement full support for it, but you are correct: if that were the
only supported cartridge/database, this kind of approach could be
considered overkill.

> If you want to change the similarity thresholds you will need
> something like session.execute(text("SET
> rdkit.tanimoto_threshold=:threshold").execution_options(autocommit=True).params(threshold=threshold)),
> probably wrapped inside a function. It is important that you use
> execution_options(autocommit=True) because SQLAlchemy won't autocommit
> SET operations (if you set it in the engine config).

Useful hint, thanks. I'm still learning SQLAlchemy and I would
probably have missed the proper handling of the autocommit flag.
Btw, I actually postponed the management of the threshold values
because of some technical details that are unclear to me and that I
should have asked about anyway. More specifically, I'm not sure about
the scope associated with these "global" parameters. The question is
probably for Greg, but do these values hold for the whole server,
for the given database, or for the specific database connection?
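
For reference, a minimal sketch of the kind of wrapper function Adrian
suggests. It assumes a DB-API cursor with psycopg2-style "%s"
placeholders instead of a SQLAlchemy session, and the helper name
set_tanimoto_threshold is hypothetical (it is not part of razi,
SQLAlchemy, or the cartridge). PostgreSQL documents a plain SET, and
set_config() with is_local=false, as applying to the current session;
whether the cartridge thresholds behave like ordinary configuration
parameters in this respect is an assumption here, not something
confirmed in this thread.

```python
# Sketch only: the helper name and the cursor argument are illustrative
# assumptions, not part of razi, SQLAlchemy, or the RDKit cartridge.

def set_tanimoto_threshold(cursor, threshold):
    """Set rdkit.tanimoto_threshold through a DB-API cursor."""
    value = float(threshold)
    if not 0.0 <= value <= 1.0:
        raise ValueError("threshold must be between 0 and 1, got %r" % (threshold,))
    # set_config(name, value, is_local) is PostgreSQL's function form of SET.
    # is_local=false keeps the value for the rest of the session instead of
    # only the current transaction.
    cursor.execute(
        "SELECT set_config(%s, %s, false)",
        ("rdkit.tanimoto_threshold", str(value)),
    )
    return value
```

With a real connection this would be called once per connection before
running similarity queries, e.g. set_tanimoto_threshold(conn.cursor(), 0.7).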

> I also have rdkit in my database api but I went for the hybrid
> approach in SQLAlchemy that allows you to distinguish between methods
> on the class and instance level. With an instance of an rdkit molecule
> for example, the api will use the local rdkit installation for a
> substructure pattern match. On the class level however, the same
> expression is turned into an SQL expression to query the database. I
> also use the @reconstructor decorator to turn the database rdmol
> smiles string back in to a Python RDMol but this is only useful if you
> plan on using RDKit on the client side as well.

Currently I'm making no assumptions about the toolkit available on the
client side. I had quickly read the documentation on the hybrid
approach and wasn't convinced it was what I was looking for, but
judging from your examples I should reconsider that, at least in
part.
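
To make the class-level versus instance-level distinction concrete,
here is a toy, standard-library-only sketch of the mechanism. It is
not SQLAlchemy's hybrid_method implementation: the hybrid_sketch
descriptor and the Molecule class are hypothetical, the instance-level
check is a plain substring test standing in for a real toolkit call,
and the class-level variant just returns an SQL fragment with its
parameters.

```python
class hybrid_sketch:
    """Toy descriptor: one name, different behaviour on instance vs. class."""

    def __init__(self, instance_func):
        self.instance_func = instance_func
        self.class_func = None

    def expression(self, class_func):
        # Register the class-level variant and return the descriptor so the
        # decorated name keeps pointing at it.
        self.class_func = class_func
        return self

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class: bind the expression variant to the class.
            return self.class_func.__get__(owner, type(owner))
        # Accessed on an instance: bind the ordinary method to the instance.
        return self.instance_func.__get__(instance, owner)


class Molecule:
    column = "molecules.structure"  # hypothetical table/column name

    def __init__(self, smiles):
        self.smiles = smiles

    @hybrid_sketch
    def contains(self, pattern):
        # Stand-in for a client-side toolkit call. A substring test is NOT
        # a real substructure match; it only marks where one would go.
        return pattern in self.smiles

    @contains.expression
    def contains(cls, pattern):
        # Class level: build an SQL fragment plus bind parameters.
        return "%s @> :pattern" % cls.column, {"pattern": pattern}
```

Calling Molecule("c1ccccc1O").contains("c1ccccc1") runs the local
check, while Molecule.contains("c1ccccc1") returns the SQL fragment
and its parameters without touching any instance.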

Thanks a lot for your reply and suggestions,

Cheers,
Riccardo

>
> # instance of ChemCompRDMol
> >>> print sti.RDMol.contains('c1ccccc1')
> True
>
> # class itself
> >>> print ChemCompRDMol.contains(sti.ism)
> pdbchem.chem_comp_rdmols.rdmol OPERATOR(rdkit.@>) :rdmol_1
>
> Here is an example:
>
>   @reconstructor
>   def init_on_load(self):
>       '''
>       Turns the rdmol column that is returned as a SMILES string back into an
>       RDMol object.
>       '''
>       self.rdmol = MolFromSmiles(self.rdmol)
>
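>   @hybrid_method
>   def contains(self, smiles):
>       '''
>       Instance level: substructure match using the local RDKit
>       installation.
>       '''
>       return self.rdmol.HasSubstructMatch(MolFromSmiles(smiles))
>
>   @contains.expression
>   def contains(self, smiles):
>       '''
>       Class level: SQL expression that runs the substructure query in
>       the database.
>       '''
>       return self.rdmol.op('OPERATOR(rdkit.@>)')(smiles)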
>
> and that's basically it. I have the cartridge installed in its own
> schema, that's why I need the OPERATOR() syntax.
>
> Cheers,
>
> Adrian
>
> On Fri, Jul 1, 2011 at 17:22, Riccardo Vianello
> <[email protected]> wrote:
>> Hi all,
>>
>> I've started working on an extension of the SQLAlchemy database
>> toolkit that aims to support direct access from Python to the
>> functions and data types exposed by the database chemical cartridge.
>> In brief, this means that instead of interacting with the RDBMS using
>> raw SQL queries, it may become possible to execute the entire workflow
>> (data preprocessing and cleanup, insertion, selection and further
>> processing) without leaving the Python interpreter, while at the same
>> time delegating the construction of the required SQL expressions to a
>> higher-level API. To give a simple example, instead of using
>>
>> select count(*) from molecules where structure @> 'O=C1OC2=CC=CC=C2C=C1';
>>
>> one might type something like the following:
>>
>> >>> constraint = Molecule.structure.contains('O=C1OC2=CC=CC=C2C=C1')
>> >>> print session.query(Molecule).filter(constraint).count()
>>
>> (OK, in this specific case the Python expression is a bit more
>> verbose, but it's a very simple SQL query :-)
>>
>> The project is still in an initial phase, and the code is far from
>> mature, but development is currently strongly focused on the
>> RDKit PostgreSQL extension. Structure searches and molecular
>> descriptors should be fully supported, and bit fingerprints and
>> associated similarity operators are also available (but modifying the
>> default similarity threshold values is not yet possible). The code is
>> currently hosted on GitHub
>>
>> https://github.com/rvianello/razi
>>
>> and some draft documentation (at the moment intended more to
>> illustrate the idea than to provide a detailed reference) is also
>> available:
>>
>> http://razi.readthedocs.org
>>
>> If you use the RDKit chemical cartridge or SQLAlchemy (or both), I
>> hope you will find the idea interesting and I'd love to hear from you.
>> Comments, ideas and suggestions would be very welcome.
>>
>> Cheers,
>> Riccardo
>>
>> ------------------------------------------------------------------------------
>> All of the data generated in your IT infrastructure is seriously valuable.
>> Why? It contains a definitive record of application performance, security
>> threats, fraudulent activity, and more. Splunk takes this data and makes
>> sense of it. IT sense. And common sense.
>> http://p.sf.net/sfu/splunk-d2d-c2
>> _______________________________________________
>> Rdkit-discuss mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
