Hi Michal,
On Wed, Apr 1, 2015 at 10:51 AM, Michal Krompiec <[email protected]>
wrote:
> Hi Greg,
> Is it possible to do the same (i.e. create a molecule from SMILES without
> removing explicit hydrogens) in the postgresql cartridge? I would like to
> do a "restricted" substructure search using SMILES queries.
>
I think I understand your use case.
> For example, with the standard behaviour (hydrogens removed),
> c1ccccc1[CH3] is converted to c1ccccc1C and matches TNT and benzaldehyde,
> whereas if the hydrogens are not removed, this SMILES query would match TNT
> but not benzaldehyde. Of course, this can be done with SMARTS but SMILES
> with explicit hydrogens can be drawn in MarvinSketch in KNIME by a
> non-expert user.
>
Without getting overly into terminology, it sounds to me like you want
people to be able to draw something corresponding to
"C1=CC=CC=C1C([H])([H])[H]" in a sketcher, convert that to SMILES, and have
the query constructed from that SMILES match toluene but not ethylbenzene
or benzaldehyde. Going via SMARTS here does not work because
[#6]-1=[#6]-[#6]=[#6]-[#6]=[#6]-1[H]C([H])[H] doesn't match much of
anything.
Skipping sanitization, as you propose, isn't going to help here: the
kekulized form of the ring will not be converted to aromatic and you won't
get the matches you are looking for.
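A minimal sketch of that point, assuming a standard RDKit install. The variable names are illustrative only, and the query leaves the explicit Hs out so that only the aromaticity issue is in play:

```python
from rdkit import Chem

# Sanitized target: the ring is perceived as aromatic.
toluene = Chem.MolFromSmiles('Cc1ccccc1')

# Kekule query parsed without sanitization: ring bonds stay single/double.
raw_query = Chem.MolFromSmiles('C1=CC=CC=C1C', sanitize=False)
ring_is_aromatic = any(b.GetIsAromatic() for b in raw_query.GetBonds())
raw_match = toluene.HasSubstructMatch(raw_query)

# Same SMILES, but sanitized afterwards so aromaticity is perceived.
clean_query = Chem.MolFromSmiles('C1=CC=CC=C1C', sanitize=False)
Chem.SanitizeMol(clean_query)
clean_match = toluene.HasSubstructMatch(clean_query)

print(ring_is_aromatic, raw_match, clean_match)
```

The unsanitized query carries single and double ring bonds, which are compared by bond type against the aromatic bonds of the target, so some aromaticity perception step is needed before matching.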
Here's an approach to this that works in Python:
In [7]: from rdkit import Chem
In [8]: m = Chem.MolFromSmiles('c1ccnc([H])n1',sanitize=False)
In [9]: nm = Chem.MergeQueryHs(m)
In [10]: Chem.SanitizeMol(nm)
Out[10]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
In [11]: Chem.MolFromSmiles('c1ccncn1').HasSubstructMatch(nm)
Out[11]: True
In [12]: Chem.MolFromSmiles('c1ccnc(C)n1').HasSubstructMatch(nm)
Out[12]: False
Notice the MergeQueryHs() step; that's essential unless you are storing
molecules in the database with Hs attached (pretty unlikely).
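As a rough sketch, the steps above could be wrapped in a small helper. The function below is hypothetical: it is a plain Python function written for illustration, not an existing RDKit or cartridge API. One assumption to flag: it sanitizes before merging the Hs (the reverse of the session above) so that aromaticity is perceived while all atoms are still ordinary atoms. SanitizeMol() does not remove explicit H atoms, so they are still there for MergeQueryHs().

```python
from rdkit import Chem


def query_mol_from_smiles(smiles):
    """Hypothetical helper: build a query where explicit Hs become H-count constraints."""
    # sanitize=False keeps the explicit [H] atoms from the SMILES.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        raise ValueError('could not parse SMILES: %r' % smiles)
    # Perceive aromaticity (assumption: done before merging, see note above).
    Chem.SanitizeMol(mol)
    # Fold each explicit H into a hydrogen-count query on its heavy-atom neighbor.
    return Chem.MergeQueryHs(mol)


# Kekule "toluene with all methyl Hs drawn", as it might come from a sketcher.
q = query_mol_from_smiles('C1=CC=CC=C1C([H])([H])[H]')
matches_toluene = Chem.MolFromSmiles('Cc1ccccc1').HasSubstructMatch(q)
matches_ethylbenzene = Chem.MolFromSmiles('CCc1ccccc1').HasSubstructMatch(q)
matches_benzaldehyde = Chem.MolFromSmiles('O=Cc1ccccc1').HasSubstructMatch(q)
print(matches_toluene, matches_ethylbenzene, matches_benzaldehyde)
```

The three merged Hs require the carbon attached to the ring to carry at least three hydrogens, which is what separates toluene from ethylbenzene and benzaldehyde here.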
Being able to do something equivalent in the cartridge would certainly be
useful. What I'd suggest is the addition of two functions:
"query_mol_from_smiles()" and "query_mol_from_ctab()" that do this.
Then you could do queries like:
select * from mols where m @> query_mol_from_smiles('c1ccnc([H])n1');
and have it do the right thing.
Sound reasonable?
-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss