Re: [Rdkit-discuss] Building RDKit on Windows

2014-03-05 Thread James Davidson
Thanks Greg - that did the trick!
(I still see pythonTestDbCLI - as previously posted)

Kind regards

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
100 Berkshire Place
Wharfedale Road
Winnersh, Berkshire
RG41 5RD, England
Tel: +44 (0)118 938 

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the Company address and 
registration details link at the bottom of the page.
__

--
Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works. 
Faster operations. Version large binaries.  Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Building RDKit on Windows

2014-03-05 Thread Greg Landrum
On Wednesday, March 5, 2014, James Davidson j.david...@vernalis.com wrote:

 Thanks Greg - that did the trick!
 (I still see pythonTestDbCLI - as previously posted)


Those failures are due to Windows not being able to delete files that it
has just closed. I need to put the deletion step in a try..except block. The
failures probably do not indicate any actual problem.

-greg
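A minimal sketch of the try..except guard Greg mentions, written in Python on the assumption that the failing test only needs to clean up a scratch file and can live with it lingering briefly. The helper name `remove_quietly`, the retry count and the delay are hypothetical choices, not code from the RDKit test suite.

```python
import os
import time


def remove_quietly(path, retries=3, delay=0.1):
    """Delete ``path`` without letting a transient lock fail the caller.

    On Windows a file that was only just closed can still be locked for a
    moment, so os.remove() may raise. Retry a few times and report, rather
    than raise, if the file never goes away.
    Returns True if the file no longer exists afterwards.
    """
    for _ in range(retries):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            # Already gone: nothing to clean up.
            return True
        except OSError:
            # Most likely a PermissionError from a lingering lock: wait, retry.
            time.sleep(delay)
    return not os.path.exists(path)
```

Call `remove_quietly(path)` in place of a bare `os.remove(path)` in test teardown; a `False` return means the file was still locked after the last retry, which the test can log instead of reporting as a failure.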




[Rdkit-discuss] Pg cartridge - mol_to_ctab() and trouble with conformers.

2014-03-05 Thread Jan Holst Jensen

Hi,

About ready to push a changeset for implementing mol_to_ctab(), but I 
would like it to play nice and preserve input depictions.


Ideally I would like the following

select mol_to_ctab(mol_from_ctab(input-molfile));

to output a molfile where the coordinates of input-molfile are preserved.

If I do that in Python it works:

>>> from rdkit import Chem
>>> m = Chem.MolFromMolBlock("""chiral1.mol
...   ChemDraw04200416412D
...
...   5  4  0  0  0  0  0  0  0  0999 V2000
...    -0.0141    0.0553    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
...     0.8109    0.0553    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
...    -0.4266    0.7697    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
...    -0.0141   -0.7697    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
...    -0.8109   -0.1583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
...   1  2  1  0
...   1  3  1  0
...   1  4  1  1
...   1  5  1  0
... M  END""")
>>> m
<rdkit.Chem.rdchem.Mol object at 0x1240980>
>>> m.GetNumConformers()
1
>>> Chem.MolToMolBlock(m)
'chiral1.mol\n     RDKit          2D\n\n  5  4  0  0  0  0  0  0  0  0999 V2000\n
   -0.0141    0.0553    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n
    0.8109    0.0553    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0\n
   -0.4266    0.7697    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0\n
   -0.0141   -0.7697    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0\n
   -0.8109   -0.1583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n
  1  2  1  6\n  1  3  1  0\n  1  4  1  0\n  1  5  1  0\nM  END\n'
>>> quit()


In the PG cartridge I lose the conformer of the input. My implementation 
looks like this:


rdkit_io.c:

   PG_FUNCTION_INFO_V1(mol_to_ctab);
   Datum   mol_to_ctab(PG_FUNCTION_ARGS);
   Datum
   mol_to_ctab(PG_FUNCTION_ARGS) {
     CROMol  mol;
     char    *str;
     int     len;

     fcinfo->flinfo->fn_extra = SearchMolCache(
                                  fcinfo->flinfo->fn_extra,
                                  fcinfo->flinfo->fn_mcxt,
                                  PG_GETARG_DATUM(0),
                                  NULL, &mol, NULL);

     /* second SQL argument: compute a 2D depiction if the mol has none */
     bool createDepictionIfMissing = PG_GETARG_BOOL(1);
     str = makeCtabText(mol, &len, createDepictionIfMissing);

     /* copy the result into palloc'd memory owned by PostgreSQL */
     PG_RETURN_CSTRING( pnstrdup(str, len) );
   }


adapter.cpp:

   extern "C" char *
   makeCtabText(CROMol data, int *len, bool createDepictionIfMissing) {
     ROMol *mol = (ROMol*)data;

     try {
       ereport(NOTICE,
               (errcode(ERRCODE_SUCCESSFUL_COMPLETION),
                errmsg("mol conformer count = %d",
                       mol->getNumConformers())));

       /* only generate fresh coordinates when none were stored */
       if (createDepictionIfMissing && mol->getNumConformers() == 0) {
         RDDepict::compute2DCoords(*mol);
       }
       StringData = MolToMolBlock(*mol);
     } catch (...) {
       ereport(WARNING,
               (errcode(ERRCODE_WARNING),
                errmsg("makeCtabText: problems converting molecule to CTAB")));
       StringData = "";
     }

     *len = StringData.size();
     return (char*)StringData.c_str();
   }


If I run the Python example equivalent from psql:

postgres=# select mol_to_ctab(mol_from_ctab('chiral1.mol
  ChemDraw04200416412D

  5  4  0  0  0  0  0  0  0  0999 V2000
   -0.0141    0.0553    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8109    0.0553    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4266    0.7697    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0
   -0.0141   -0.7697    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
   -0.8109   -0.1583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  3  1  0
  1  4  1  1
  1  5  1  0
M  END', false));
NOTICE:  mol conformer count = 0
                             mol_to_ctab
---------------------------------------------------------------------
                                                                     +
     RDKit          2D                                               +
                                                                     +
  5  4  0  0  0  0  0  0  0  0999 V2000                              +
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
   -1.5000    0.0000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0+
   -0.0000   -1.5000    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0+
    0.0000    1.5000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0+
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0+
  1  2  1  6                                                         +
  1  3  1  0                                                         +
  1  4  1  0                                                         +
  1  5  1  0                                                         +
M  END                                                               +

(1 row)

postgres=#

Is there something I missed about querying a mol for conformers? As of now I
lose the input conformer, and the code always outputs a depiction
calculated from scratch.


Cheers
-- Jan
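Jan's goal, getting the input coordinates back out of mol_to_ctab(), can be checked in a test by comparing only the atom blocks of the two molfiles, since header lines such as the program/timestamp line legitimately differ. A minimal pure-Python sketch, assuming V2000 molfiles; `atom_coords` and `same_depiction` are hypothetical helpers, the tolerance is an arbitrary choice, and the sample molfiles are Jan's input and the psql output shown above.

```python
def atom_coords(molblock):
    """Return [(symbol, x, y, z), ...] for the atoms of a V2000 molfile.

    Assumes the usual layout: three header lines, a counts line whose first
    three characters hold the atom count, then one line per atom beginning
    with whitespace-separated x, y, z and the element symbol.
    """
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])
    coords = []
    for line in lines[4:4 + n_atoms]:
        fields = line.split()
        x, y, z = (float(v) for v in fields[:3])
        coords.append((fields[3], x, y, z))
    return coords


def same_depiction(molblock_a, molblock_b, tol=1e-3):
    """True if both molfiles list the same elements at the same positions."""
    a, b = atom_coords(molblock_a), atom_coords(molblock_b)
    if len(a) != len(b):
        return False
    for (sym_a, *xyz_a), (sym_b, *xyz_b) in zip(a, b):
        if sym_a != sym_b:
            return False
        if any(abs(p - q) > tol for p, q in zip(xyz_a, xyz_b)):
            return False
    return True


# Jan's chiral1.mol input, as drawn in ChemDraw.
INPUT = "\n".join([
    "chiral1.mol",
    "  ChemDraw04200416412D",
    "",
    "  5  4  0  0  0  0  0  0  0  0999 V2000",
    "   -0.0141    0.0553    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.8109    0.0553    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.4266    0.7697    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.0141   -0.7697    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.8109   -0.1583    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "  1  4  1  1",
    "  1  5  1  0",
    "M  END",
])

# What the cartridge returned: freshly computed 2D coordinates.
RECOMPUTED = "\n".join([
    "",
    "     RDKit          2D",
    "",
    "  5  4  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -1.5000    0.0000    0.0000 F   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.0000   -1.5000    0.0000 Br  0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0000    1.5000    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  6",
    "  1  3  1  0",
    "  1  4  1  0",
    "  1  5  1  0",
    "M  END",
])
```

`same_depiction(INPUT, RECOMPUTED)` is False here, which is the symptom Jan describes; once the conformer survives the round trip, comparing the input with the mol_to_ctab() output should give True.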


Re: [Rdkit-discuss] SMARTS/SMARTS and SMILES/SMARTS substructure matching

2014-03-05 Thread Christos Kannas
Hi Greg,

Thanks a lot for the explanation.
It makes things clearer now.
The reason I'm doing SMARTS-SMARTS matching is that I would like to match
functional groups against the reactants in reactions.

Regards,

Christos

Christos Kannas

Researcher
Ph.D. Student

Mob (UK): +44 (0) 7447700937
Mob (Cyprus): +357 99530608

LinkedIn: http://cy.linkedin.com/in/christoskannas


On 5 March 2014 04:44, Greg Landrum greg.land...@gmail.com wrote:

 Hi Christos,


 On Tue, Mar 4, 2014 at 3:46 PM, Christos Kannas chriskan...@gmail.com wrote:

 Hi all,

 Why does the following happen?

 In [1]: from rdkit import Chem
 In [2]: from rdkit.Chem import AllChem
 In [3]: from rdkit.Chem import Draw

 In [4]: patt = Chem.MolFromSmarts('[CH;D2;!$(C-[!#6;!#1])]=O')

 In [5]: z2 = Chem.MolFromSmarts('[*]-C-C([H])(=O)', 1)
 In [6]: print Chem.MolToSmiles(z2)
 [*]CC=O
 In [7]: print Chem.MolToSmarts(z2)
 *-C-[C!H0]=O
 In [9]: z2.HasSubstructMatch(patt)
 Out[9]: False

 In [10]: z3 = Chem.MolFromSmiles(Chem.MolToSmiles(z2))
 In [11]: print Chem.MolToSmiles(z3)
 [*]CC=O
 In [12]: print Chem.MolToSmarts(z3)
 [*]-[#6]-[#6]=[#8]
 In [13]: z3.HasSubstructMatch(patt)
 Out[13]: True

 Shouldn't z2 and z3 contain the same information?


 SMARTS/SMARTS matching is handled differently from SMARTS/SMILES
 matching.
 The short answer is that when doing a SMARTS/SMARTS match, the RDKit
 compares the queries to each other; when doing a SMARTS/SMILES match, on
 the other hand, it checks whether the atoms in the SMILES molecule match
 the queries in the SMARTS molecule.

 A slightly longer answer:
 Molecules built using MolFromSmiles contain Atoms; molecules built using
 MolFromSmarts contain QueryAtoms. Both Atoms and QueryAtoms have a Match()
 method that takes another Atom or QueryAtom as an argument and returns
 whether or not the two match.
 The substructure matching code makes heavy use of this Match() method.
 QueryAtom.Match(Atom) checks whether the Atom satisfies the query.
 QueryAtom.Match(QueryAtom) checks whether the queries on the two atoms are
 the same. This uses a crude approach that is easy to fool, but I assume
 that SMARTS-SMARTS matching is not something people often want to do.
 Query-query matching is also not a particularly easy problem to solve in a
 general way.

 -greg
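To make the Atom/QueryAtom distinction concrete, here is a deliberately tiny toy model in Python. It is not RDKit's implementation: an atom is just an element symbol, a query is an OR-list such as `[O,C]`, and the query-vs-query case simply compares the query text, standing in for the "crude approach" described above. The class names and the `match()` method are hypothetical.

```python
class Atom:
    """Toy atom from a SMILES-derived molecule: just an element symbol."""

    def __init__(self, symbol):
        self.symbol = symbol


class QueryAtom:
    """Toy SMARTS atom restricted to OR-lists of elements, e.g. '[O,C]'."""

    def __init__(self, smarts):
        self.smarts = smarts
        self.allowed = smarts.strip("[]").split(",")

    def match(self, other):
        if isinstance(other, QueryAtom):
            # Query vs query: compare the two query descriptions directly.
            # Cheap, but it treats equivalent spellings as different.
            return self.smarts == other.smarts
        # Query vs atom: does this concrete atom satisfy the query?
        return other.symbol in self.allowed
```

With this model `QueryAtom('[O,C]').match(Atom('C'))` is true, while `QueryAtom('[O,C]').match(QueryAtom('[C,O]'))` is false even though both queries accept the same atoms, which is the kind of case that makes a simple query-query comparison easy to fool.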





Re: [Rdkit-discuss] Two nitrogens in a 5 membered ring

2014-03-05 Thread Toby Wright
Thanks all for informative and helpful responses, the behaviour I was
struggling to understand now makes perfect sense.

Toby Wright

--
InhibOx Ltd


On 4 March 2014 04:06, Greg Landrum greg.land...@gmail.com wrote:

 Bob hit the nail on the head.

 The first case, N1N=CC=C1, is aromatic because the RDKit sees that the
 first nitrogen has two bonds to it, assigns a hydrogen, and then sees a
 conjugated pi system with 6 electrons that is flagged as aromatic.
 Something similar would happen with the aromatic form [nH]1nccc1: first the
 ring system is kekulized to yield N1N=CC=C1, then the sanitization proceeds
 from there. The same thing would happen with the equivalent n1[nH]ccc1.

 The second case, N1=NC=CC1, has a C (the last one) that only has single
 bonds to it. This is assigned sp3 hybridization, so there's no conjugated
 ring system for aromaticity to be perceived in.

 The final case, n1nccc1, is an instance of the pyrrole problem: aromatic
 N's that need an H on them should have that H written explicitly in the
 aromatic SMILES (as [nH]).

  -greg
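The 6-electron count Greg refers to is the textbook Hückel 4n+2 rule. A minimal Python sketch, assuming the pi-electron contribution of each ring atom is supplied by hand (2 for a pyrrole-type N-H, 1 for a pyridine-type N or a carbon in a ring double bond, `None` for an sp3 atom); `huckel_aromatic` is a hypothetical helper, and this is a simplification, not RDKit's aromaticity model.

```python
def huckel_aromatic(contributions):
    """Return True if a ring is fully conjugated and has 4n+2 pi electrons.

    ``contributions`` lists the pi electrons donated by each ring atom;
    None marks an sp3 atom, which interrupts the conjugated ring.
    """
    if any(c is None for c in contributions):
        return False
    total = sum(contributions)
    return total >= 2 and (total - 2) % 4 == 0


# N1N=CC=C1 / [nH]1nccc1: the N-H gives 2, the =N- gives 1 and each of
# the three ring carbons gives 1, for 6 pi electrons in total.
pyrazole = [2, 1, 1, 1, 1]

# N1=NC=CC1: the last carbon has only single bonds (sp3), so the ring
# is not fully conjugated.
non_conjugated = [1, 1, 1, 1, None]
```

`huckel_aromatic(pyrazole)` is True and `huckel_aromatic(non_conjugated)` is False, mirroring the first two cases above.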




 On Mon, Mar 3, 2014 at 5:59 PM, Bob Funchess bfunch...@kelaroo.com wrote:

 Hi Toby,

 I'd say it's more of a limitation inherent in Kekule representations than
 an actual bug in RDKit.  Trying to get too clever in figuring out what
 the user meant usually causes more harm than good.

 I'm not sure what version of RDKit you're using, but the aromatic
 specification with an explicit hydrogen on one of the nitrogen atoms works
 for me:

 >>> Chem.MolFromSmiles('n1[nH]ccc1').Debug();
 Atoms:
 0 7 N chg: 0  deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0
 1 7 N chg: 0  deg: 2 exp: 3 imp: 0 hyb: 3 arom?: 1 chi: 0
 2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
 3 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
 4 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
 Bonds:
 0 0-1 order: 12 conj?: 1 aromatic?: 1
 1 1-2 order: 12 conj?: 1 aromatic?: 1
 2 2-3 order: 12 conj?: 1 aromatic?: 1
 3 3-4 order: 12 conj?: 1 aromatic?: 1
 4 4-0 order: 12 conj?: 1 aromatic?: 1

 The double bonds in the Kekule representations here can be between atom
 pairs 1,2 and 3,4 or between atom pairs 2,3 and 4,0.  Putting one between
 pair 0,1 leaves atom 4 with two single bonds to it (and therefore, to
 satisfy valence requirements, two implicit hydrogens); I'm not horribly
 surprised that RDKit perceives that as aliphatic.  You can see that's
 what's happening in your second example, where the hybridization of atom 4
 is 4 (sp3) instead of 3 (sp2).

 Regards,

 Bob

 --
 Bob Funchess, Ph.D.
 Senior Scientist
 Kelaroo, Inc
 www.kelaroo.com
 bfunch...@kelaroo.com
 (858) 259-7561 x3
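Bob's enumeration of Kekulé structures can be reproduced by brute force: choose a set of ring bonds to make double and require that every atom except the N-H nitrogen ends up in exactly one double bond. A minimal pure-Python sketch for the five-membered ring, using the atom numbering of the Debug() output above (atoms 0 and 1 are the nitrogens); `kekule_structures` is a hypothetical helper, not an RDKit function.

```python
from itertools import combinations

# Bonds of the five-membered ring, numbered as in the Debug() output.
RING_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]


def kekule_structures(nh_atom, n_atoms=5, bonds=RING_BONDS):
    """List every choice of double bonds in which each atom except the N-H
    nitrogen ``nh_atom`` sits in exactly one double bond and ``nh_atom``
    sits in none (its lone pair supplies its two pi electrons)."""
    needs_double = set(range(n_atoms)) - {nh_atom}
    found = []
    for k in range(len(bonds) + 1):
        for chosen in combinations(bonds, k):
            atoms = [a for bond in chosen for a in bond]
            # No atom may be in two double bonds, and exactly the atoms
            # that need a double bond must be covered.
            if len(atoms) == len(set(atoms)) and set(atoms) == needs_double:
                found.append(chosen)
    return found
```

`kekule_structures(0)` returns only `((1, 2), (3, 4))` and `kekule_structures(1)` only `((2, 3), (4, 0))`, matching Bob's two options. Forcing a double bond between atoms 0 and 1 leaves the three carbons 2, 3 and 4 to be paired up, which is impossible with an odd number of atoms, so one carbon (atom 4 if the other double bond is 2-3) is left with only single bonds.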





Re: [Rdkit-discuss] Pg cartridge - mol_to_ctab() and trouble with conformers.

2014-03-05 Thread Greg Landrum
Hi Jan,

The behavior you describe is the result of a bug
(https://github.com/rdkit/rdkit/issues/229).
mol_from_ctab() takes an (undocumented) optional argument that is supposed
to determine whether or not the molecule's conformation is stored in the
database. The default is not to store the conformation; this reduces the
size of the database and increases the speed at which molecules are
depickled. The bug is that even if you ask to keep the conformation, the
argument is ignored and the conformation is discarded.

I'll get this fixed tomorrow morning. Alternatively, if you want to fix it
now, the change just needs to be made in the definition of mol_from_ctab()
in rdkit_io.c.

-greg




On Wed, Mar 5, 2014 at 10:27 AM, Jan Holst Jensen j...@biochemfusion.com wrote:


Re: [Rdkit-discuss] SMARTS/SMARTS and SMILES/SMARTS substructure matching

2014-03-05 Thread Toby Wright
Hi,

This is probably related to the above, so I thought I'd post it on this
thread. I am seeing inconsistent behaviour depending on whether a molecule
created from a SMARTS containing an 'or' has HasSubstructMatch called on
it, or is passed as the argument to HasSubstructMatch. A simple example
follows:

>>> O_or_C = Chem.MolFromSmarts('[O,C]')
>>> O = Chem.MolFromSmiles('O')
>>> C = Chem.MolFromSmiles('C')
>>> O_or_C.HasSubstructMatch(O)
True
>>> O_or_C.HasSubstructMatch(C)
False
>>> O.HasSubstructMatch(O_or_C)
True
>>> C.HasSubstructMatch(O_or_C)
True

We also see:
>>> C_or_O = Chem.MolFromSmarts('[C,O]')
>>> C_or_O.HasSubstructMatch(O)
False
>>> C_or_O.HasSubstructMatch(C)
True

so the order of elements in a SMARTS 'or' statement changes the behaviour,
which is unexpected.

Yours,

Toby Wright

--
InhibOx Ltd
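For the simple case shown here, where each query is an OR-list of elements, a query-vs-query comparison that cannot depend on the order of the alternatives is easy to write by comparing sets. A minimal sketch, assuming queries of the form '[A,B,...]' or a bare element symbol only; `allowed_elements`, `could_match` and `covers` are hypothetical names, and this is not how RDKit implements the comparison.

```python
def allowed_elements(query):
    """'[O,C]' -> {'O', 'C'}; a bare symbol such as 'C' -> {'C'}."""
    return set(query.strip("[]").split(","))


def could_match(a, b):
    """True if some atom satisfies both queries (their sets overlap)."""
    return bool(allowed_elements(a) & allowed_elements(b))


def covers(general, specific):
    """True if every atom matching ``specific`` also matches ``general``."""
    return allowed_elements(specific) <= allowed_elements(general)
```

Because both functions work on sets, swapping `[O,C]` for `[C,O]` cannot change the result. The two functions answer two different questions one might mean by "match" between queries: whether the queries overlap, and whether one contains the other.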


On 5 March 2014 10:10, Christos Kannas chriskan...@gmail.com wrote:



Re: [Rdkit-discuss] Pg cartridge - mol_to_ctab() and trouble with conformers.

2014-03-05 Thread Jan Holst Jensen

Hi Greg,

Thanks for the explanation.

I added this to rdkit_io.c in mol_from_ctab():

+  bool keepConformer = PG_GETARG_BOOL(1);
-  mol = parseMolCTAB(data,false,true);
+  mol = parseMolCTAB(data,keepConformer,true);

and then I can get the expected behavior and have my tests complete 
successfully. Yes :-).


I will go ahead and create a pull request for mol_to_ctab(). The tests 
for mol_to_ctab() will assume that mol_from_ctab() uses the optional 
parameter to keep the conformer.


Cheers
-- Jan

On 2014-03-05 13:25, Greg Landrum wrote:


Re: [Rdkit-discuss] Pg cartridge - mol_to_ctab() and trouble with conformers.

2014-03-05 Thread Greg Landrum
Thanks Jan.
The fix and pull request have both been integrated.

-greg



On Wed, Mar 5, 2014 at 7:35 PM, Jan Holst Jensen j...@biochemfusion.com wrote:
