Re: [Rdkit-discuss] Error: free(): double free detected in tcache 2 (going from pickles to mol)

2022-12-29 Thread Jason Biggs
Jason Biggs



On Thu, Dec 29, 2022 at 3:22 PM Jarod Younker 
wrote:

> The code compiles fine, but I’m running into a memory deallocation issue
> in ‘Dummy’ if I use ROMol derived from pickled strings:
>
> Error: free(): double free detected in tcache 2
>
> If, however, I go from pickle to SMILES and then to ROMol there is no
> memory error.  Somewhere, I’m deallocating memory twice.  Any help would be
> appreciated.
>
> std::string MolPickle2Smiles(std::string& pickle) {
>   RDKit::ROMol molecule;
>   RDKit::MolPickler::molFromPickle(pickle,molecule);
>   return RDKit::MolToSmiles(molecule);
> }
>
> bool Dummy(std::string& pickle1, std::string& pickle2) {
>   RDKit::MOL_SPTR_VECT reacts;
>   reacts.clear();
>
>   //THIS DOESN’T WORK
>   RDKit::ROMol molecule;
>

By declaring the 'molecule' in that way you ensure that it will be deleted
automatically when the function exits.  To create an ROMol that persists
after the Dummy function exits you need the 'new' keyword.  Even better is
to start with a smart pointer out of the gate:

bool Dummy(std::string& pickle1) {
  RDKit::MOL_SPTR_VECT reacts;
  reacts.clear();
  RDKit::ROMOL_SPTR mptr(new RDKit::ROMol());  // allocate first: a default-constructed ROMOL_SPTR is null
  RDKit::MolPickler::molFromPickle(pickle1, mptr.get());
  reacts.push_back(mptr);
  return 0;
}
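
The ownership rule behind this can be sketched with the standard library alone. This is only an illustration under stated assumptions: FakeMol is a hypothetical stand-in for RDKit::ROMol, std::shared_ptr stands in for ROMOL_SPTR, and it assumes the failing code handed a stack-allocated molecule to a shared pointer (the archived message does not show the argument). A shared pointer deletes whatever it owns, so the object must be heap-allocated and owned by nothing else; then it is destroyed exactly once.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for RDKit::ROMol; counts destructor calls so that
// ownership mistakes would show up as a count other than one.
struct FakeMol {
  static int destroyed;
  ~FakeMol() { ++destroyed; }
};
int FakeMol::destroyed = 0;

using FakeMolSptr = std::shared_ptr<FakeMol>;

// Heap-allocate the object and let shared pointers be its only owners.
// Returns how many FakeMol destructors ran once all owners went away.
int destructionsWithSharedOwnership() {
  FakeMol::destroyed = 0;
  {
    std::vector<FakeMolSptr> reacts;
    FakeMolSptr mptr = std::make_shared<FakeMol>();
    assert(mptr.get() != nullptr);  // a default-constructed shared_ptr would be null
    reacts.push_back(mptr);         // the vector shares ownership with mptr
  }  // reacts and mptr are released here; the object is destroyed once
  return FakeMol::destroyed;
}
```

Wrapping the address of a local variable in a shared pointer would instead give the object two owners (the enclosing scope and the pointer), which is the kind of situation that produces a double free.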

Hope that helps
Jason


>   RDKit::MolPickler::molFromPickle(pickle1,molecule);
>   reacts.push_back(RDKit::ROMOL_SPTR());
>
>   //THIS WORKS
>   //reacts.push_back(RDKit::ROMOL_SPTR(RDKit::SmilesToMol(MolPickle2Smiles(pickle1))));
>
>   return 0;
> }
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] two molecules in mol object

2021-03-10 Thread Jason Biggs
Have you looked at GetMolFrags?

if you define
>m=MolFromSmiles('CCC.CC')

then

>rdmolops.GetMolFrags(m,asMols=True)

returns two molecules, keeping any coordinates or non-computed properties
in the process.
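
Conceptually, splitting a mol into fragments is a connected-components search over the bond graph. The sketch below shows that idea in C++ with the standard library only; splitFragments and its plain index-pair bond list are hypothetical names for illustration, not RDKit's implementation (in RDKit's C++ API the corresponding entry point is RDKit::MolOps::getMolFrags).

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Group atom indices into connected fragments, given bonds as index pairs.
std::vector<std::vector<int>> splitFragments(
    int numAtoms, const std::vector<std::pair<int, int>>& bonds) {
  std::vector<std::vector<int>> adjacency(numAtoms);
  for (const auto& bond : bonds) {
    assert(bond.first >= 0 && bond.first < numAtoms);
    assert(bond.second >= 0 && bond.second < numAtoms);
    adjacency[bond.first].push_back(bond.second);
    adjacency[bond.second].push_back(bond.first);
  }
  std::vector<int> fragmentOf(numAtoms, -1);  // -1 means "not visited yet"
  std::vector<std::vector<int>> fragments;
  for (int start = 0; start < numAtoms; ++start) {
    if (fragmentOf[start] != -1) continue;
    const int id = static_cast<int>(fragments.size());
    fragments.emplace_back();
    std::vector<int> stack{start};
    fragmentOf[start] = id;
    while (!stack.empty()) {  // depth-first walk over one fragment
      const int atom = stack.back();
      stack.pop_back();
      fragments[id].push_back(atom);
      for (int nbr : adjacency[atom]) {
        if (fragmentOf[nbr] == -1) {
          fragmentOf[nbr] = id;
          stack.push_back(nbr);
        }
      }
    }
  }
  return fragments;
}
```

For 'CCC.CC' (five heavy atoms, bonds 0-1, 1-2 and 3-4) this yields one fragment of three atoms and one of two.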

Jason Biggs



On Wed, Mar 10, 2021 at 9:42 AM Shani Zev  wrote:

> Hi all,
> I have a mol object that contains two molecules. I want to delete one of the
> molecules without going through SMILES (that would be easy, but I need to do
> it directly on the mol object).
> Is there a way to access each molecule while they are in the same mol
> object?
> Just as a simple example:
> [image: image.png]
> In the end I want to write the mol object to a coordinates file, and when
> both molecules are together they overlap.
> Thank you very much,
> Shani


Re: [Rdkit-discuss] explicit H atoms

2021-03-09 Thread Jason Biggs
Jason Biggs



On Tue, Mar 9, 2021 at 10:49 AM Maciek Wójcikowski 
wrote:

> Hi Jean-Marc,
>
> I know you can draw them, but both SMILES and RDKit internally use two
> bonds (up/down) directions to assign the bond stereo, which means that
> there are not
>

Fortunately the RDKit also allows you to set bond stereo without using the
bond directions, by setting the bond stereo flag to stereocis or
stereotrans and then setting the stereoAtoms to say which atoms are cis or
trans.  Using bond directions for double bond stereo has always seemed
rather archaic and confusing to me.

Jason



> enough bonds to define both double bonds configuration and have the middle
> one undefined at the same time.
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
>
> wt., 9 mar 2021 o 13:14 Jean-Marc Nuzillard 
> napisał(a):
>
>> Hi Maciek,
>>
>> I would find your example rather readable even without explicit H atoms.
>>
>>
>>
>> I drew it like that because I do not have the wavy wedge at hand.
>>
>> Thanks for your proposal,
>> Best,
>>
>> Jean-Marc
>>
>>
>>
>> Le 09/03/2021 à 11:26, Maciek Wójcikowski a écrit :
>>
>> Hi,
>>
>> I'd say that for a tetrahedral stereo that is possible to remove all of
>> Hs. But for double bonds it might not be as easy, or impossible for some
>> edge cases - conjugated double bonds in particular. Here is one:
>>
>> [image: image.png]
>> [H]\C(=C/C)C=C\C([H])=C\C
>>
>>
>> 
>> Pozdrawiam,  |  Best regards,
>> Maciek Wójcikowski
>> mac...@wojcikowski.pl
>>
>>
>> wt., 9 mar 2021 o 10:47 Paul Emsley 
>> napisał(a):
>>
>>> On 09/03/2021 09:01, Jean-Marc Nuzillard wrote:
>>> > Sure, testosterone may be drawn as
>>> > [snip]
>>>
>>> OK :-)
>>>
>>> That's a top quality rendering by the way. How did you make it?
>>>
>>> Paul.
>>>
>>>
>> --
>> Jean-Marc Nuzillard
>> Directeur de Recherches au CNRS
>>
>> Institut de Chimie Moléculaire de Reims
>> CNRS UMR 7312
>> Moulin de la Housse
>> CPCBAI, Bâtiment 18
>> BP 1039
>> 51687 REIMS Cedex 2
>> France
>>
>> Tel : 03 26 91 82 10
>> Fax : 03 26 91 31 66
>> http://www.univ-reims.fr/icmr
>> http://eos.univ-reims.fr/LSD/CSNteam.html
>> http://www.univ-reims.fr/LSD/
>> http://www.univ-reims.fr/LSD/JmnSoft/
>>


[Rdkit-discuss] sanitization converts "I(=O)(=O)[O-]" into "[O-][I+2]([O-])[O-]"

2021-01-21 Thread Jason Biggs
The RDKit will always convert iodate from the form on the left, with an
expanded octet on iodine and a single negative charge, into the form on the
right with all single bonds and a charge on every atom (image here
https://i.stack.imgur.com/hq3St.png).  This happens no matter how I import
the molecule, from SMILES or from a file.  The only way to avoid it is to
skip sanitization, which I'd rather avoid.

Is this the desired behavior?

Thanks
Jason

[image: image.png]


[Rdkit-discuss] RWMol, ROMol, and static casting

2021-01-06 Thread Jason Biggs
I have learned what C++ I know in order to build a link to rdkit from my
own system, so the following may be more of a general coding question than
anything rdkit-specific.

I am looking at the python wrappers, and I see many functions return a
ROMol object by static casting the RWMol pointer to a pointer of its base
class.  I believe this line

shows an example.

First, what is the purpose of this upcasting? MolToSmiles could just return
a python RWMol object, if I read the code correctly.  Is this done as a
means of preventing users from shooting themselves in the foot, by making
them create an RWMol first before accessing the add/delete/replace methods?

Second, I see instances in the python wrappers where the ROMol objects are
downcast to RWMol using static_cast, such as is done on this line

for the SanitizeMol wrapper.

Is static_cast'ing from a ROMol to a RWMol always safe?  What I know about
type inheritance in C++ is minimal (it can fit in this email), but I did
not know you could do this.  In reading up on this, I would think you would
have to dynamic_cast and check the result.
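
To make the question concrete, here is a minimal sketch of the two directions of casting, using hypothetical stand-in classes (ReadOnlyMol and ReadWriteMol are not the RDKit types). Upcasting to the base class is always valid. A static_cast downcast is only defined when the object really is an instance of the derived class, and dynamic_cast is the checked variant that returns nullptr otherwise.

```cpp
#include <cassert>

// Hypothetical stand-ins for a read-only base class and a writable subclass.
struct ReadOnlyMol {
  virtual ~ReadOnlyMol() = default;  // polymorphic base, needed for dynamic_cast
  int numAtoms = 0;
};
struct ReadWriteMol : ReadOnlyMol {
  void addAtom() { ++numAtoms; }
};

// Upcast: always valid; static_cast (or an implicit conversion) is enough.
ReadOnlyMol* asReadOnly(ReadWriteMol* mol) { return static_cast<ReadOnlyMol*>(mol); }

// Checked downcast: adds an atom only if the object really is a ReadWriteMol.
bool tryAddAtom(ReadOnlyMol* mol) {
  assert(mol != nullptr);
  if (auto* rw = dynamic_cast<ReadWriteMol*>(mol)) {
    rw->addAtom();
    return true;
  }
  return false;  // the object was created as a plain ReadOnlyMol
}
```

Replacing the dynamic_cast with a static_cast would compile and behave the same for the first kind of object, but would be undefined behavior for an object that was created as the plain base class.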

Thanks,
Jason


Re: [Rdkit-discuss] SA_scores and QED scores questions?

2020-10-29 Thread Jason Biggs
Have you compared the output of debugMol

from your from-smiles ROMol (from test2) versus the ROMol objects you get
from the larger program?  That can help to explain the differences, though
not always.

Jason



On Wed, Oct 28, 2020 at 5:54 PM Steven Pak 
wrote:

> Hello,
>
> I created two CPP scoring functions based on your QED and SA_score python
> code. I tried implementing it as close as possible to get matching scores.
> Just to see if the code works. I have succeeded, which is exciting! I
> converted the SMILES strings of various molecules into ROMol objects, which
> eventually was inputted into both the python and CPP code, which means both
> cases have the same treatment prior to going into these functions (test 1).
> Unexpectedly, when I integrated the QED/SA calculators (in CPP) into a
> larger program, the ROMol objects that I am inputting do not give me the
> same results as the python or the standalone CPP version I created (test
> 2). This makes sense to me because I am inputting SMILES strings in the
> first test in both programs, while in the second test, I am inputting two
> different inputs into two different situations. So my question is: How are
> SMILES strings treated or edited when they are turned into an RWMol/ROMol
> object? Is there a default treatment to these SMILES when the user turns
> them into ROMol objects? And what are the recommended treatment functions
> you would apply to the RWMol before you convert them into ROMols?
>
>
> Thanks,
> Steven Pak


Re: [Rdkit-discuss] Counting Stereo centers in CPP. And [H] in SMILES string

2020-10-01 Thread Jason Biggs
On Wed, Sep 30, 2020 at 9:45 PM Steven Pak 
wrote:

> Hello,
>
> I have been using RDKIT for some time now, and I could not get a certain
> descriptor that would be super useful for my work. Every other descriptor
> works fine, however, for some reason the following calculations always
> gives me 0:
>
> RDKit::Descriptors::numAtomStereoCenters( rdmol );
>

Does it help to add a check like this before getting stereo properties?

if (!rdmol.hasProp(RDKit::common_properties::_StereochemDone)) {
  RDKit::MolOps::assignStereochemistry(rdmol);
}


>
> I did convert the in-house mol information into molecular information that
> is compatible with RDKIT, but for some reason it is not giving me any
> information on stereocenters.
>
> Furthermore, when I print out the SMILES strings
> from RDKit::MolToSmiles(rdmol) with explicit H turned off, it still spits
> out the entire SMILE string that has [H] integrated into the string. I
> would like to turn that off, unless explicit H in RDKIT may mean something
> else.
>

MolToSmiles has the option *allHsExplicit*, but that only determines
whether implicit hydrogens are always indicated in counts.  So if you have
a hydrogen-suppressed mol of ethane, that option decides whether you get a
SMILES of "CC" or "[CH3][CH3]".  But if you have an all-atom mol, then it
will not suppress them.  You need to first convert the hydrogen atoms to
implicit before getting the SMILES.

Because I often want to keep hydrogens as explicit vertices, but don't want
to look at them in the SMILES string I use a wrapper function around
MolToSmiles. The wrapper  will create a copy and remove or add explicit
hydrogen atoms, as well as call Kekulize/SetAromaticity, depending on the
options passed in.


Jason


>
> Any help would be appreciated.
>
> Thanks,
> Steven Pak
>


Re: [Rdkit-discuss] Order of linking static libraries

2020-09-14 Thread Jason Biggs
I don't have this problem on mac/windows, but it came up on Linux. Rather
than figure out which library or libraries to link twice I just use the
idiom

-Wl,--start-group -llib1 -llib2 -llib3 -Wl,--end-group

Maybe that will help with your issue.


Jason



On Mon, Sep 14, 2020 at 10:46 AM topgunhaides  wrote:

> Hi guys,
>
> I am trying to get an executable that doesn't require an RDKit install to
> run.
> But found it is tricky to figure out the correct order of static libraries
> for linking.
>
> Here is my dynamic linkage (successful):
>
> g++ -o test.exe test.cpp \
> -I$RDBASE/Code -L$RDBASE/lib \
> -lRDKitFileParsers \
> -lRDKitRDGeneral \
> -lRDKitDescriptors \
> -lRDKitDistGeomHelpers \
> -lRDKitForceField \
> -lRDKitForceFieldHelpers \
> -lRDKitMolAlign \
> -lRDKitShapeHelpers \
> -lRDKitGraphMol
>
> My RDKit-related header files in the code:
>
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> However, I still cannot figure out the correct static linkages. Here is
> one example:
>
> g++ -o test.exe test.cpp \
> -I$RDBASE/Code -L$RDBASE/lib \
> -lRDKitForceFieldHelpers_static \
> -lRDKitForceField_static \
> -lRDKitRDGeneral_static \
> -lRDKitShapeHelpers_static \
> -lRDKitDistGeomHelpers_static \
> -lRDKitGraphMol_static \
> -lRDKitMolAlign_static \
> -lRDKitFileParsers_static \
> -lRDKitDescriptors_static
>
> which gave me a bunch of "undefined reference" errors.
>
> I guess that circular dependencies exist. I found some old discussions,
> but the info is limited.
> I mean shall we have documentation to explain those dependencies to help
> establish static linkage?
> Can anyone help me with this? Maybe with a couple of examples? I
> appreciate it!
>
> Best,
> Leon


Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-28 Thread Jason Biggs
Jason Biggs



On Fri, Aug 28, 2020 at 8:16 PM dmaziuk via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> On 8/27/2020 8:48 PM, Jason Biggs wrote:
> >
> > I'm not very familiar with how the python interface works, is there a
> > similar issue with the python wrappers?  Does the wrapper class for the
> > Atom clean up after itself differently if the atom is marked as having an
> > owner?
>
> There Be Dragons.
>
> Python VM does reference counting on its own objects and will destroy
> them for you at some point. Exactly how it works out with objects
> created by external libraries is an interesting question.
>
> SWIG, for example, creates a "proxy" python object for each c++ one,
> with a flag that tells the runtime to either destroy the underlying c++
> object when the "proxy" is garbage-collected, or not. E.g. if you
> garbage-collect an Atom on python side, you have no idea if destroying
> its linked c++ Atom will mess up its c++ ROMol container, so on
> container/element-type objects the flag's typically a "not".
>

This is helpful, thank you.  If I do end up exposing the Atom in top level
(mathematica in my case), then I will need to do something similar with the
wrapper class.

Jason


>
> Dima
>
>


Re: [Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread Jason Biggs
On Thu, Aug 27, 2020 at 4:33 PM David Cosgrove 
wrote:

> Hi Jason,
> The answer is that when you delete the molecule, the memory it uses is
> flagged as available for re-use,  but nothing else happens to it. If you
> then de-reference pointers to it, such as the atoms that are buried in the
> block of memory allocated to the molecule, you may get away with it and you
> may not. It will depend on whether something else has written over the
> memory or not. In your example, the memory was still in its original state,
> so the de-referencing of the atom pointers succeeded. This is not
> guaranteed, however, and this sort of bug is generally very nasty to find-
> sometimes the code will run, sometimes it will crash. Worse still is if you
> accidentally write to de-allocated memory that something else is now using-
> you can then get failures 5 minutes later in a completely different part of
> the program.
>

Thank you David, this really helps.  Thank you also to Dan, Nils, and
Dima.

I knew that accessing that atom after deleting the molecule was bad juju,
and was confused why it worked. I will steer clear of undefined behavior.

In my application, I have a wrapper class that I expose to top level users,
which holds a unique pointer to an ROMol. I know then that when my wrapper
class member goes away so does the ROMol.  What I don't have is a similar
wrapper class for an Atom, precisely because of these ownership questions -
any atom properties or modifications go through the ROMol wrapper class.
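
As a rough sketch of that design (all names here are hypothetical stand-ins, not my actual wrapper or the RDKit classes): the handle owns the molecule through a unique pointer and never hands out atom pointers, so callers refer to atoms by index and nothing can outlive the molecule.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for an atom and the molecule that owns its atoms.
struct FakeAtom { int atomicNum; };
struct FakeMolecule { std::vector<FakeAtom> atoms; };

// Owning handle: the molecule lives exactly as long as the handle does.
class MolHandle {
 public:
  explicit MolHandle(const std::vector<int>& atomicNums) : mol_(new FakeMolecule()) {
    for (int z : atomicNums) mol_->atoms.push_back(FakeAtom{z});
  }
  std::size_t numAtoms() const { return mol_->atoms.size(); }
  // Atom access goes through the handle by index; no atom pointer escapes.
  int atomicNum(std::size_t idx) const {
    assert(idx < numAtoms());
    return mol_->atoms[idx].atomicNum;
  }
  void setAtomicNum(std::size_t idx, int z) {
    assert(idx < numAtoms());
    mol_->atoms[idx].atomicNum = z;
  }

 private:
  std::unique_ptr<FakeMolecule> mol_;
};
```

The cost of this choice is that every atom-level operation needs a method on the molecule handle, which is the trade-off described above.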

I'm not very familiar with how the python interface works, is there a
similar issue with the python wrappers?  Does the wrapper class for the
Atom clean up after itself differently if the atom is marked as having an
owner?  Hope that question isn't too vague

Jason


>
> Deleting the atoms is also an error, because they will be deleted by the
> molecule’s destructor, so you’ll be de-allocating the memory twice, another
> exciting source of undefined behaviour. Valgrind is excellent for tracking
> down these sorts of error, and many more besides.  If you’re developing on
> Linux, it’s good practice to use it on any code before you use that program
> in earnest.
>
> Cheers,
> Dave
>
>
> On Thu, 27 Aug 2020 at 20:17, Jason Biggs  wrote:
>
>> Everything I know about C++ I learned just so that I can write a link
>> between an interpreted language and the rdkit, so there are definitely some
>> gaps in my knowledge.
>>
>> What I'm trying to understand right now is the expected lifetime of an
>> Atom pointer returned by a molecule, for instance by the getAtomWithIdx
>> method.  Based on the documentation, since this method doesn't say the user
>> is responsible for deleting the returned pointer I know I'm not supposed to
>> delete it. But when exactly does it get deleted?  If I dereference it after
>> deleting the molecule, what is it?
>>
>> auto mol = RDKit::SmilesToMol("");
>> auto atom = mol->getAtomWithIdx(0);
>> auto m2 = atom->getOwningMol();
>> std::cout << "Z=" << atom->getAtomicNum() << std::endl;  // prints Z=6
>> delete mol;
>> std::cout << "Z=" << atom->getIdx() << std::endl; // prints Z=0
>> std::cout << "N=" << m2.getNumAtoms() << std::endl;// prints N=4
>> delete atom; // seg fault
>>
>> I would have thought the first time dereferencing the atom pointer after
>> deleting mol would have crashed, but it does not.  I would also have
>> expected bad things when calling the getNumAtoms method on m2 after calling
>> delete on mol, but this also works just fine.  What am I missing?
>>
>> Thanks
>> Jason
>>
>>
>> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>


[Rdkit-discuss] c++ atomic lifetime

2020-08-27 Thread Jason Biggs
Everything I know about C++ I learned just so that I can write a link
between an interpreted language and the rdkit, so there are definitely some
gaps in my knowledge.

What I'm trying to understand right now is the expected lifetime of an Atom
pointer returned by a molecule, for instance by the getAtomWithIdx method.
Based on the documentation, since this method doesn't say the user is
responsible for deleting the returned pointer I know I'm not supposed to
delete it. But when exactly does it get deleted?  If I dereference it after
deleting the molecule, what is it?

auto mol = RDKit::SmilesToMol("");
auto atom = mol->getAtomWithIdx(0);
auto m2 = atom->getOwningMol();
std::cout << "Z=" << atom->getAtomicNum() << std::endl;  // prints Z=6
delete mol;
std::cout << "Z=" << atom->getIdx() << std::endl; // prints Z=0
std::cout << "N=" << m2.getNumAtoms() << std::endl;// prints N=4
delete atom; // seg fault

I would have thought the first time dereferencing the atom pointer after
deleting mol would have crashed, but it does not.  I would also have
expected bad things when calling the getNumAtoms method on m2 after calling
delete on mol, but this also works just fine.  What am I missing?

Thanks
Jason


Re: [Rdkit-discuss] draw molecule without rescaling or translating

2020-07-26 Thread Jason Biggs
Thanks David, that was the missing piece!  I will reach out to you if I
find there is something I want to do that doesn't work with the API for
this class.

Best,
Jason


On Sun, Jul 26, 2020 at 3:48 PM David Cosgrove 
wrote:

> Hi Jason,
> The original design was set up to make it relatively straightforward to
> make your own drawing class, so it's disappointing you aren't finding it
> so.  I guess we've become focussed on getting Cairo and SVG pictures to
> work, and have forgotten to keep this in mind.
> A good place to start might be overriding the virtual functions
> getDrawCoords() and getAtomCoords().  These transform from atom coordinates
> to draw coordinates and vice versa.  You will need to override both, and
> make sure they are consistent.  For reasons that demonstrate that the whole
> drawing system needs re-factoring, the code for setting the scales switches
> between the two coordinate systems in ways that are impossible to defend.
> It's a bit like the human eye - it's where we've ended up, but not how
> you'd design it if you were starting from scratch.  The other thing might
> be to override the non-virtual calculateScale() functions.  There is a
> class data member needs_scale_ which would probably be exactly what you
> want except that it's private so you won't be able to see or alter it in
> your derived class.  That's something we could think about making access
> functions for.
> I hope this helps.  If not, please do get back to me either privately or
> via the list.  It would be useful for future development to find out how
> you get on and what would have made it easier for you.
>
> Best,
> Dave
>
>
>
>
> On Sun, Jul 26, 2020 at 8:37 PM Jason Biggs  wrote:
>
>> I'm trying to use the MolDraw2D class in C++ to generate all the graphics
>> primitives for a molecule, which I then pass into my own graphics engine to
>> make an image.  I'm doing this by making a new class that is a subclass of
>> MolDraw2D, similar to MolDraw2DSVG, and overriding the drawXXX methods.
>>
>> But I find that when I call drawMolecule on a molecule that already has
>> 2D coordinates, the lines that get drawn have all been rescaled and
>> translated.  I need to turn off this, I need for the coordinates used to
>> make the lines match the existing conformer.  I know this is somehow
>> related to the width and height used in the MolDraw2D constructor, and I
>> see that there are options fixedBondLength, fixedScale, as well as a
>> setScale method.  But I'm not clear on what values to set in these to set
>> the rescaling to 1 and translation to 0.
>>
>> Thanks
>> Jason Biggs
>>
>>
>
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>


[Rdkit-discuss] draw molecule without rescaling or translating

2020-07-26 Thread Jason Biggs
I'm trying to use the MolDraw2D class in C++ to generate all the graphics
primitives for a molecule, which I then pass into my own graphics engine to
make an image.  I'm doing this by making a new class that is a subclass of
MolDraw2D, similar to MolDraw2DSVG, and overriding the drawXXX methods.

But I find that when I call drawMolecule on a molecule that already has 2D
coordinates, the lines that get drawn have all been rescaled and
translated.  I need to turn off this, I need for the coordinates used to
make the lines match the existing conformer.  I know this is somehow
related to the width and height used in the MolDraw2D constructor, and I
see that there are options fixedBondLength, fixedScale, as well as a
setScale method.  But I'm not clear on what values to set in these to set
the rescaling to 1 and translation to 0.

Thanks
Jason Biggs


Re: [Rdkit-discuss] Get the Exact Mass from a Molecular Formula field

2020-03-23 Thread Jason Biggs
You can get the most common isotope mass from the periodic table.  Here is
a small modification to the code you linked,
https://gist.github.com/jasondbiggs/cadc261ed00a08054ad5c4a85cccd9d4
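
The idea itself is small enough to sketch without RDKit. The following is not the code from the gist, only a standard-library illustration under stated assumptions: monoisotopicMass is a hypothetical helper, it handles flat formulas such as "C6H12O6" (no parentheses, charges or isotope labels), and its hard-coded table covers only H, C, N and O with the masses of their most abundant isotopes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Sum most-abundant-isotope masses over a flat formula like "C6H12O6".
double monoisotopicMass(const std::string& formula) {
  // Masses (in u) of 1H, 12C, 14N and 16O; a real tool would use a full table.
  static const std::map<std::string, double> kMass = {
      {"H", 1.00782503}, {"C", 12.0}, {"N", 14.00307401}, {"O", 15.99491462}};
  double total = 0.0;
  std::size_t i = 0;
  while (i < formula.size()) {
    if (!std::isupper(static_cast<unsigned char>(formula[i])))
      throw std::invalid_argument("expected an element symbol in: " + formula);
    std::string symbol(1, formula[i++]);
    while (i < formula.size() && std::islower(static_cast<unsigned char>(formula[i])))
      symbol += formula[i++];
    int count = 0;
    bool hasCount = false;
    while (i < formula.size() && std::isdigit(static_cast<unsigned char>(formula[i]))) {
      count = count * 10 + (formula[i++] - '0');
      hasCount = true;
    }
    const auto it = kMass.find(symbol);
    if (it == kMass.end()) throw std::invalid_argument("unknown element: " + symbol);
    total += it->second * (hasCount ? count : 1);  // a missing count means 1
  }
  assert(total >= 0.0);  // masses and counts are never negative
  return total;
}
```

For glucose, "C6H12O6", this gives roughly 180.0634.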



Jason Biggs



On Mon, Mar 23, 2020 at 9:21 AM Pierre-Marie Allard <
pierre-marie.all...@unige.ch> wrote:

> Hi all !
>
>
> I would like to calculate the exact mass (monoisotopic mass) of a compound
> given its molecular formula.
> I am aware of the Descriptors.ExactMolWt() function. However, I have
> problematic SMILES and InChI and would like to calculate the exact mass
> directly from the molecular formula.
>
> This thread reported a custom function
> (https://bioinformatics.stackexchange.com/a/9273) to calculate it using
> RDKit. However, the GetMass() function returns the average atomic weight of
> the given atom (averaged over its isotopes). Is there an equivalent
> GetExactMass() somewhere?
>
>
> Many thanks,
>
> PM
>
>
> _
>
> Pierre-Marie Allard
> Research Assistant - Natural Products Chemistry
> EPGL - UniGe - Geneva
> pierre-marie.all...@unige.ch
>


Re: [Rdkit-discuss] AdditionalOutput from FingerprintGenerator

2020-03-16 Thread Jason Biggs
Thank you again Greg.  If you have time to get this in the upcoming release
great, do not rush on my account.

I have another couple of questions regarding fingerprints in general and
the fingerprint generators in particular.


   - To what degree do people use the different fingerprint types? Is it
   more common to use the RDKit fingerprint, for example, as a bit vector, and
   the Morgan fingerprint as a counts vector?  Does it depend on the
   application or is it more how a particular fingerprint was historically
   used?
   - I notice there is a wider variety of distance measures available for
   bit vectors than for count vectors. Is this because these measures, the
   McConnaughey similarity for example, can't be extended to multisets in the
   way that Tversky similarity can? Or is it just that there hasn't been
   any demand for non-bitvector versions of the measures in BitOps.h?
   - Would it be useful to people for the FingerprintGenerator class to
   return the list of atom invariants (or environments) used?  Or is that what
   the BitInfo is used for?
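
For context on the second question, this is what I mean by extending a measure to multisets. It is a sketch under my own assumptions, not the RDKit implementation: CountVector and tverskyCounts are hypothetical names, and the overlap of two count vectors is taken as the sum of the element-wise minima, with whatever is left over on each side weighted by alpha and beta.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Sparse count fingerprint: feature id -> number of occurrences.
using CountVector = std::map<unsigned int, int>;

// Tversky similarity on multisets; alpha = beta = 1 gives Tanimoto.
double tverskyCounts(const CountVector& a, const CountVector& b,
                     double alpha, double beta) {
  assert(alpha >= 0.0 && beta >= 0.0);
  long sumA = 0, sumB = 0, common = 0;
  for (const auto& entry : a) {
    sumA += entry.second;
    const auto match = b.find(entry.first);
    if (match != b.end()) common += std::min(entry.second, match->second);
  }
  for (const auto& entry : b) sumB += entry.second;
  // common + alpha * |A only| + beta * |B only|
  const double denom = common + alpha * (sumA - common) + beta * (sumB - common);
  return denom > 0.0 ? common / denom : 0.0;  // empty inputs: 0 by convention here
}
```

A measure that depends on the number of features absent from both fingerprints has no obvious counterpart in this scheme, which is part of why I am asking.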

Best,
Jason



On Fri, Mar 13, 2020 at 11:13 PM Greg Landrum 
wrote:

> Unfortunately it looks like the additional outputs for morgan, and rdkit
> fingerprints are parts that weren't finished:
>
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Fingerprints/MorganGenerator.cpp#L143
>
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Fingerprints/RDKitFPGenerator.cpp#L99
>
> I will take a look and see if it's possible to get these into the next
> release. In the meantime, if you want that info it looks like you'll need
> to use the older fingerprinting functions.
>
> -greg
>
> On Fri, Mar 13, 2020 at 11:10 PM Jason Biggs 
> wrote:
>
>> Thank you Greg.
>>
>> I am working in C++.  I can poke around with this if I knew which members
>> of the AdditionalOutput struct are used by which fingerprint generators.  I
>> just wanted to make sure there wasn't an explanation somewhere I missed.
>>
>> I can see that with the AtomPairs fingerprints I can do the following
>>
>> //mol is an *ROMol and fpg is a *FingerprintGenerator
>> RDKit::AdditionalOutput ao;
>>
>> std::vector> atomtobits(mol->getNumAtoms());
>> ao.atomToBits = 
>>
>> auto res = fpg->getSparseCountFingerprint(*mo, nullptr, nullptr, -1, );
>>
>> after which atomtobits contains a list of bits for each atom.  From the
>> comments I think the bitInfo member should be used by the
>> RDKitFingerprintGenerator, but I don't see where it is used in the code.
>> Is that the part that wasn't finished?  Is it possible to get information
>> about the atoms/environments that set particular bits in the Morgan or
>> RDKit fingerprints using the new API?
>>
>> Jason Biggs
>>
>>
>>
>> On Fri, Mar 13, 2020 at 10:20 AM Greg Landrum 
>> wrote:
>>
>>> Hi Jason,
>>>
>>> At the moment there's nothing available here except what's in the C++
>>> tests. This part of the code didn't end up being completely finished before
>>> the GSoC project ended and it's never bubbled up on my priority list to
>>> finish it.
>>>
>>> I haven't spent much time with this code, but I can probably put
>>> together an example.
>>> Are you working from C++?
>>>
>>> -greg
>>>
>>>
>>> On Thu, Mar 12, 2020 at 10:42 PM Jason Biggs 
>>> wrote:
>>>
>>>> I am taking a look at the FingerprintGenerator class and I really like
>>>> this unified interface for these four types of fingerprints.  I have very
>>>> limited experience with the fingerprint code before the generator API was
>>>> introduced.
>>>>
>>>> What I'm not sure about is how to get information about the
>>>> atoms/environments that set the bits.  I believe I need to use the
>>>> AdditionalOutput struct,
>>>> https://www.rdkit.org/docs/cppapi/structRDKit_1_1AdditionalOutput.html,
>>>> but I'm not exactly sure how to do so.  I normally would look at the c++
>>>> test files to see how it is used, and from that I see the atomToBits member
>>>> is used in the atom pairs fingerprints, but I'm not sure about the other
>>>> members of this struct.  For example there is a bitInfo member, is this
>>>> where I would find information for the RDKit and Morgan fingerprints?
>>>>
>>>> Are there any examples somewhere that I could follow to find out more
>>>> information?
>>>>
>>>> Thank you
>>>>
>>>> Jason


Re: [Rdkit-discuss] AdditionalOutput from FingerprintGenerator

2020-03-13 Thread Jason Biggs
Thank you Greg.

I am working in C++.  I can poke around with this if I knew which members
of the AdditionalOutput struct are used by which fingerprint generators.  I
just wanted to make sure there wasn't an explanation somewhere I missed.

I can see that with the AtomPairs fingerprints I can do the following

//mol is an *ROMol and fpg is a *FingerprintGenerator
RDKit::AdditionalOutput ao;

std::vector> atomtobits(mol->getNumAtoms());
ao.atomToBits = 

auto res = fpg->getSparseCountFingerprint(*mol, nullptr, nullptr, -1, );

after which atomtobits contains a list of bits for each atom.  From the
comments I think the bitInfo member should be used by the
RDKitFingerprintGenerator, but I don't see where it is used in the code.
Is that the part that wasn't finished?  Is it possible to get information
about the atoms/environments that set particular bits in the Morgan or
RDKit fingerprints using the new API?

Jason Biggs



On Fri, Mar 13, 2020 at 10:20 AM Greg Landrum 
wrote:

> Hi Jason,
>
> At the moment there's nothing available here except what's in the C++
> tests. This part of the code didn't end up being completely finished before
> the GSoC project ended and it's never bubbled up on my priority list to
> finish it.
>
> I haven't spent much time with this code, but I can probably put together
> an example.
> Are you working from C++?
>
> -greg
>
>
> On Thu, Mar 12, 2020 at 10:42 PM Jason Biggs 
> wrote:
>
>> I am taking a look at the FingerprintGenerator class and I really like
>> this unified interface for these four types of fingerprints.  I have very
>> limited experience with the fingerprint code before the generator API was
>> introduced.
>>
>> What I'm not sure about is how to get information about the
>> atoms/environments that set the bits.  I believe I need to use the
>> AdditionalOutput struct,
>> https://www.rdkit.org/docs/cppapi/structRDKit_1_1AdditionalOutput.html,
>> but I'm not exactly sure how to do so.  I normally would look at the c++
>> test files to see how it is used, and from that I see the atomToBits member
>> is used in the atom pairs fingerprints, but I'm not sure about the other
>> members of this struct.  For example there is a bitInfo member, is this
>> where I would find information for the RDKit and Morgan fingerprints?
>>
>> Are there any examples somewhere that I could follow to find out more
>> information?
>>
>> Thank you
>>
>> Jason


[Rdkit-discuss] AdditionalOutput from FingerprintGenerator

2020-03-12 Thread Jason Biggs
I am taking a look at the FingerprintGenerator class and I really like this
unified interface for these four types of fingerprints.  I have very
limited experience with the fingerprint code before the generator API was
introduced.

What I'm not sure about is how to get information about the
atoms/environments that set the bits.  I believe I need to use the
AdditionalOutput struct,
https://www.rdkit.org/docs/cppapi/structRDKit_1_1AdditionalOutput.html, but
I'm not exactly sure how to do so.  I normally would look at the c++ test
files to see how it is used, and from that I see the atomToBits member is
used in the atom pairs fingerprints, but I'm not sure about the other
members of this struct.  For example there is a bitInfo member, is this
where I would find information for the RDKit and Morgan fingerprints?

Are there any examples somewhere that I could follow to find out more
information?

Thank you

Jason


Re: [Rdkit-discuss] Smarts conversion help

2019-03-27 Thread Jason Biggs
On Tue, Mar 26, 2019 at 8:22 PM Patrick Walters  wrote:

> Hi Xiaobo,
>
> There's an explicit hydrogen in the SMARTS that shouldn't be there.  I
> also wouldn't include the single bonds around the ring closures.
>

To be fair, that explicit hydrogen was in the original SMILES string, so it
is reasonable to find it in the SMARTS string if the conversion program
didn't make the same choice as RDKit to remove all hydrogens on parsing.
If you disable hydrogen removal in RDKit you do find a match,

smi = "O=C(C1=C2C(C=CC=C23)=CC=C1)N([H])C3=O"
params = Chem.SmilesParserParams()
params.removeHs=False
mol = Chem.MolFromSmiles(smi, params)

s=Chem.MolFromSmarts("[#8]=[#6]-3-c1c2c(ccc1)2-[#6](-[#7]-3-[#1])=[#8]")
mol.HasSubstructMatch(s)
# True


Whether you remove hydrogens when parsing SMILES strings or represent them
as explicit atoms in the pattern is up to you.



> '[#8]=[#6]-3-c1c2c(ccc1)2-[#6](-[#7]-3-*[#1]*)=[#8]')
>
> from rdkit import Chem
> from rdkit.Chem import Draw
>
> smi = "O=C(C1=C2C(C=CC=C23)=CC=C1)N([H])C3=O"
> mol = Chem.MolFromSmiles(smi)
> mol_list = [mol]
> core = Chem.MolFromSmarts("[#8]=[#6]3-c1c2c(ccc1)2-[#6](-[#7H]3)=[#8]")
> Draw.MolsToGridImage(mol_list,highlightAtomLists=[x.GetSubstructMatch(core)
> for x in mol_list])
>
> [image: image.png]
>
>


Re: [Rdkit-discuss] Why this doesn't work? HasStructMatch function

2019-02-19 Thread Jason Biggs
Change your pattern to use ~ as an unspecified bond instead of - for a
single bond:

>m=Chem.MolFromSmiles('CN(C(C=CC=C1)=C1C2=O)C3=C2C=CC=C3')
>s=Chem.MolFromSmarts('c1c1~[#6](~c2c2)=[#8]')
>m.HasSubstructMatch(s)

True


Jason



On Tue, Feb 19, 2019 at 4:42 PM Li, Xiaobo [xiaoboli] <
xiaobo...@liverpool.ac.uk> wrote:

> Dear all,
>
> Why is the output False?
>
> m=Chem.MolFromSmiles('CN(C(C=CC=C1)=C1C2=O)C3=C2C=CC=C3')
> s=Chem.MolFromSmarts('c1c1-[#6](-c2c2)=[#8]')
> m.HasSubstructMatch(s)
>
> Output: False
>
> m
>
>
>
> s
>
>
>
>
> Best regards,
>
>
> Xiaobo Li
>
>
>
>
>


Re: [Rdkit-discuss] MMFF changing molecule

2019-02-06 Thread Jason Biggs
Thanks for the response Paolo.

I would still think that regardless of the MMFF94's aromaticity model, the
only modifications it should make to the ROMol passed to it should be to
the conformer.  At the very least the documentation for MMFFMolProperties
should mention the side effect.

MolToInChI and MolToMolBlock don't modify the passed molecule, even though
InChI and mol files handle aromaticity differently than the RDKit's default.

Maybe the fact that the MMFFMolProperties constructor doesn't take the
input mol as const is clue enough that there are side effects, but it still
caught me off guard and took some time to figure out what was happening.

Jason

On Tue, Feb 5, 2019 at 3:43 PM Paolo Tosco 
wrote:

> Hi Jason,
>
> The MMFF94 force field uses a different aromaticity model from the RDKit;
> to restore the traditional RDKit aromaticity model you may use
> Chem.SetAromaticity():
>
> from cresset import flare
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> m = Chem.MolFromSmiles("Cc1(Oc2cnc(=O)[nH]c2)c1")
>
> m1 = Chem.AddHs(m)
>
> m2 = Chem.Mol(m1)
>
> AllChem.EmbedMolecule(m2);
>
> m2.GetSubstructMatches(Chem.MolFromSmarts('a'))
>
> ((1,), (2,), (3,), (4,), (5,), (7,), (8,), (9,), (10,), (12,), (13,), (14,))
>
> AllChem.MMFFOptimizeMolecule(m2);
>
> m2.GetSubstructMatches(Chem.MolFromSmarts('a'))
>
> ((1,), (2,), (3,), (4,), (5,), (14,))
>
> Chem.SetAromaticity(m2)
>
> m2.GetSubstructMatches(Chem.MolFromSmarts('a'))
>
> ((1,), (2,), (3,), (4,), (5,), (7,), (8,), (9,), (10,), (12,), (13,), (14,))
>
>
> Cheers,
> p.
>
> On 02/05/19 20:31, Jason Biggs wrote:
>
> I noticed that I'm getting a different substructure match depending on
> whether I have called the MMFF optimization.  This is reproducible in
> python with
>
>
> >> m = Chem.MolFromSmiles("Cc1(Oc2cnc(=O)[nH]c2)c1")
>
> >> m1 = Chem.AddHs(m)
> >> m2 = Chem.Mol(m1)
> >> AllChem.EmbedMolecule(m2);
> >> m2.GetSubstructMatches(Chem.MolFromSmarts('a'))
>
> ((1,), (2,), (3,), (4,), (5,), (7,), (8,), (9,), (10,), (12,), (13,), (14,))
>
>
> >> AllChem.MMFFOptimizeMolecule(m2);
> >> m2.GetSubstructMatches(Chem.MolFromSmarts('a'))
>
> ((1,), (2,), (3,), (4,), (5,), (14,))
>
> Is this expected behavior?  I notice in the C++ code that the constructor
> for RDKit::MMFF::MMFFMolProperties will call MolOps::Kekulize, so maybe
> this is expected.  For my application it's undesirable, and I can work
> around it by creating a copy of my ROMol and generating the
> MMFFMolProperties object using that copy (which can then be discarded).  Is
> there a better workaround?
>
>
> Jason Biggs
>
>
>
>
>


[Rdkit-discuss] MMFF changing molecule

2019-02-05 Thread Jason Biggs
I noticed that I'm getting a different substructure match depending on
whether I have called the MMFF optimization.  This is reproducible in
python with


>> m = Chem.MolFromSmiles("Cc1(Oc2cnc(=O)[nH]c2)c1")

>> m1 = Chem.AddHs(m)
>> m2 = Chem.Mol(m1)
>> AllChem.EmbedMolecule(m2);
>> m2.GetSubstructMatches(Chem.MolFromSmarts('a'))

((1,), (2,), (3,), (4,), (5,), (7,), (8,), (9,), (10,), (12,), (13,), (14,))


>> AllChem.MMFFOptimizeMolecule(m2);
>> m2.GetSubstructMatches(Chem.MolFromSmarts('a'))

((1,), (2,), (3,), (4,), (5,), (14,))


Is this expected behavior?  I notice in the C++ code that the constructor
for RDKit::MMFF::MMFFMolProperties will call MolOps::Kekulize, so maybe
this is expected.  For my application it's undesirable, and I can work
around it by creating a copy of my ROMol and generating the
MMFFMolProperties object using that copy (which can then be discarded).  Is
there a better workaround?



Jason Biggs


Re: [Rdkit-discuss] InChI to Mol to InChi

2018-12-18 Thread Jason Biggs
see https://github.com/rdkit/rdkit/issues/1852, and
https://sourceforge.net/p/rdkit/mailman/message/36309813/

You can see it in the smiles if you remove stereo after embedding, then
re-detect stereo from the conformation.

inchi1 =
"InChI=1S/C20H26O4/c1-12(2)17-11-18(22)14(4)7-5-6-13(3)8-16(21)9-15-10-19(17)24-20(15)23/h6-7,10,12,17,19H,5,8-9,11H2,1-4H3/b13-6-,14-7+/t17-,19-/m1/s1"
m1 = Chem.MolFromInchi(inchi1)
m1 = Chem.AddHs(m1)
m2 = Chem.Mol(m1)
AllChem.EmbedMolecule(m2)
m3 = Chem.Mol(m2)
Chem.rdmolops.RemoveStereochemistry(m3)
Chem.rdmolops.AssignStereochemistryFrom3D(m3)
sm1 = Chem.MolToSmiles(m1)
sm2 = Chem.MolToSmiles(m2)
sm3 = Chem.MolToSmiles(m3)
print(sm1 == sm2)  # prints True
print(sm2 == sm3)  # prints False


The difference between sm2 and sm3 is just swapping a \ for a /, confirming
what Christos was able to read from the InChI.

Why does the inchi reflect the 3D bond stereo but the smiles doesn't until
you remove and re-detect the stereo?  Does the InChI code go to the 3D
structure when present and ignore stereo information in the mol object?
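
The layer-by-layer comparison of the two InChI strings can also be done
mechanically.  What follows is only a minimal sketch in plain Python (no
RDKit calls), assuming standard InChI strings in which each layer prefix
occurs at most once; the helper names inchi_layers and differing_layers
are made up for this illustration.  The two strings are the m1 and m2
InChIs reported earlier in this thread.

```python
def inchi_layers(inchi):
    """Split an InChI string into a dict keyed by layer prefix.

    Assumption: a standard InChI, where the first two '/'-separated fields
    are the version and the formula, and each later layer starts with a
    unique one-letter prefix (c, h, b, t, m, s, ...).
    """
    parts = inchi.split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


def differing_layers(inchi_a, inchi_b):
    """Return {prefix: (value_a, value_b)} for every layer that differs."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Shared part of the two InChI strings from this thread.
COMMON = (
    "InChI=1S/C20H26O4/"
    "c1-12(2)17-11-18(22)14(4)7-5-6-13(3)8-16(21)9-15-10-19(17)24-20(15)23/"
    "h6-7,10,12,17,19H,5,8-9,11H2,1-4H3"
)
before_embedding = COMMON + "/b13-6-,14-7+/t17-,19-/m1/s1"
after_embedding = COMMON + "/b13-6-,14-7-/t17-,19-/m1/s1"

print(differing_layers(before_embedding, after_embedding))
# {'b': ('13-6-,14-7+', '13-6-,14-7-')}
```

Only the double-bond (/b) layer differs, which agrees with the single
swapped slash in the SMILES.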

Jason Biggs


On Tue, Dec 18, 2018 at 12:14 PM Christos Kannas 
wrote:

> Hi Jean-Marc,
>
> The difference is due to bond orientation (if my inchi analysis skills
> are correct).
> See the bold bond layer below (14-7+ vs 14-7-).
>
>
> m1 -> 
> InChI=1S/C20H26O4/c1-12(2)17-11-18(22)14(4)7-5-6-13(3)8-16(21)9-15-10-19(17)24-20(15)23/h6-7,10,12,17,19H,5,8-9,11H2,1-4H3/*b13-6-,14-7+*/t17-,19-/m1/s1
>
> m2 -> 
> InChI=1S/C20H26O4/c1-12(2)17-11-18(22)14(4)7-5-6-13(3)8-16(21)9-15-10-19(17)24-20(15)23/h6-7,10,12,17,19H,5,8-9,11H2,1-4H3/*b13-6-,14-7-*/t17-,19-/m1/s1
>
>
> Not sure why it happens, but I've seen it multiple times...
>
>
> Best,
>
> Christos
>
> Christos Kannas
>
> Chem[o]informatics Researcher & Software Developer
>
> [image: View Christos Kannas's profile on LinkedIn]
> <http://cy.linkedin.com/in/christoskannas>
>
>
> On Tue, 18 Dec 2018 at 17:36, JEAN-MARC NUZILLARD <
> jm.nuzill...@univ-reims.fr> wrote:
>
>> Thank you for your answer but alatis might not be adapted to my current
>> problem.
>>
>> Attempting to understand what was changed by the embedding step I wrote:
>>
>> inchi1 =
>>
>> "InChI=1S/C20H26O4/c1-12(2)17-11-18(22)14(4)7-5-6-13(3)8-16(21)9-15-10-19(17)24-20(15)23/h6-7,10,12,17,19H,5,8-9,11H2,1-4H3/b13-6-,14-7+/t17-,19-/m1/s1"
>> m1 = Chem.MolFromInchi(inchi1)
>> m1 = Chem.AddHs(m1)
>> m2 = Chem.Mol(m1)
>> AllChem.EmbedMolecule(m2)
>> sm1 = Chem.MolToSmiles(m1)
>> sm2 = Chem.MolToSmiles(m2)
>> print(sm1)
>> print(sm2)
>> print(sm1 == sm2)
>> inc1 = Chem.MolToInchi(m1)
>> inc2 = Chem.MolToInchi(m2)
>> print(inc1)
>> print(inc2)
>> print(inc1 == inc2)
>>
>> Molecules m1 and m2 have identical SMILES representations
>> but different InChI representations, which I find odd.
>>
>> All the best,
>>
>> Jean-Marc
>>
>>
>>
>>
>> Le 18/12/2018 00:40, Dimitri Maziuk via Rdkit-discuss a écrit :
>> > On 12/17/18 4:50 PM, JEAN-MARC NUZILLARD wrote:
>> >> Is there any more deterministic procedure than the one of trying until
>> >> success is obtained?
>> >>
>> >> How do I determine the InChI string of a conformer obtained after
>> >> multiple embedding?
>> >
>> > This representation keeps 3D config: http://alatis.nmrfam.wisc.edu/
>> >
>> > Generally speaking the problem with InChI is that the only *required*
>> > layer is the formula. Therefore *an* InChI string cannot be used to
>> > differentiate conformers, you need the InChI string with all the
>> > relevant layers and all the protons.
>> >
>> > https://www.nature.com/articles/sdata201773
>> >
>>
>>


Re: [Rdkit-discuss] InChI to Mol to InChi

2018-12-17 Thread Jason Biggs
Is this the same issue as https://github.com/rdkit/rdkit/issues/1852, where
the input string specifies a trans double bond in a ring but EmbedMolecule
returns a cis configuration?  I get different SMILES strings before and
after embedding,

C/C1=C/C/C=C(\C)C(=O)C[C@H](C(C)C)[C@H]2C=C(CC(=O)C1)C(=O)O2
C/C1=C/C/C=C(/C)C(=O)C[C@H](C(C)C)[C@H]2C=C(CC(=O)C1)C(=O)O2
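
The two strings are easy to misread by eye, so here is a small sketch that
locates the difference.  It is plain Python string comparison, not an RDKit
feature, and the helper name char_differences is invented for this example.

```python
def char_differences(a, b):
    """List (index, char_in_a, char_in_b) for each position where a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings differ in length; a positional diff is not meaningful")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# SMILES before and after embedding, copied from this message.
before = r"C/C1=C/C/C=C(\C)C(=O)C[C@H](C(C)C)[C@H]2C=C(CC(=O)C1)C(=O)O2"
after = r"C/C1=C/C/C=C(/C)C(=O)C[C@H](C(C)C)[C@H]2C=C(CC(=O)C1)C(=O)O2"

print(char_differences(before, after))
# [(13, '\\', '/')]
```

The only change is the bond-direction marker on the methyl-substituted
double-bond carbon.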

Getting the two highlighted atoms trans to each other seems difficult, and
the embedder doesn't check and reject for double-bond stereo like it does
for chiral centers.


[image: image.png]

(does this copy/pasted image come across in the mailing list?)


Jason Biggs



On Mon, Dec 17, 2018 at 7:24 AM JEAN-MARC NUZILLARD <
jm.nuzill...@univ-reims.fr> wrote:

> Dear all,
>
> I tried to transform an InChI into a Mol and back to InChi.
>
> The following code:
>
> inchi1 =
>
> "InChI=1S/C20H26O4/c1-12(2)17-11-18(22)14(4)7-5-6-13(3)8-16(21)9-15-10-19(17)24-20(15)23/h6-7,10,12,17,19H,5,8-9,11H2,1-4H3/b13-6-,14-7+/t17-,19-/m1/s1"
> m = Chem.MolFromInchi(inchi1)
> m = Chem.AddHs(m)
> AllChem.EmbedMolecule(m)
> inchi2 = Chem.MolToInchi(m)
> print(inchi1)
> print(inchi2)
> print("EQUAL" if inchi1 == inchi2 else "DIFFERENT")
>
> returns DIFFERENT.
>
> Removing the EmbedMolecule step returns EQUAL.
>
> How could I change that?
>
>
> All the best,
>
> Jean-Marc
>
>


[Rdkit-discuss] GETAWAY descriptor returning non-numeric values

2018-11-29 Thread Jason Biggs
I don't know enough about the GETAWAY descriptor to know if this is to be
expected.

>>>m = Chem.MolFromSmiles('CC=O')
>>>m2=Chem.AddHs(m)
>>>AllChem.EmbedMolecule(m2,randomSeed = 1234)
>>>(Chem.rdMolDescriptors.CalcGETAWAY(m2))[85:90]

[0.207, 0.008, 0.027, nan, 0.0]


Is this a bug?  If not, is it reasonable to set these to 0.0?

Thanks
Jason


Re: [Rdkit-discuss] (no subject)

2018-07-31 Thread Jason Biggs
According to
http://www.rdkit.org/docs/api/rdkit.Chem.rdMolAlign-module.html#AlignMol,
the function returns the RMSD, and the probe molecule is modified in
place.  Look at the coordinates for mol2 before and after the
transformation:

mol = Chem.MolFromSmiles('CCCOC(=O)[O-]')
Chem.rdDistGeom.EmbedMolecule(mol)
# 0

mol2 = Chem.MolFromSmiles('CC')
Chem.rdDistGeom.EmbedMolecule(mol2)
# 0

mol2.GetConformer(0).GetPositions()
# array([[ 0.75549098,  0.,  0.],
#        [-0.75549098,  0.,  0.]])

Chem.rdMolAlign.AlignMol(mol2, mol, atomMap=((0,0),(1,1)))
# 0.005493118189962501

mol2.GetConformer(0).GetPositions()
# array([[ 3.10580323, -0.07674776, -0.45975102],
#        [ 1.75185317,  0.36881027,  0.04161089]])

>
Jason Biggs



On Tue, Jul 31, 2018 at 3:41 PM Phuong Chau  wrote:

> Hello everyone,
>
> I want to align chem B based on the 3D coordinates of chemical A and
> output the 3D coordinates of chemical B (alignment that center of mass of
> chem B is the same as chem A). I looked at the AlignMol() and Open3D align
> but it returns either a RMS value or a score. Is there a way that I convert
> either of these numbers to 3D coordinates?
>
> Thank you so much for your time and consideration.
>
> --
> Phuong Chau
>


Re: [Rdkit-discuss] A couple of questions about CoordGen in RDKit

2018-06-02 Thread Jason Biggs
Nicola,
Thanks for the example, I can definitely see the improvement in the diagram
from the template.  Is it mainly then these complicated bridged ring
systems that use the templates?  I do like the diagrams from Coordgen very
much, even when it doesn't use a template.

How difficult will it be to add more templates to the templates.mae file -
as I find examples of molecules that don't do well in the default method?
Can this be done with the rdkit, or would it need something else from the
Schrodinger repo?


This would be a question for both Greg and Nicola:  What would be good
words to describe the layout methods used by coordgen vs rdkit?  I want to
have an option for the user,  Molecule[ , DiagramLayout ->
"MethodName"], but just using "RDKit" and "CoordGen" isn't right because it
doesn't describe the underlying algorithm, just the library that implements
them.

Would it be wrong to call the rdkit method "DistanceGeometry"?  What is the
main distinction between the two methods?

Best,

Jason


(the image from Nicola's email didn't come through to me, showing the
example with template on the bottom, without template on the top - big
improvement)





Jason Biggs


On Sat, Jun 2, 2018 at 7:08 AM, Nicola Zonta 
wrote:

> Hello,
>
> yes, I don’t think we check for the existence of the directory (I got rid
> of that code when we released cause it was using a proprietary lib and
> never replaced it). It’s surprising that you get the same results though.
>
> here’s the smiles I use (or you can use any molecule in templates.mae (I
> am not sure if maeparser has been integrated with RDKit yet?) )
>
> C12CC3CC(C1)CC(C2)C3
>
> which should look something like this if the templates are used
>
>
> weird about the unstable coordinates, I think looking at the structure it
> has to do with the minimisation but I have no quick solution for it
>
>
> On 02 Jun 2018, at 12:35, Greg Landrum  wrote:
>
> Hi Jason,
>
> That's a great question. I can also confirm that it seems that setting the
> parameter file location to a bogus value seems to have no effect.
> @Nic: can you help us out here? I figure you can probably answer the
> question quicker than I can dig through the code. :-)
>
> -greg
>
>
> On Thu, May 31, 2018 at 10:22 PM Jason Biggs 
> wrote:
>
>> I recently switched to the 2018_03_1 release, and I am trying out the new
>> 2D coordinate generating functions.  The diagrams look good, but I can't
>> seem to figure out what the role of the template file is.
>>
>> I find that I can set the templateFileDir parameter either to a real
>> directory with the templates.mae file in it, or to an almost-empty string
>> " ", and it has no effect.  Is there an example SMILES where using the
>> template file changes the returned diagram?
>>
>> Another thing I notice is the conformer generated by CoordGen isn't
>> always reproducible.  I find that if I run the following code multiple
>> times, I will get different results,
>>
>>
>> m = Chem.MolFromSmiles('CO[C@H]1[C@]2(O)C(=O)N3C=CC(C)(C)c4c(C=C3C(=O)N2[C@@]23[C@@]1(O)c1c1N3C([C@H]2C)(C)C)c1c1[nH]4')
>> Chem.rdCoordGen.AddCoords(m)
>> m.GetConformer(0).GetPositions()[0]
>>
>> will sometimes output
>>
>> array([-1.2795,  1.35720001,  0.])
>>
>>
>> but other times outputs
>>
>> array([-1.28240005,  1.365 ,  0.])
>>
>>
>> Obviously it's a small difference, but I would prefer to always return
>> the same values for the same input.
>>
>> Best,
>> Jason
>> 


[Rdkit-discuss] A couple of questions about CoordGen in RDKit

2018-05-31 Thread Jason Biggs
I recently switched to the 2018_03_1 release, and I am trying out the new
2D coordinate generating functions.  The diagrams look good, but I can't
seem to figure out what the role of the template file is.

I find that I can set the templateFileDir parameter either to a real
directory with the templates.mae file in it, or to an almost-empty string
" ", and it has no effect.  Is there an example SMILES where using the
template file changes the returned diagram?

Another thing I notice is the conformer generated by CoordGen isn't always
reproducible.  I find that if I run the following code multiple times, I
will get different results,


m = Chem.MolFromSmiles('CO[C@H]1[C@]2(O)C(=O)N3C=CC(C)(C)c4c(C=C3C(=O)N2[C@@]23[C@@]1(O)c1c1N3C([C@H]2C)(C)C)c1c1[nH]4')
Chem.rdCoordGen.AddCoords(m)
m.GetConformer(0).GetPositions()[0]

will sometimes output

array([-1.2795,  1.35720001,  0.])


but other times outputs

array([-1.28240005,  1.365 ,  0.])


Obviously it's a small difference, but I would prefer to always return the
same values for the same input.
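
To put a number on "small", here is a rough sketch of the kind of check
that would flag this in a regression test.  It is plain Python applied to
the two coordinate triples printed above; the helper names and the 1e-6
tolerance are arbitrary choices for this example, not anything taken from
RDKit or CoordGen.

```python
import math


def max_abs_deviation(a, b):
    """Largest absolute per-component difference between two coordinate tuples."""
    return max(abs(x - y) for x, y in zip(a, b))


def same_position(a, b, tol=1e-6):
    """True when every component agrees within tol (tol is an arbitrary choice)."""
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b))


# First-atom positions from two runs on the same input, as printed above.
run_a = (-1.2795, 1.35720001, 0.0)
run_b = (-1.28240005, 1.365, 0.0)

print(same_position(run_a, run_b))                 # False
print(round(max_abs_deviation(run_a, run_b), 4))   # 0.0078
```

The largest deviation is in the y coordinate, a little under 0.008 in the
diagram's coordinate units.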

Best,
Jason


[Rdkit-discuss] EmbedMolecule ignoring double-bond ring stereochemistry

2018-05-02 Thread Jason Biggs
I don't know if this is a duplicate of
https://github.com/rdkit/rdkit/issues/435

I notice that ring double-bond stereo gets quietly ignored by the embedding
code.  I tried to make cis and trans cyclododecene (choosing a large ring to
minimize any strain), and they both come out cis.

Using the SMILES "C/1=C\CC1" and "C/1=C/CC1", the 3D
structure comes out cis for both.

Re: [Rdkit-discuss] cmake error building master

2018-04-19 Thread Jason Biggs
Greg,
I have boost version 1.64.  I found reference to this error on the web,
always in reference to boost version 1.64.  But what I don't see is whether
this is fixed in later or previous versions.

I forgot to include the first error I get, actually during the cmake stage,
when it tries to download the maeparser tarball.  Somehow the curl command
isn't downloading the file with the right name.  I get this error

- Found Threads: TRUE
-- Boost version: 1.64.0
-- Found the following Boost libraries:
--   serialization
== Using strict rotor definition
Downloading https://codeload.github.com/schrodinger/maeparser/tar.gz/83368293dcc0eb07562dadfb7728b8d18d23a6cb...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24984  100 24984    0     0  73460      0 --:--:-- --:--:-- --:--:--  274k
CMake Error at Code/cmake/Modules/RDKitUtils.cmake:198 (MESSAGE):
  The md5 checksum for
  base/rdkit/External/CoordGen/master.tar.gz
  is incorrect; expected: 32c0c3b315bba49fbf4c41a07aa58528, found:
  d41d8cd98f00b204e9800998ecf8427e
Call Stack (most recent call first):
  External/CoordGen/CMakeLists.txt:9 (downloadAndCheckMD5)



If I go to the Coordgen directory after getting this error, I see the
following

drwxr-xr-x 3 jasonb users 4.0K Apr 19 13:06 .
drwxr-xr-x 7 jasonb users 4.0K Apr 19 13:05 ..
-rw-r--r-- 1 jasonb users  25K Apr 19 13:06 83368293dcc0eb07562dadfb7728b8d18d23a6cb
-rw-r--r-- 1 jasonb users 2.8K Apr 19 13:05 CMakeLists.txt
-rw-r--r-- 1 jasonb users 5.6K Apr 19 13:05 CoordGen.h
-rw-r--r-- 1 jasonb users    0 Apr 19 13:06 master.tar.gz
-rw-r--r-- 1 jasonb users  12K Apr 19 13:05 test.cpp
drwxr-xr-x 2 jasonb users 4.0K Apr 19 13:05 Wrap


So the curl command downloaded the tarball as
"83368293dcc0eb07562dadfb7728b8d18d23a6cb",
and created an empty file "master.tar.gz" which of course doesn't match the
md5 sum.  I definitely don't know enough about cmake, or curl for that
matter, to see why this is happening.  But if I manually download the
maeparser, the coordgenlibs, and the rapidjson files and put them in
External/ then it works fine.
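
As a sanity check on the checksum in the error message: the "found" value
d41d8cd98f00b204e9800998ecf8427e is the MD5 of zero bytes of input, which
fits the empty master.tar.gz in the listing.  A quick way to confirm this,
sketched with Python's standard hashlib (the helper md5_of_file is just an
illustration and is not part of the RDKit build scripts):

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# MD5 of empty input: the "found" checksum from the CMake error above.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()
print(EMPTY_MD5)
# d41d8cd98f00b204e9800998ecf8427e
```

Running md5_of_file on the zero-length master.tar.gz should give the same
value, so the mismatch comes from the file being empty rather than from a
corrupted download.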

Thanks,
Jason



Jason Biggs


On Thu, Apr 19, 2018 at 12:42 PM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Which version of boost is that?
>
> On Thu, 19 Apr 2018 at 19:34, Jason Biggs <jasondbi...@gmail.com> wrote:
>
>> Trying to build on Scientific Linux release 6.9, and I'm getting boost
>> serialization errors, both on the recent release branch and on master
>>
>>
>> In file included from base/Boost/include/boost/
>>> numeric/ublas/vector.hpp:21:0,
>>>  from base/Boost/include/boost/
>>> numeric/ublas/matrix.hpp:18,
>>>  from base/rdkit/Code/GraphMol/Substruct/ullmann.hpp:41,
>>>  from base/rdkit/Code/GraphMol/
>>> Substruct/SubstructMatch.cpp:29:
>>> base/Boost/include/boost/numeric/ublas/storage.hpp: In member function
>>> ‘void boost::numeric::ublas::unbounded_array<T,
>>> ALLOC>::serialize(Archive&, unsigned int)’:
>>> base/Boost/include/boost/numeric/ublas/storage.hpp:299:18: error:
>>> ‘make_array’ is not a member of ‘boost::serialization’
>>>  ar & serialization::make_array(data_, s);
>>>   ^
>>> base/Boost/include/boost/numeric/ublas/storage.hpp: In member function
>>> ‘void boost::numeric::ublas::bounded_array<T, N,
>>> ALLOC>::serialize(Archive&, unsigned int)’:
>>> base/Boost/include/boost/numeric/ublas/storage.hpp:494:18: error:
>>> ‘make_array’ is not a member of ‘boost::serialization’
>>>  ar & serialization::make_array(data_, s);
>>
>>
>>
>> If I add
>>
>> #include 
>>
>>
>> to the SubstructMatch.cpp file, as instructed here
>> https://stackoverflow.com/q/44534516/4712538, then compilation continues
>> fine.
>>
>>
>> Jason
>> 


[Rdkit-discuss] cmake error building master

2018-04-19 Thread Jason Biggs
Trying to build on Scientific Linux release 6.9, and I'm getting boost
serialization errors, both on the recent release branch and on master


In file included from
> base/Boost/include/boost/numeric/ublas/vector.hpp:21:0,
>  from base/Boost/include/boost/numeric/ublas/matrix.hpp:18,
>  from base/rdkit/Code/GraphMol/Substruct/ullmann.hpp:41,
>  from
> base/rdkit/Code/GraphMol/Substruct/SubstructMatch.cpp:29:
> base/Boost/include/boost/numeric/ublas/storage.hpp: In member function
> ‘void boost::numeric::ublas::unbounded_array<T,
> ALLOC>::serialize(Archive&, unsigned int)’:
> base/Boost/include/boost/numeric/ublas/storage.hpp:299:18: error:
> ‘make_array’ is not a member of ‘boost::serialization’
>  ar & serialization::make_array(data_, s);
>   ^
> base/Boost/include/boost/numeric/ublas/storage.hpp: In member function
> ‘void boost::numeric::ublas::bounded_array<T, N,
> ALLOC>::serialize(Archive&, unsigned int)’:
> base/Boost/include/boost/numeric/ublas/storage.hpp:494:18: error:
> ‘make_array’ is not a member of ‘boost::serialization’
>  ar & serialization::make_array(data_, s);



If I add

#include 


to the SubstructMatch.cpp file, as instructed here
https://stackoverflow.com/q/44534516/4712538, then compilation continues
fine.


Jason


Re: [Rdkit-discuss] edge matrix

2018-01-17 Thread Jason Biggs
I am a novice when it comes to graph theory, but it seems like what is
wanted here is the adjacency matrix of the corresponding line graph (
http://mathworld.wolfram.com/LineGraph.html).

I don't know how to do this in Python, but in Mathematica it goes
like this:

adjacencyMatrix = {{0, 1, 0, 0, 0}, {1, 0, 1, 1, 0}, {0, 1, 0, 0,
0}, {0, 1, 0, 0, 1}, {0, 0, 0, 1, 0}};

graph = AdjacencyGraph[adjacencyMatrix];
lineGraph = LineGraph[graph];
AdjacencyMatrix[lineGraph] // MatrixForm

[image: Inline image 1]
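
For comparison, the same construction can be sketched in plain Python
without any graph library.  This is only an illustration: the function
names are invented here, and the edge ordering (sorted by vertex index
pair) is an assumption that happens to match the order of the pairs in the
adjacency matrix above.  Two edges are adjacent in the line graph when
they share a vertex.

```python
def edges_from_adjacency(adj):
    """List the undirected edges (i, j), i < j, of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]]


def line_graph_adjacency(adj):
    """Adjacency matrix of the line graph: edges are adjacent if they share a vertex."""
    edges = edges_from_adjacency(adj)
    m = len(edges)
    result = [[0] * m for _ in range(m)]
    for a in range(m):
        for b in range(a + 1, m):
            if set(edges[a]) & set(edges[b]):
                result[a][b] = result[b][a] = 1
    return result


# Same vertex adjacency matrix as in the Mathematica input above.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]

for row in line_graph_adjacency(adjacency):
    print(row)
# [0, 1, 1, 0]
# [1, 0, 1, 0]
# [1, 1, 0, 1]
# [0, 0, 1, 0]
```

The result is the 4x4 matrix given as the expected output in the original
question.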


Jason Biggs


On Wed, Jan 17, 2018 at 10:21 AM, Marta Stępniewska-Dziubińska via
Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:

> Hi Mario,
>
> What exactly do you mean by 'edge matrix'? Are you sure you provided a
> correct example? If you want to get an adjacency matrix of a molecular
> graph you can iterate over bonds to get it:
>
> from rdkit.Chem import MolFromSmiles
> import numpy as np
> m = MolFromSmiles('CC(C)CC')
> n = m.GetNumAtoms()
> E = np.zeros((n, n))
> for b in m.GetBonds():
>     i = b.GetBeginAtomIdx()
>     j = b.GetEndAtomIdx()
>     E[[i,j], [j,i]] = 1
>
>
> Hope this helps,
> Marta SD
>
>
>
> 2018-01-17 16:31 GMT+01:00 Mario Lovrić <mario.lovri...@gmail.com>:
>
>> Dear all,
>>
>> Does anyone have an idea how to get an edge matrix (graph theory) out of
>> RDKit? I dug deep but didn't find anything.
>>
>> For example, for:
>>
>> 'CC(C)CC'
>>
>>
>> it would be:
>>
>> array([[0, 1, 1, 0],
>>        [1, 0, 1, 0],
>>        [1, 1, 0, 1],
>>        [0, 0, 1, 0]])
>>
>> Thanks.
>>
>>
>> --
>> Mario Lovrić
>>


Re: [Rdkit-discuss] mol file parsing, 3D or 2D

2018-01-17 Thread Jason Biggs
On Wed, Jan 17, 2018 at 10:12 AM, Dimitri Maziuk 
wrote:

> On 2018-01-16 22:46, Greg Landrum wrote:
>
>> It might be worth thinking about adding an option to the aromaticity
>> perception code to maintain the original bond types and just set the
>> "isAromatic" flag on the bonds.
>>
>
> This is how it's modeled in mmCIF chem. comp. It may or may not come from
> OpenEye, which they were originally using to process their ligands/chem comps.
>
> From a programming perspective it's pretty annoying, since you have to
> remember to add an extra if stanza to all your code, queries, etc.
>
> What's wrong with keeping a copy of the original molecule around? -- I'm
> not sure I get the "I want to sanitize and keep the original bonds too", it
> sounds too much like the proverbial cake.
>

To the extent possible, I do want to allow users to have and eat the cake
:-).

For the case in question, I find that if I read in a mol file containing 2D
coordinates, and I skip the sanitization step altogether, then the 3D
embedding algorithms fail.


>
> Dima
>
>


Re: [Rdkit-discuss] mol file parsing, 3D or 2D

2018-01-15 Thread Jason Biggs
Thanks for the detailed reply


>
> The fact that this isn't happening for you indicates that you are reading
> the molecules in without sanitizing them - the mol file parser calls
> assignStereochemistry() by default if you sanitize. Are you sure that you
> should be disabling sanitization?
>
>
I did turn it off, but afterwards I call sanitize with a user-specified
option for whether to perform the Kekulization/setAromaticity steps.  I'm
not certain what we want to set as the default behavior, but I want the
user to have the option of keeping the molecule in the particular kekulized
state that they input.  I will go through the molfile parser to see what
else I'm missing out on by doing the sanitization step afterward.

Would it break functionality to add a sanitizeOps optional argument to some
of the XXXToMol functions (defaulting to SANITIZE_ALL of course)?

Looking at the bonds block I pasted in, I do see that the bond directions
were specified therein.  Will have to ruminate on what to do with them -
when they were written out the intention was obviously to convey the 3D
geometry, but it clearly clashes with the convention of using wedges and
dashes to indicate chirality.
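
To make that concrete, here is a small sketch that lists the bonds carrying a stereo flag in a V2000 bond block. It uses plain string splitting rather than the RDKit parser, the helper name flagged_bonds is hypothetical, and it assumes whitespace-separated bond lines like the ones I pasted (begin atom, end atom, order, stereo flag, where 1 is a wedge and 6 is a dash). Files with three-digit atom indices would need fixed-width slicing instead.

```python
def flagged_bonds(bond_lines):
    """Return (begin, end, flag) for each bond line with a nonzero stereo flag."""
    flagged = []
    for line in bond_lines:
        fields = line.split()
        begin, end, stereo = int(fields[0]), int(fields[1]), int(fields[3])
        if stereo != 0:
            flagged.append((begin, end, stereo))
    return flagged


# The last six bond lines of the aspirin file from this thread.
bond_lines = [
    " 10 11  1  1  0  0",
    " 11 12  1  0  0  0",
    " 11 13  2  0  0  0",
    " 12 19  1  0  0  0",
    " 12 20  1  6  0  0",
    " 12 21  1  1  0  0",
]
print(flagged_bonds(bond_lines))
```

The flagged bonds start at atom 10 (the ester oxygen) and atom 12 (the methyl carbon), neither of which is a stereocenter.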




> -greg
>
>
>
>>
>> The mol file for the second question is pasted below, and here is the
>> generated depiction,
>>
>> [image: Inline image 2]
>>
>>
>> aspirin.mol
>>
>>  21 21  0  0  0
>>    -2.2240   -1.4442   -0.4577 C   0  0  0  0  0
>>    -2.1657   -0.0545   -0.5349 C   0  0  0  0  0
>>    -0.9916    0.6085   -0.1694 C   0  0  0  0  0
>>     0.1471   -0.0738    0.2764 C   0  0  0  0  0
>>     0.0751   -1.4832    0.3390 C   0  0  0  0  0
>>    -1.1052   -2.1532   -0.0188 C   0  0  0  0  0
>>     1.2412   -2.2934    0.7925 C   0  0  0  0  0
>>     2.4223   -1.7619    1.1727 O   0  0  0  0  0
>>     1.1650   -3.5162    0.8364 O   0  0  0  0  0
>>     1.2795    0.6233    0.5954 O   0  0  0  0  0
>>     1.1005    1.7577    1.3258 C   0  0  0  0  0
>>     2.4429    2.3635    1.6825 C   0  0  0  0  0
>>     0.0255    2.2041    1.6578 O   0  0  0  0  0
>>    -3.1430   -1.9775   -0.7500 H   0  0  0  0  0
>>    -3.0382    0.5167   -0.8915 H   0  0  0  0  0
>>    -0.9608    1.7083   -0.2479 H   0  0  0  0  0
>>    -1.1740   -3.2520    0.0315 H   0  0  0  0  0
>>     2.9869   -2.5132    1.4166 H   0  0  0  0  0
>>     2.3142    3.3967    2.0773 H   0  0  0  0  0
>>     3.1051    2.4141    0.7884 H   0  0  0  0  0
>>     2.9391    1.7459    2.4657 H   0  0  0  0  0
>>   1  2  2  0  0  0
>>   1  6  1  0  0  0
>>   1 14  1  0  0  0
>>   2  3  1  0  0  0
>>   2 15  1  0  0  0
>>   3  4  2  0  0  0
>>   3 16  1  0  0  0
>>   4  5  1  0  0  0
>>   4 10  1  0  0  0
>>   5  6  2  0  0  0
>>   5  7  1  0  0  0
>>   6 17  1  0  0  0
>>   7  8  1  0  0  0
>>   7  9  2  0  0  0
>>   8 18  1  0  0  0
>>  10 11  1  1  0  0
>>  11 12  1  0  0  0
>>  11 13  2  0  0  0
>>  12 19  1  0  0  0
>>  12 20  1  6  0  0
>>  12 21  1  1  0  0
>> M  END
>>
>>
>> Thanks,
>>
>> Jason
>>


Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018

2018-01-15 Thread Jason Biggs
   - I've had this on my to-do list for a few months now, implementing the
   algorithm described in this paper.  I think the force-field energy
   minimization routines already present in the RDKit can be utilized for this
   pretty easily.  The only part that I don't think is set up already would be
   applying a constant force to all atoms to force them into the xy plane.

Frączek, T., "Simulation-Based Algorithm for Two-Dimensional Chemical
Structure Diagram Generation of Complex Molecules and Ligand–Protein
Interactions." J. Chem. Inf. Model. 2016, 56, 2320-2335, DOI:
10.1021/acs.jcim.6b00391.



   - Another idea would be to add in point-group symmetry detection.  I'm
   using the Symmetrizer java library, described here
   https://www.ncbi.nlm.nih.gov/pubmed/22549414, and pretty happy with it
   overall.  One could re-implement it in C++, or include the jar in the
   External folder and write python wrappers.


Jason Biggs


On Mon, Jan 15, 2018 at 1:09 AM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Dear all,
>
> We've been invited again to participate in the OpenChemistry application
> for Google Summer of Code.
>
> In order to participate we need ideas for projects and mentors to go along
> with them.
>
> The current list of RDKit ideas is being maintained here:
> http://wiki.openchemistry.org/GSoC_Ideas_2018#RDKit_Project_Ideas
>
> (Note: at the point that I'm pressing "send", that's still a copy of last
> year's project ideas).
>
> If you're willing to be a mentor (please ask me about the ~5 hours/week
> required here) or have ideas, please reply to this thread.
>
> Best,
> -greg
>


[Rdkit-discuss] mol file parsing, 3D or 2D

2018-01-14 Thread Jason Biggs
Two questions about mol file conformer reading:

Looking through the .mol files included for testing, I chose
"Code/GraphMol/Depictor/test_data/7UPJ_spread.mol" at random.

When I read in this file using the RDKit::MolFileToMol function, and then
query its conformer's is3D() method, it returns true even though it is
definitely a 2D depiction in the file.  I'm not totally familiar with the
MDL file specifications, so is there some flag I'm missing in the file?
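
For reference, the two places I know of where a V2000 file can hint at dimensionality are the two-character dimensional code in columns 21-22 of the second header line ("2D" or "3D") and the z column of the atom block. The sketch below reads both with plain string slicing. It does not use the RDKit parser, I don't know which of the two (if either) is3D() consults, and the helper names and the tiny two-atom block are made up for illustration.

```python
def dimension_hints(molblock):
    """Return (dimension code from header line 2, whether any z is nonzero)."""
    lines = molblock.splitlines()
    dim_code = lines[1][20:22].strip()      # columns 21-22 of the second line
    n_atoms = int(lines[3][0:3])            # first field of the counts line
    atom_lines = lines[4:4 + n_atoms]
    has_z = any(abs(float(line[20:30])) > 1e-4 for line in atom_lines)
    return dim_code, has_z


def atom_line(x, y, z, symbol):
    # x, y and z occupy three 10-character fields, then the element symbol
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0  0"


flat_block = "\n".join([
    "two-atom test block",                      # molecule name
    " " * 20 + "2D",                            # program/timestamp left blank
    "",                                         # comment line
    "  2  1  0  0  0  0  0  0  0  0999 V2000",  # counts line
    atom_line(0.0, 0.0, 0.0, "C"),
    atom_line(1.5, 0.0, 0.0, "C"),
    "  1  2  1  0",
    "M  END",
])
print(dimension_hints(flat_block))
```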

Second question,

When I read in a file with a 3D conformer, and then later use
compute2DCoords, followed by WedgeMolBonds, it adds wedges to non-chiral
atoms.  Is this by design?  It definitely does serve to convey 3D
information from the file in the depiction, but I'd also like to know how
to disable it if possible.  Would running assignStereochemistry fix the
issue?

The mol file for the second question is pasted below, and here is the
generated depiction,

[image: Inline image 2]


aspirin.mol

 21 21  0  0  0
   -2.2240   -1.4442   -0.4577 C   0  0  0  0  0
   -2.1657   -0.0545   -0.5349 C   0  0  0  0  0
   -0.9916    0.6085   -0.1694 C   0  0  0  0  0
    0.1471   -0.0738    0.2764 C   0  0  0  0  0
    0.0751   -1.4832    0.3390 C   0  0  0  0  0
   -1.1052   -2.1532   -0.0188 C   0  0  0  0  0
    1.2412   -2.2934    0.7925 C   0  0  0  0  0
    2.4223   -1.7619    1.1727 O   0  0  0  0  0
    1.1650   -3.5162    0.8364 O   0  0  0  0  0
    1.2795    0.6233    0.5954 O   0  0  0  0  0
    1.1005    1.7577    1.3258 C   0  0  0  0  0
    2.4429    2.3635    1.6825 C   0  0  0  0  0
    0.0255    2.2041    1.6578 O   0  0  0  0  0
   -3.1430   -1.9775   -0.7500 H   0  0  0  0  0
   -3.0382    0.5167   -0.8915 H   0  0  0  0  0
   -0.9608    1.7083   -0.2479 H   0  0  0  0  0
   -1.1740   -3.2520    0.0315 H   0  0  0  0  0
    2.9869   -2.5132    1.4166 H   0  0  0  0  0
    2.3142    3.3967    2.0773 H   0  0  0  0  0
    3.1051    2.4141    0.7884 H   0  0  0  0  0
    2.9391    1.7459    2.4657 H   0  0  0  0  0
  1  2  2  0  0  0
  1  6  1  0  0  0
  1 14  1  0  0  0
  2  3  1  0  0  0
  2 15  1  0  0  0
  3  4  2  0  0  0
  3 16  1  0  0  0
  4  5  1  0  0  0
  4 10  1  0  0  0
  5  6  2  0  0  0
  5  7  1  0  0  0
  6 17  1  0  0  0
  7  8  1  0  0  0
  7  9  2  0  0  0
  8 18  1  0  0  0
 10 11  1  1  0  0
 11 12  1  0  0  0
 11 13  2  0  0  0
 12 19  1  0  0  0
 12 20  1  6  0  0
 12 21  1  1  0  0
M  END


Thanks,

Jason


[Rdkit-discuss] RDKit and Mathematica

2018-01-12 Thread Jason Biggs
To the developers of RDKit - this is a great package you've made and the
level of support and responsiveness to bugs is fantastic.

I've been working on adding chemistry functionality to Mathematica, and the
RDKit is fundamental to this functionality.  I'm writing here to see if
there are any RDKit users who also use Mathematica, and if so, what kind of
functionality you think is most important to include.

This won't be like the python or java wrappers, but rather we are trying to
design a Molecule object that is fully integrated with the rest of the
Wolfram Language but uses an RDKit::ROMol as the underlying structure.  As
we find bugs, we will report them, and when we implement functionality that
isn't available in the RDKit, I'm hoping to add back to the community
here.

Best wishes,

Jason Biggs

PS - I find it to be surreal, but my boss has taken to live-streaming our
design meetings regarding the chemistry functionality, so if anyone is
interested to watch they are here:
https://www.twitch.tv/stephen_wolfram/videos/all


Re: [Rdkit-discuss] How to convert numpy array to rdkit fingerprint object?

2018-01-11 Thread Jason Biggs
There may be a better way (my python is rudimentary),

import numpy
from rdkit import DataStructs

explicitList = numpy.array([0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1,
                            1, 1, 1, 1, 0, 0])

# indices of the bits that are set
onbits = numpy.where(explicitList==1)[0].tolist()

bv1 = DataStructs.SparseBitVect(20)
bv1.SetBitsFromList(onbits)

# print every bit to confirm the round trip
for i in range(20):
    print(bv1.GetBit(i))

False
True
True
True
False
True
False
True
False
True
True
True
False
True
True
True
True
True
False
False
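
Once both arrays are converted this way, DataStructs.DiceSimilarity should accept the two bit vectors. As a cross-check on the value it returns, the Dice coefficient can also be computed directly from the 0/1 lists. This is a plain-Python sketch, not RDKit code; the helper name dice_from_bits and the two short fingerprints are made up for the example.

```python
def dice_from_bits(bits_a, bits_b):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two equal-length 0/1 sequences."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    on_a = {i for i, bit in enumerate(bits_a) if bit}
    on_b = {i for i, bit in enumerate(bits_b) if bit}
    total = len(on_a) + len(on_b)
    if total == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return 2.0 * len(on_a & on_b) / total


fp1 = [0, 1, 1, 1, 0, 1, 0, 1]
fp2 = [0, 1, 0, 1, 0, 1, 1, 1]
print(dice_from_bits(fp1, fp2))  # 4 shared bits, 5 + 5 set bits -> 0.8
```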


Jason Biggs


On Thu, Jan 11, 2018 at 9:30 AM, Michał Nowotka <mmm...@gmail.com> wrote:

> Hi,
>
> Imagine I have two numpy arrays containing zeros and ones (or bools)
> effectively being fingerprints:
>
> np_1, np_2 = some_fingerprints_as_np_arrays()
>
> I want to convert them both to rdkit fingerprint objects so I can use
> DiceSimilarity:
>
> from rdkit import DataStructs
>
> # this won't work because of type incompatibility
> DataStructs.DiceSimilarity(np_1, np_2)
>
> In the
> http://www.rdkit.org/Python_Docs/rdkit.DataStructs.cDataStructs.ExplicitBitVect-class.html
> docs I can't find any constructor apart from FromBase64.
> Any hints?
>
> Cheers,
>
> Michał
>


Re: [Rdkit-discuss] strange linker error using VC++

2018-01-11 Thread Jason Biggs
On Thu, Jan 11, 2018 at 8:03 AM, Paolo Tosco <paolo.to...@unito.it> wrote:

> Hi Jason,
>
> I believe the problem here is that if you are building outside CMake the
> WIN32 preprocessor macro is not defined (_WIN32 is). So, when ROMol.h is
> parsed, the ROMol class definition includes a "private" directive that
> should not be there, hence the error. To fix the issue, you need to add to
> your cl command line a /DWIN32.
>

Thank you!  This did the trick.  I should learn to use CMake instead of
compiling everything in Mathematica (when all you have is a hammer,
everything looks like a nail)


> As a side note, I am not sure you really want to do:
>
> RDKit::ROMol mol = *(RDKit::SmilesToMol(smi));
> std::string res = RDKit::MolToSmiles(mol);
>
> In fact, this makes a copy of the molecule that the pointer returned by
> SmilesToMol() points to, while leaking the memory pointed to.
> If you are not interested in making a copy of the returned molecule, you
> should probably rather do:
>
> RDKit::RWMol *mol = RDKit::SmilesToMol(smi);
> std::string res = RDKit::MolToSmiles(*mol);
> delete mol;
>

You are definitely correct here, I was trying to make the simplest failing
example and the example without the copy compiled fine - declaring a
pointer to an ROMol is fine but declaring an actual object triggered the
error.

I'm very new to c++, will keep in mind to always delete my pointers.   If I
understand correctly, if I always wrap my ROMol objects in a
std::shared_ptr (or the boost equivalent), then the memory will not be
leaked.  Is that right?

Best,
Jason



>
>
> If you actually meant to make a copy, you'd better do:
>
> RDKit::RWMol *mol = RDKit::SmilesToMol(smi);
> RDKit::ROMol molCopy(*mol);
> delete mol;
> std::string res = RDKit::MolToSmiles(molCopy);
>
> Cheers,
> p.
>
>
> On 11/01/2018 02:33, Jason Biggs wrote:
>
> I'm trying to use the rdkit as a library in another project, and am having
> trouble getting it to build on windows.  I can get the code to compile on
> mac and linux, but it fails for windows, both 32-bit and 64-bit varieties.
> I don't know how specific this is to the rdkit, but I have zero experience
> compiling with visual studio (and very little C++ coding background) and I
> am very confused here.  The following is just a toy example showing the
> minimum necessary for me to get the error.  I get the same error using the
> full code.
>
>
> If I create a test class using this code, I can compile it just fine in
> windows:
>
>
> #include 
>
> #include 
>
> #include 
>
>
> class testClass{
>
>     testClass();
>     ~testClass();
>
>     std::string testFunc() {
>         std::string smi = "CCC";
>         RDKit::ROMol mol = *(RDKit::SmilesToMol(smi));
>         std::string res = RDKit::MolToSmiles(mol);
>         return res;
>     };
>
> };
>
> testClass::testClass() {
> }
>
> testClass::~testClass() {
> }
>
>
>
> But if I move the definition of the testFunc() function outside of the
> class declaration (which is the normal case, where the definitions are in
> separate files), like this
>
>
> #include 
>
> #include 
>
> #include 
>
>
>
> class testClass{
>
>     testClass();
>     ~testClass();
>
>     std::string testFunc();
>
> };
>
> testClass::testClass() {
> }
>
> testClass::~testClass() {
> }
>
> std::string testClass::testFunc() {
>     std::string smi = "CCC";
>     RDKit::ROMol mol = *(RDKit::SmilesToMol(smi));
>     std::string res = RDKit::MolToSmiles(mol);
>     return res;
> };
>
> then I get the following linker errors:
>
> error LNK2019: unresolved external symbol "private: virtual void
> __thiscall RDKit::ROMol::destroy(void)" (?destroy@ROMol@RDKit@@EAEXXZ)
> referenced in function "public: virtual __thiscall
> RDKit::ROMol::~ROMol(void)" (??1ROMol@RDKit@@UAE@XZ)
> failing.obj : error LNK2019: unresolved external symbol "private: void
> __thiscall RDKit::ROMol::initFromOther(class RDKit::ROMol const
> &,bool,int)" (?initFromOther@ROMol@RDKit@@AAEXABV12@_NH@Z) referenced in
> function "public: __thiscall RDKit::ROMol::ROMol(class RDKit::ROMol const
> &,bool,int)" (??0ROMol@RDKit@@QAE@ABV01@_NH@Z)
> C:\Users\IEUser\Documents\rdkitlink_windows_compile_issue\Working-ie11win7-3268-3284-12\RDKitLink.dll
> : fatal error LNK1120: 2 unresolved externals
>

[Rdkit-discuss] strange linker error using VC++

2018-01-10 Thread Jason Biggs
I'm trying to use the rdkit as a library in another project, and am having
trouble getting it to build on windows.  I can get the code to compile on
mac and linux, but it fails for windows, both 32-bit and 64-bit varieties.
I don't know how specific this is to the rdkit, but I have zero experience
compiling with visual studio (and very little C++ coding background) and I
am very confused here.  The following is just a toy example showing the
minimum necessary for me to get the error.  I get the same error using the
full code.


If I create a test class using this code, I can compile it just fine in
windows:


#include 

#include 

#include 


class testClass{

    testClass();
    ~testClass();

    std::string testFunc() {
        std::string smi = "CCC";
        RDKit::ROMol mol = *(RDKit::SmilesToMol(smi));
        std::string res = RDKit::MolToSmiles(mol);
        return res;
    };

};

testClass::testClass() {
}

testClass::~testClass() {
}



But if I move the definition of the testFunc() function outside of the
class declaration (which is the normal case, where the definitions are in
separate files), like this


#include 

#include 

#include 



class testClass{

    testClass();
    ~testClass();

    std::string testFunc();

};

testClass::testClass() {
}

testClass::~testClass() {
}

std::string testClass::testFunc() {
    std::string smi = "CCC";
    RDKit::ROMol mol = *(RDKit::SmilesToMol(smi));
    std::string res = RDKit::MolToSmiles(mol);
    return res;
};

then I get the following linker errors:

error LNK2019: unresolved external symbol "private: virtual void __thiscall
RDKit::ROMol::destroy(void)" (?destroy@ROMol@RDKit@@EAEXXZ) referenced in
function "public: virtual __thiscall RDKit::ROMol::~ROMol(void)"
(??1ROMol@RDKit@@UAE@XZ)
failing.obj : error LNK2019: unresolved external symbol "private: void
__thiscall RDKit::ROMol::initFromOther(class RDKit::ROMol const
&,bool,int)" (?initFromOther@ROMol@RDKit@@AAEXABV12@_NH@Z) referenced in
function "public: __thiscall RDKit::ROMol::ROMol(class RDKit::ROMol const
&,bool,int)" (??0ROMol@RDKit@@QAE@ABV01@_NH@Z)
C:\Users\IEUser\Documents\rdkitlink_windows_compile_issue\Working-ie11win7-3268-3284-12\RDKitLink.dll
: fatal error LNK1120: 2 unresolved externals


I can't understand why moving the definition of testFunc() causes this
error.

Any help would be most appreciated. Clearly I can use the workaround of
making all definitions for testClass in the header file, but I would rather
not do that.

Thank you,
Jason


Re: [Rdkit-discuss] SanitizeMol changing drawing

2017-12-14 Thread Jason Biggs
Greg,
That really helps!  The CIP rank property is the key I was looking for.
Now I can put in a check to see whether the property is defined before
calling prepareMolForDrawing, and if it isn't, call assignAtomCIPRanks
first.

I was having issues where molecules created from a SMILES would not
round-trip through my internal molecule representation (in Mathematica) and
back perfectly, and this fixes most of those issues.

Thanks,

Jason

On Wed, Dec 13, 2017 at 11:26 PM, Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi Jason,
>
> This is a nice one.
>
> Here's what's going on:
> The depiction code (the piece that generates 2D coordinates) attempts to
> generate "canonical" coordinates : it tries to generate the same
> coordinates for a molecule no matter what the input atom ordering is.
> In order to do that it needs a canonical numbering of the atoms (or at
> least something approximating one).
> The current code uses the calculated CIP ranks of the atoms as this
> canonical ordering. These ranks are generated as part of the standard
> stereochemistry assignment that is done on molecule construction and are
> stored as computed properties on the atoms. If the CIP ranks are not there
> it more or less gives up and just uses the atomic number.
> The call to SanitizeMol() clears the computed properties on atoms, thus
> blowing out the CIP rank information that the depiction code uses.
>
> If you want to resolve this, you can call
> Chem.AssignStereochemistry(m2, cleanIt=True, force=True)
> after you sanitize the molecule. Note that this can be a
> computationally expensive call, so you may not want to make a habit out of
> it.
>
> I'll create an issue to explore updating the depiction code and replacing
> the use of CIP ranks with the atom ranking generated by Nadine's
> canonicalization code
>
> -greg
>
>
> On Wed, Dec 13, 2017 at 10:38 PM, Jason Biggs <jasondbi...@gmail.com>
> wrote:
>
>> using the recent release,
>>
>>
>> m = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
>> m2 = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
>> Chem.rdmolops.SanitizeMol(m2)
>>
>>
>>
>> The two molecules above seem identical - MolFromSmiles already performs a
>> sanitization so why wouldn't they be?  They produce the same pickle,
>>
>> pickle.dumps(m) == pickle.dumps(m2)
>>
>> True
>>
>>
>> So why do they get treated differently by the drawing code? The only way
>> to return m2 to its original state is to run AssignStereochemistry with
>> force=True.  What variable is being thrown off by SanitizeMol?
>>
>> [image: Inline image 1]
>>
>> Jason Biggs
>>
>>


[Rdkit-discuss] SanitizeMol changing drawing

2017-12-13 Thread Jason Biggs
using the recent release,


m = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
m2 = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
Chem.rdmolops.SanitizeMol(m2)



The two molecules above seem identical - MolFromSmiles already performs a
sanitization so why wouldn't they be?  They produce the same pickle,

pickle.dumps(m) == pickle.dumps(m2)

True


So why do they get treated differently by the drawing code? The only way to
return m2 to its original state is to run AssignStereochemistry with
force=True.  What variable is being thrown off by SanitizeMol?

[image: Inline image 1]

Jason Biggs


Re: [Rdkit-discuss] RDkit and Pubchem

2017-12-01 Thread Jason Biggs
Sundar,
What you do will depend on whether you have an SID or a CID number.  Read
https://pubchemblog.ncbi.nlm.nih.gov/2014/06/19/what-is-the-difference-between-a-substance-and-a-compound-in-pubchem/
for more info.

> In PubChem terminology, a *substance* is a chemical sample description
> provided by a single source and a *compound* is a normalized chemical
> structure representation found in one or more contributed *substances*.


And looking at the pages for a few random substances, it doesn't list the
same kind of information that you'll find on a compound page.  So what you
need is to get a list of associated compounds for a given substance ID.

https://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/123061/cids/JSON?cids_type=all

Leave off the cids_type=all if you only want one compound.  For the SID in
your query, it doesn't even have a compound, so it returns a message
stating so.
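
A small Python helper for building that request might look like the following. It only assembles the URL shown above; the function name sid_to_cids_url is made up, and fetching and decoding the JSON (for example with urllib.request and json) is left out so the snippet runs without network access.

```python
from urllib.parse import urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sid_to_cids_url(sid, all_cids=True):
    """URL listing the compound IDs (CIDs) associated with a substance ID (SID)."""
    url = f"{PUG_REST}/substance/sid/{int(sid)}/cids/JSON"
    if all_cids:
        # cids_type=all asks for every associated compound, as in the link above
        url += "?" + urlencode({"cids_type": "all"})
    return url


print(sid_to_cids_url(123061))
print(sid_to_cids_url(123061, all_cids=False))
```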

Jason

Jason Biggs


On Fri, Dec 1, 2017 at 5:33 PM, Sundar <jubilantsun...@gmail.com> wrote:

> Hi Jason,
>
> This is great. I would really benefit from this.
> At present I am looking for a way to download SMILES or mol data for a few
> compounds which only have SIDs and CIDs.
> Can we do it? I failed after trying the following:
>
> https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/sid/144205334/property/CanonicalSMILES,IsomericSMILES,InChI/JSON
>
> Thanks,
>
>
> On Fri, Dec 1, 2017 at 1:11 PM, Jason Biggs <jasondbi...@gmail.com> wrote:
>
>> Pubchem has an easy to use rest API, described here:
>> https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
>>
>> If you have a compound ID, you can query properties via something like
>>
>> https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/CanonicalSMILES,IsomericSMILES,InChI/JSON
>>
>>
>> It comes back in JSON format, but you can have it return XML or plain
>> text.
>>
>> If you want an SDF file, something like
>>
>> https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/SDF?record_type=3d
>>
>> setting up a python function to query this shouldn't be difficult.
>>
>> Jason Biggs
>>
>>
>> On Fri, Dec 1, 2017 at 12:51 PM, Sundar <jubilantsun...@gmail.com> wrote:
>>
>>> I would like to download at least SMILES (great if I can also download
>>> mol files).
>>> And the same is true for Pubchem Compound ID or using Substance ID.
>>> Or even download the whole data set using an assay id. Anything could
>>> help.
>>>
>>> Thanks,
>>> Jubi
>>>
>>> On Fri, Dec 1, 2017 at 11:55 AM, Tim Dudgeon <tdudgeon...@gmail.com>
>>> wrote:
>>>
>>>> In what way? Given a single PubChem compound or substance ID you just
>>>> want to pull the smiles or molfile into RDKit?
>>>>
>>>> Tim
>>>> On 01/12/17 17:26, Sundar wrote:
>>>>
>>>> Hi RDkit users,
>>>>
>>>> I was wondering if RDkit has a means of downloading compounds from
>>>> Pubchem.
>>>> Also let me know of other ways that could help here.
>>>>
>>>> Thanks,
>>>> Jubi
>>>>
>>>>


Re: [Rdkit-discuss] RDkit and Pubchem

2017-12-01 Thread Jason Biggs
PubChem has an easy-to-use REST API, described here:
https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest

If you have a compound ID, you can query properties via something like

https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/CanonicalSMILES,IsomericSMILES,InChI/JSON


It comes back in JSON format, but you can have it return XML or plain text.

If you want an SDF file, something like

https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/SDF?record_type=3d


Setting up a Python function to query this shouldn't be difficult.
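
As a starting point, here is a minimal sketch that builds the two URLs above from a compound ID. The function names and the default property list are just choices for this example, and the actual download (for example with urllib.request.urlopen) is left out so the snippet runs offline.

```python
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(cid, properties=("CanonicalSMILES", "IsomericSMILES", "InChI"),
                 fmt="JSON"):
    """URL for a property table of one compound; fmt can be JSON or XML."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{','.join(properties)}/{fmt}"


def sdf_url(cid, record_type="3d"):
    """URL for the SDF record of one compound."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/SDF?record_type={record_type}"


print(property_url(2244))
print(sdf_url(2244))
```

The text returned for the SDF request could then be handed to an RDKit mol block reader, though I have not tested that step here.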

Jason Biggs


On Fri, Dec 1, 2017 at 12:51 PM, Sundar <jubilantsun...@gmail.com> wrote:

> I would like to download at least SMILES (great if I can also download mol
> files).
> And the same is true for Pubchem Compound ID or using Substance ID.
> Or even download the whole data set using an assay id. Anything could help.
>
> Thanks,
> Jubi
>
> On Fri, Dec 1, 2017 at 11:55 AM, Tim Dudgeon <tdudgeon...@gmail.com>
> wrote:
>
>> In what way? Given a single PubChem compound or substance ID you just
>> want to pull the smiles or molfile into RDKit?
>>
>> Tim
>> On 01/12/17 17:26, Sundar wrote:
>>
>> Hi RDkit users,
>>
>> I was wondering if RDkit has a means of downloading compounds from
>> Pubchem.
>> Also let me know of other ways that could help here.
>>
>> Thanks,
>> Jubi
>>
>>


Re: [Rdkit-discuss] bad coordinates for acetylenic hydrogens

2017-12-01 Thread Jason Biggs
Thank you Malitha,

If I'm understanding the Draw code (not a given, my python ist nicht gut),
then MolToMPL (or MolToImage) is just using the 3D conformation generated
by EmbedMolecule.  Is it just chopping off the z-coordinate?  For
acetylene, this works out because it gets embedded mostly in the XY plane
by default, but it would fail if the 3D conformer were aligned along the
z-axis.

I would think GenerateDepictionMatching3DStructure is the safe way to do
this, but it also seems to have trouble with acetylene

m = Chem.MolFromSmiles('C#C')
m2=Chem.rdmolops.AddHs(m)
AllChem.EmbedMolecule(m2, AllChem.ETKDG())
m3=Chem.rdmolops.AddHs(m)
Chem.rdDepictor.GenerateDepictionMatching3DStructure(m3,m2)
m3.GetConformer(0).GetPositions()


array([[  1.60734892e-16,   7.50000000e-01,   0.00000000e+00],
       [ -1.14810637e-16,  -7.50000000e-01,   0.00000000e+00],
       [ -4.82204677e-16,  -7.50000000e-01,   0.00000000e+00],
       [  4.36280422e-16,   7.50000000e-01,   0.00000000e+00]])


The above seems to happen with any terminal alkyne hydrogen.  I will file
an issue for this
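
To make the problem in that array explicit, here is a small standard-library
sketch that looks for atoms placed on top of each other. The coordinates are
the x and y columns printed above; the helper name and the tolerance are
assumptions of mine.

```python
from itertools import combinations
from math import hypot

# x, y columns of the positions printed above (z is zero for every atom).
positions = [
    (1.60734892e-16, 0.75),
    (-1.14810637e-16, -0.75),
    (-4.82204677e-16, -0.75),
    (4.36280422e-16, 0.75),
]


def overlapping_atoms(coords, tolerance=1e-6):
    """Return index pairs whose 2D positions coincide within the tolerance."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if hypot(a[0] - b[0], a[1] - b[1]) < tolerance
    ]


print(overlapping_atoms(positions))
```

Since AddHs appends the hydrogens after the heavy atoms, indices 0 and 1 are
the carbons and 2 and 3 the hydrogens, so the reported pairs mean that each
hydrogen sits on top of a carbon.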

Jason

Jason Biggs


On Fri, Dec 1, 2017 at 4:59 AM, Malitha Kabir <malitha12...@gmail.com>
wrote:

> Hi Jason,
>
> I hope the following codes will help you a little.
>
> from rdkit import Chem
> from rdkit.Chem import Draw
> from rdkit.Chem import AllChem
> size = (120, 120)
> m = Chem.MolFromSmiles('C#C')
> m2=Chem.rdmolops.AddHs(m)
> AllChem.EmbedMolecule(m2, AllChem.ETKDG())
> Draw.MolToMPL(m2, size=size)
>
> *** source code link (non specific to your question though)
> https://github.com/rdkit/rdkit
>
> Thanks. - malitha
>
>
> On Fri, Dec 1, 2017 at 6:42 AM, Jason Biggs <jasondbi...@gmail.com> wrote:
>
>>
>> m = Chem.MolFromSmiles('C#C')
>>
>>
>> renders fine
>>
>> [image: Inline image 1]
>>
>> but adding in the hydrogens, they don't snap to a linear arrangement
>>
>> m2=Chem.rdmolops.AddHs(m)
>>
>>
>> [image: Inline image 3]
>>
>>
>> This doesn't just affect acetylene, but any terminal alkyne, like
>> 'c1ccc(CCC#C)cc1'.
>>
>> I can write a hack on my end to look for this special case, but where in
>> the drawing code does it decide where to place hydrogens?
>>
>> Jason Biggs
>>


[Rdkit-discuss] bad coordinates for acetylenic hydrogens

2017-11-30 Thread Jason Biggs
m = Chem.MolFromSmiles('C#C')


renders fine

[image: Inline image 1]

but adding in the hydrogens, they don't snap to a linear arrangement

m2=Chem.rdmolops.AddHs(m)


[image: Inline image 3]


This doesn't just affect acetylene, but any terminal alkyne, like
'c1ccc(CCC#C)cc1'.

I can write a hack on my end to look for this special case, but where in
the drawing code does it decide where to place hydrogens?

Jason Biggs


Re: [Rdkit-discuss] transformation of an atom or group of atoms by atom indices

2017-11-09 Thread Jason Biggs
Something along these lines?

mol = Chem.MolFromSmiles('CCCC')
emol = Chem.EditableMol(mol)
emol.ReplaceAtom(3, Chem.Atom(1))
mol = emol.GetMol()

In [12]:

Chem.MolToSmiles(mol)

Out[12]:

'[H]CCC'

In [13]:

Chem.MolToSmiles(Chem.RemoveHs(mol))

Out[13]:

'CCC'




Jason Biggs


On Thu, Nov 9, 2017 at 3:12 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> RDKit Discussion Group,
>
> Suppose I have a molecule
>
> smiles1 = 'CCCC'
>
> The carbon atoms will be assigned indices 0, 1, 2, and 3.
>
> Suppose I want to specifically change carbon 3 to a hydrogen.
> Is this possible using RDkit?
>
> I am aware of using SMARTS to match a pattern and then change
> that group of atoms.  In my example, I would like to be able to
> change an atom or group of atoms based on the atom indices, not
> a SMARTS pattern which might be problematic for molecules with
> local symmetry.
>
> Regards,
> Jim Metz
>


Re: [Rdkit-discuss] SMARTS for Joback and Reid method

2017-11-08 Thread Jason Biggs
Chenyang,
I haven't looked at your SMARTS strings yet, but I do have this list of
SMARTS strings for the Joback method that I compiled myself (for use here:
https://www.wolframalpha.com/input/?i=2,3-methano-5,6-dichloroindene=3
).

Perhaps this can be of use.  If you spot any mistakes, please let me know.

Jason

$JobackSubstructures={
{"Methyl","-CH3", "[CX4H3]"},
{"SecondaryAcyclic", "-CH2-", "[!R;CX4H2]"},
{"TertiaryAcyclic",">CH-", "[!R;CX4H]"},
{"QuaternaryAcyclic", ">C<", "[!R;CX4H0]"},
{"PrimaryAlkene", "=CH2", "[CX3H2]"},
{"SecondaryAlkeneAcyclic", "=CH-", "[!R;CX3H1;!$([CX3H1](=O))]"},
{"TertiaryAlkeneAcyclic", "=C<", "[$([!R;#6X3H0]);!$([!R;#6X3H0]=[#8])]"},
{"CumulativeAlkene", "=C=", "[$([CX2H0](=*)=*)]"},
{"TerminalAlkyne", "\[Congruent]CH","[$([CX2H1]#[!#7])]"},
{"InternalAlkyne","\[Congruent]C-","[$([CX2H0]#[!#7])]"},
{"SecondaryCyclic", "-CH2- (ring)", "[R;CX4H2]"},
{"TertiaryCyclic", ">CH- (ring)", "[R;CX4H]"},
{"QuaternaryCyclic", ">C< (ring)", "[R;CX4H0]"},
{"SecondaryAlkeneCyclic", "=CH- (ring)", "[R;CX3H1,cX3H1]"},
{"TertiaryAlkeneCyclic", "=C< (ring)","[$([R;#6X3H0]);!$([R;#6X3H0]=[#8])]"},
{"Fluoro", "-F", "[F]"},
{"Chloro", "-Cl", "[Cl]"},
{"Bromo", "-Br", "[Br]"},
{"Iodo", "-I", "[I]"},
{"Alcohol","-OH", "[OX2H;!$([OX2H]-[#6]=[O]);!$([OX2H]-a)]"},(* alcohol - not matching a carboxylic acid *)
{"Phenol","-OH", "[$([OX2H]-a)]"},
{"EtherAcyclic", "-O-", "[OX2H0;!R;!$([OX2H0]-[#6]=[#8])]"},
{"EtherCyclic", "-O- (ring)", "[#8X2H0;R;!$([#8X2H0]~[#6]=[#8])]"},
{"CarbonylAcyclic", ">C=O", "[$([CX3H0](=[OX1]));!$([CX3](=[OX1])-[OX2]);!R]=O"},
{"CarbonylCyclic", ">C=O (ring)","[$([#6X3H0](=[OX1]));!$([#6X3](=[#8X1])~[#8X2]);R]=O"},
{"Aldehyde","O=CH-","[CX3H1](=O)"},
{"CarboxylicAcid", "COOH", "[OX2H]-[C]=O"},
{"Ester", "-C(=O)O-", "[#6X3H0;!$([#6X3H0](~O)(~O)(~O))](=[#8X1])[#8X2H0]"},
{"OxygenDoubleBondOther", "=O", "[OX1H0;!$([OX1H0]~[#6X3]);!$([OX1H0]~[#7X3]~[#8])]"},
{"PrimaryAmino","NH2", "[NX3H2]"},
{"SecondaryAminoAcyclic",">NH", "[NX3H1;!R]"},
{"SecondaryAminoCyclic",">NH (ring)", "[#7X3H1;R]"},
{"TertiaryAmino", ">N-","[#7X3H0;!$([#7](~O)~O)]"}, (* Tertiary amine except nitro group *)
{"ImineCyclic","=N- (ring)","[#7X2H0;R]"},
{"ImineAcyclic","=N-","[#7X2H0;!R]"},
{"Aldimine", "=NH", "[#7X2H1]"},
{"Cyano", "-C\[Congruent]N","[#6X2]#[#7X1H0]"},
{"Nitro", "NO2", "[$([#7X3,#7X3+][!#8])](=[O])~[O-]"},
{"Thiol", "-SH", "[SX2H]"},
{"ThioetherAcyclic", "-S-", "[#16X2H0;!R]"},
{"ThioetherCyclic", "-S- (ring)", "[#16X2H0;R]"}
};
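
If you would rather drive this from Python, the list ports over directly.
The sketch below copies four rows from the list above verbatim; the names
JOBACK_GROUPS and count_groups are hypothetical, and the counting function
assumes RDKit is installed (it is imported lazily, so the table itself works
without it).

```python
# A few rows copied from the list above: (name, group label, SMARTS).
JOBACK_GROUPS = [
    ("Methyl", "-CH3", "[CX4H3]"),
    ("SecondaryAcyclic", "-CH2-", "[!R;CX4H2]"),
    ("Alcohol", "-OH", "[OX2H;!$([OX2H]-[#6]=[O]);!$([OX2H]-a)]"),
    ("CarboxylicAcid", "COOH", "[OX2H]-[C]=O"),
]


def count_groups(smiles, groups=JOBACK_GROUPS):
    """Count the matches of each SMARTS pattern in a molecule (requires RDKit)."""
    from rdkit import Chem  # lazy import: only needed when matching

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    counts = {}
    for name, _label, smarts in groups:
        pattern = Chem.MolFromSmarts(smarts)
        # One match per occurrence of the whole pattern, not per atom.
        counts[name] = len(mol.GetSubstructMatches(pattern))
    return counts


smarts_by_name = {name: smarts for name, _label, smarts in JOBACK_GROUPS}
print(smarts_by_name["Methyl"])
```

Note that multi-atom patterns such as the carboxylic acid one count whole
groups, which is what a group-contribution sum needs.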

Jason Biggs


On Wed, Nov 8, 2017 at 4:52 PM, Chenyang Shi <cs3...@columbia.edu> wrote:

> Hi everyone,
>
> I have been recently working on a project that implements Joback method
> using RDKit (https://en.wikipedia.org/wiki/Joback_method).
>
> I believe the core to the success of this project is to make the 41
> functional groups correctly represented by SMARTS code. I have compiled my
> own codes, see attachment. I would appreciate your review of it and let me
> know if you spot errors.
>
> I think building a robust/well-tested SMARTS database (though small in my
> case) would be helpful to others and other projects.
>
> Thank you,
> Chenyang
>
> PS: The ones highlighted red in the document are robust.


[Rdkit-discuss] what if all explicit hydrogens were actual atoms

2017-10-12 Thread Jason Biggs
I'm creating a public-facing data structure that uses the RDKit as the back
end.  I don't want to expose three different levels of existence for a
hydrogen - I want them to be actual Atoms or be implied by valence.

What are the consequences of always converting anything the RDKit would
return via GetNumExplicitHs into an actual atom with the addHs function?
Are there families of functions in the RDKit that just do not work if any
hydrogen is instantiated?

What other strategy might I use to reduce the number of hydrogen types to
2?

I found this interesting discussion on the matter,
https://sourceforge.net/p/rdkit/mailman/message/30200937/, but it looks
like the branch mentioned there was abandoned.


Jason Biggs


Re: [Rdkit-discuss] bad inchi or parsing problem?

2017-09-14 Thread Jason Biggs
Thanks to Curt, Markus, and John for helping me understand this.  I knew
that InChI had its limitations, but that didn't jump out at me here because
there's no hydrogen migration between the different forms; I hadn't realized
that these forms also qualify as tautomers.  So this is definitely a feature
(or limitation) of InChI.


> No, my "good old" cactus service doesn't do a lookup in this case, it is
> read from the string, which is of course in opposition to what I just
> said :-). We did quite a bit regarding normalization, first, the CACTVS
> toolkit behind the service is quite good in this regard and I added a few
> things for the web service, too.
>
>
I may look into adding in a step after getting a sanitization error, but
before accepting the unsanitized structure, to see if CACTVS can give a
better SMILES string.

I want to avoid returning to the user 2D structures like the first image in
the thread, when you can point to another structure that equally matches
the input.


Re: [Rdkit-discuss] bad inchi or parsing problem?

2017-09-14 Thread Jason Biggs
Okay, all three of these SMILES strings resolve to the same InChI,

"O=[N+](C1=NC2=CC=CC=C2N=C1)[N-](=O)C1=NC2=CC=CC=C2N=C1"
"C1=CC=C2C(=C1)N=CC(=N2)N(=N(=O)C3=NC4=CC=CC=C4N=C3)=O"
"[O-][N+](c1cnc2ccccc2n1)=[N+]([O-])c3cnc4ccccc4n3"

even though to me they seem like different structures due to the specified
charges.  Is this a limitation of InChI, or do I need to rethink my ideas
of what makes two chemical structures the same?





Jason Biggs


On Thu, Sep 14, 2017 at 12:38 PM, John Mayfield <john.wilkinson...@gmail.com
> wrote:

> InChI is an identifier and not a representation, you should not read
> InChIs... but we are beyond hope there so...
>
> The InChI string is correct and is the same if you roundtrip your
> preferred one with charge separated bonds and the 5 valent one.
>
> All toolkits will use the InChI library to read/write InChIs and it
> generates the representation with 5v nitrogens, cactus is either applying
> normalisation after reading or in this case (since it's the name resolved)
> doing a identifier lookup from an original SMILES used to generate this
> InChI:
>
> echo 'InChI=1S/C16H10N6O2/c23-21(15-9-17-11-5-1-3-7-13(11)19-
>> 15)22(24)16-10-18-12-6-2-4-8-14(12)20-16/h1-10H' | inchi -STDIO
>> -inChi2Struct -OutputSDF | obabel -imol -osmi
>
> c1ccc2c(c1)ncc(n2)N(=N(=O)c1cnc2ccccc2n1)=O Structure #1
>
> SDF also attached without going though Open Babel.
>
> - John
>
>
>


Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread Jason Biggs
But keep in mind that the kekulized mols you create with the resonance
supplier will not match the SMARTS patterns given.

Chem.MolToSmiles(mol2, kekuleSmiles = True)

>'C1C=CC=CC=1'


mol2.HasSubstructMatch(Chem.MolFromSmarts('[C]=[C]-[C]'))

> False

mol2.HasSubstructMatch(Chem.MolFromSmarts('[c]=[c]-[c]'))

> True

So at the very least, you need to change the SMARTS strings to use [#6]
instead of [C].
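
As a rough illustration of that rewrite, here is a small helper that swaps
simple bracket atoms for atomic-number primitives. The function name and the
element table are hypothetical, and it only handles bare [C], [N], [O] and
[S]; anything with extra primitives inside the brackets is left alone.

```python
import re

# Aliphatic element symbols and the atomic numbers that replace them.
ATOMIC_NUMBERS = {"C": 6, "N": 7, "O": 8, "S": 16}


def ignore_aromaticity(smarts):
    """Rewrite bare bracket atoms such as [C] or [N] as [#6] or [#7]."""
    def swap(match):
        return f"[#{ATOMIC_NUMBERS[match.group(1)]}]"

    # Only an element symbol alone in its brackets is touched; [Cl] is not.
    return re.sub(r"\[(C|N|O|S)\]", swap, smarts)


print(ignore_aromaticity("[C]=[C]-[C]"))
print(ignore_aromaticity("[C]=[C]-[C]=[N]"))
```

The bond symbols are left as written, which suits the kekulized mols from
the resonance supplier; against an aromatic mol the bonds would also need
an aromatic alternative such as '=,:'.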



Jason Biggs


On Mon, Sep 11, 2017 at 2:53 PM, Paolo Tosco <paolo.to...@unito.it> wrote:

> Hi Jim,
>
> you can indeed enumerate all Kekulé structures for a molecule within the
> RDKit using Chem.ResonanceMolSupplier():
>
> from rdkit import Chem
>
> mol = Chem.MolFromSmiles('c1ccccc1')
>
> suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
>
> len(suppl)
>
> 2
>
> for i in range(len(suppl)):
>     print(Chem.MolToSmiles(suppl[i], kekuleSmiles=True))
>
> C1C=CC=CC=1
> C1=CC=CC=C1
>
>   Best,
> Paolo
>
>
> On 09/11/2017 05:22 PM, James T. Metz via Rdkit-discuss wrote:
>
> Greg,
>
> Thanks!  Yes, very helpful.  I will need to digest the detailed
> information
> you have provided.  I am somewhat familiar with recursive SMARTS.  Thanks
> again.
>
> Regards,
> Jim Metz
>
>
>
>
> -Original Message-
> From: Greg Landrum <greg.land...@gmail.com> <greg.land...@gmail.com>
> To: James T. Metz <jamestm...@aol.com> <jamestm...@aol.com>
> Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
> <rdkit-discuss@lists.sourceforge.net>
> Sent: Mon, Sep 11, 2017 11:15 am
> Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures
>
>
> On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz < <jamestm...@aol.com>
> jamestm...@aol.com> wrote:
>
> Greg,
>
> I need to be able to use SMARTS patterns to identify substructures in
> molecules
> that can be aromatic, and I need to be able to handle cases where there
> can be
> differences in the way that the molecule was entered or drawn by a user.
>
>
> That particular problem is a big part of the reason that we tend to use
> the aromatic representation of things.
>
>
> For example, consider the following alkenyl-substituted pyridine, there
> are two possible Kekule structures
>
> m1 = 'C=CC1=NC=CC=C1'
> m2 = 'C=CC1N=CC=CC1'
>
>
> Fixing what I assume is a typo for m2, I can do the following:
>
> In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
>
> In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')
>
> In [13]: q1 = Chem.MolFromSmarts('cccc')
>
> In [14]: q2 = Chem.MolFromSmarts('cccn')
>
> In [15]: list(m1.GetSubstructMatch(q1))
> Out[15]: [2, 7, 6, 5]
>
> In [16]: list(m1.GetSubstructMatch(q2))
> Out[16]: [6, 5, 4, 3]
>
> In [17]: list(m2.GetSubstructMatch(q1))
> Out[17]: [2, 7, 6, 5]
>
> In [18]: list(m2.GetSubstructMatch(q2))
> Out[18]: [6, 5, 4, 3]
>
>
> Those particular queries were going for the aromatic species and will only
> match inside the ring, but if you want to be more generic you could tune
> your queries like this:
>
> In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])
> ]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]-=,:[*])]')
>
> In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])
> ]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]-=,:[*])]')
>
> In [30]: list(m1.GetSubstructMatch(q3))
> Out[30]: [0, 1, 2, 7]
>
> In [31]: list(m1.GetSubstructMatch(q4))
> Out[31]: [0, 1, 2, 3]
>
> In [32]: list(m2.GetSubstructMatch(q3))
> Out[32]: [0, 1, 2, 7]
>
> In [33]: list(m2.GetSubstructMatch(q4))
> Out[33]: [0, 1, 2, 3]
>
> If you aren't familiar with recursive SMARTS, this construct:
> "[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or an
> aromatic bond to another atom".  So you can interpret q3 as "four carbons
> that each have either a double or aromatic bond and that are connected to
> each other by single, double, or aromatic bonds".
>
> Is this starting to approximate what you're looking for?
> -greg
>
>
>
>
> Now consider two SMARTS
>
> pattern1 = '[C]=[C]-[C]=[C]'
> pattern2 = '[C]=[C]-[C]=[N]'
>
> I need to be able to detect the existence of each pattern in the
> molecule
>
> If m1 is the only available generated Kekule structure, then pattern2
> will be recognized.
> If m2 is the only available generated Kekule  structure, then pattern1
> will be recognized.
>
> Hence, I am getting different answers for the same input molecule just
> because
> it was drawn in different Kekule structures.
>
> Regards,

Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread Jason Biggs
Start with your benzene molecule

m = Chem.MolFromSmiles('c1ccccc1')


make a pattern using Peter's example, with three aromatic atoms connected
by three aromatic bonds

patt = Chem.MolFromSmarts('a:a:a')


and it's a match:

m.HasSubstructMatch(patt)

>True


Kekulize your mol, and the pattern doesn't match

Chem.rdmolops.Kekulize(m)
m.HasSubstructMatch(patt)
>False


but if you change the smarts pattern to match aromatic atoms connected by
kekulized bonds, it matches

patt2 = Chem.MolFromSmarts('[a]=[a]-[a]')
m.HasSubstructMatch(patt2)
>True

Your original SMARTS query doesn't match because C in a SMARTS string is
specifically an aliphatic carbon.  Change it to c and it will match.  The
original query would also work if you had cleared the aromatic flags when
kekulizing:


m = Chem.MolFromSmiles('c1ccccc1')
Chem.rdmolops.Kekulize(m, clearAromaticFlags = True)
patt = Chem.MolFromSmarts('[C]=[C]-[C]')
m.HasSubstructMatch(patt)
>True



So when you kekulize without the clearAromaticFlags option, the atoms keep
their aromatic flags and still match only 'a', not 'A', while the bonds
match only '=' or '-', not ':'  (they will also match '@' or '~', but that's
beside the point here).
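
The matching rules above can be written down as a toy model. This is plain
Python, not RDKit code, and it covers only the handful of primitives
discussed in this thread; the function names and the bond-type strings are
made up for the illustration.

```python
def atom_matches(primitive, atomic_num, is_aromatic):
    """Toy version of a few SMARTS atom primitives."""
    if primitive == "C":    # aliphatic carbon only
        return atomic_num == 6 and not is_aromatic
    if primitive == "c":    # aromatic carbon only
        return atomic_num == 6 and is_aromatic
    if primitive == "#6":   # any carbon, aromatic or not
        return atomic_num == 6
    if primitive == "a":    # any aromatic atom
        return is_aromatic
    if primitive == "A":    # any aliphatic atom
        return not is_aromatic
    raise ValueError(f"primitive not covered by this sketch: {primitive}")


def bond_matches(primitive, bond_type):
    """Toy version of the SMARTS bond symbols '-', '=', ':' and '~'."""
    allowed = {
        "-": {"single"},
        "=": {"double"},
        ":": {"aromatic"},
        "~": {"single", "double", "aromatic"},
    }
    return bond_type in allowed[primitive]


# A benzene carbon after Kekulize() without clearAromaticFlags:
# the atom is still flagged aromatic, but its bonds are single or double.
print(atom_matches("C", 6, True), atom_matches("a", 6, True))
print(bond_matches(":", "double"), bond_matches("=", "double"))
```

Read this way, '[C]=[C]-[C]' fails on the atoms, 'a:a:a' fails on the bonds,
and '[a]=[a]-[a]' passes on both.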

As Peter mentions, by default if you read in a kekulized SMILES string, the
mol you create will not be kekulized, but it sounds like you are
intentionally kekulizing before doing substructure matching.



Jason Biggs


On Fri, Sep 8, 2017 at 5:19 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDkit, but I can not figure
> out how to (or if it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDkit?
>
> Thank you.
>
> Regards,
> Jim Metz
>


Re: [Rdkit-discuss] ETKDG conformation generation algorithm and fullerene-like structures.

2017-09-07 Thread Jason Biggs
I've never had success using the ETKDG or KDG methods for fullerenes.  When
I try them on C60, the embedding runs for a long time and returns -1.  The
ETDG method works on C60, but fails on your C60H60.

One thing you could try is to embed the hydrogen-suppressed structure, then
add the hydrogens:

// embed the heavy-atom skeleton with the ETDG parameter set
RDKit::DGeomHelpers::EmbedParameters params(RDKit::DGeomHelpers::ETDG);
RDKit::DGeomHelpers::EmbedMolecule(*mol, params);

// then add the hydrogens and let RDKit generate coordinates for them
bool explicitOnly = false;
bool addCoords = true;
RDKit::MolOps::addHs(*mol, explicitOnly, addCoords);

This seems to work.



Jason Biggs


On Thu, Sep 7, 2017 at 10:49 AM, Dmitry Redkin <red...@acdlabs.ru> wrote:

> Hello all!
> I've just started to use RDKit, and now I'm trying to generate some 3D
> conformation for a molecule. ETKDG successfully optimized cyclohexane, so
> I've tried some more complex example.
> It was this fullerene-like structure (with all the single bonds and every C
> atom having H atom attached). I'm attaching it to this email.
>
> But whatever I've tried to do with embedding parameters, RDKit either
> stalls for several minutes trying to complete the operation or just exits
> with all zero coordinates.
>
> Is there any way to generate conformations for this structure? Maybe I did
> something wrong or there is some flag that can be set to get some result
> (any result, not necessarily the best one) in a reasonable time?
>
> My code is pretty simple, you can see it below.
>
>
> RWMol *mol = MolFileToMol("d:\\temp\\exe32\\full.mol", true, false,
> false);
>
> MolOps::addHs(*mol);
> DGeomHelpers::EmbedParameters p(DGeomHelpers::ETKDG);
> p.maxIterations = 100; // if I left it -1, I could not wait long enough for
> EmbedMolecule to exit.
> p.useRandomCoords = true;
> int confid = DGeomHelpers::EmbedMolecule(*((ROMol*)mol), p);
> MolToMolFile(*((ROMol*)mol), "d:\\temp\\exe32\\full1.mol", true, confid);
> delete mol;  // mol came from new inside MolFileToMol, so delete, not free()
>
>
> 
> Dmitry Redkin, ACD Inc.
> red...@acdlabs.ru


Re: [Rdkit-discuss] Extracting data of a standard structural formula

2017-08-13 Thread Jason Biggs
Peleg,
I was doing something similar using the C++ function compute2DCoords, where
you can give a coordinate map as the second argument.  It looks like this
is exposed in Python, using the Compute2DCoords method (
http://www.rdkit.org/Python_Docs/rdkit.Chem.rdchem.Mol-class.html#Compute2DCoords),
so you could do something like this:

int generate2DCoordinatesPlacintHeavyAtomsFirst(RDKit::RWMol *thisMolecule,
                                                bool placeHeavyAtomsFirst,
                                                bool canonicalize,
                                                int randomseed)
{
    // lay out the hydrogen-suppressed molecule first
    int hydrogenFreeConformerIndex =
        RDDepict::compute2DCoords(*thisMolecule, 0, canonicalize);

    // now make a map of the coordinates which were optimized
    // sans hydrogens
    RDGeom::INT_POINT2D_MAP coordMap;
    if (placeHeavyAtomsFirst)
    {
        int numAtoms = thisMolecule->getNumAtoms();
        RDKit::Conformer hydrogenFreeConformer =
            thisMolecule->getConformer(hydrogenFreeConformerIndex);
        for (int i = 0; i < numAtoms; i++)
        {
            RDGeom::Point3D pt = hydrogenFreeConformer.getAtomPos(i);
            RDGeom::Point2D pt2;
            pt2.x = pt.x;
            pt2.y = pt.y;
            coordMap[i] = pt2;
        }
    }

    RDKit::MolOps::addHs(*thisMolecule, false, false);

    // the address-of arguments were lost in the archive; &coordMap and the
    // name conf are reconstructed from context
    int confID = RDDepict::compute2DCoords(
        *thisMolecule, placeHeavyAtomsFirst ? &coordMap : 0, true);
    const RDKit::Conformer &conf = thisMolecule->getConformer(confID);
    RDKit::WedgeMolBonds(*thisMolecule, &conf);
    return confID;
}


Now when I feed the result to my plotting program, I can give the option of
showing hydrogens or not, getting these two diagrams

Hope this helps,

Jason


On Sun, Aug 13, 2017 at 6:45 AM, Peleg Bar-Sapir  wrote:

> Greg,
>
> sorry for flooding you with replies, I think I understand where they issue
> stems from: the added hydrogens.
> Without the hydrogens the positions look fine (see C9_acid_out.png).
> Is there a way to add hydrogens so it will form a more "classic"
> representation? (e.g. C9_acid_expect.png)
>
> Best,
> Peleg
>
> On Sun, Aug 13, 2017 at 1:36 PM, Peleg Bar-Sapir  wrote:
>
>> To further clarify what I mean: I created a nonanoic acid representation
>> (via the SMILES code "CCCCCCCCC(=O)O").
>> "Pelarginic_acid.svg.png" is what I expect to get in terms of
>> coordinates, while "test_out.png" are the coordinates I actually get.
>>
>> On Sun, Aug 13, 2017 at 1:30 PM, Peleg Bar-Sapir 
>> wrote:
>>
>>> Hi Greg,
>>>
>>> Thank you for your reply!
>>> I tested the method you suggested, but it seems like
>>> rdMolDraw2D.PrepareMolForDrawing() has no effect.
>>> E.g., I have a molecule m, to which I add hydrogens. I then compute the
>>> 2D coordinates of m, and use rdMolDraw2D.PrepareMolForDrawing() as you
>>> suggested.
>>> To test it, I compare the outputs of Chem.MolToMolBlock() on the
>>> molecule and on the prepared drawing.
>>> The results are the same, and it is definitely not what you would expect
>>> from a standard representation (i.e. the ketone oxygen is not vertically up
>>> from its carbon, the chain is not oriented right, etc.).
>>>
>>> See the following explicit example:
>>>
>>> In[5]: m = Chem.MolFromSmiles('CCC(O)=O')
>>>
>>> In[6]: m = Chem.AddHs(m)
>>>
>>> In[7]: AllChem.Compute2DCoords(m)
>>>
>>> In[8]: m_draw = Chem.Draw.rdMolDraw2D.PrepareMolForDrawing(m)
>>>
>>> In[9]: print (Chem.MolToMolBlock(m))
>>>
>>>      RDKit          2D
>>>
>>>  11 10  0  0  0  0  0  0  0  0999 V2000
>>>     1.3490   -0.4340    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>     0.3421    0.6778    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>    -1.1242    0.3617    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>>    -1.5836   -1.0662    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>    -2.1311    1.4735    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>>>     2.8153   -0.1180    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>>>     1.9149   -1.8232    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>>>     0.0792   -1.2326    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>>>     1.6119    1.4763    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>>>    -0.2238    2.0669    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>>>    -3.0499   -1.3823    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>>>   1  2  1  0
>>>   2  3  1  0
>>>   3  4  1  0
>>>   3  5  2  0
>>>   1  6  1  0
>>>   1  7  1  0
>>>   1  8  1  0
>>>   2  9  1  0
>>>   2 10  1  0
>>>   4 11  1  0
>>> M  END
>>>
>>> In[10]: print (Chem.MolToMolBlock(m_draw))
>>>
>>>  

[Rdkit-discuss] How to assign stereochemistry from CIP code?

2017-07-27 Thread Jason Biggs
When creating a molecule, I can set a stereocenter by setting the chiralTag
to be clockwise or counterclockwise, and get back the absolute chirality

int atoms[5] = {7, 6, 6, 8, 9};
RDKit::RWMol *mol = new RDKit::RWMol();
for (int i : atoms) {
    RDKit::Atom atom(i);
    mol->addAtom(&atom);  // addAtom copies the atom by default
}
mol->addBond(0, 1, RDKit::Bond::SINGLE);
mol->addBond(1, 2, RDKit::Bond::SINGLE);
mol->addBond(1, 3, RDKit::Bond::SINGLE);
mol->addBond(1, 4, RDKit::Bond::SINGLE);

std::cout << RDKit::MolToSmiles(*mol, true) << std::endl;

mol->getAtomWithIdx(1)->setChiralTag(RDKit::Atom::CHI_TETRAHEDRAL_CW);
RDKit::MolOps::sanitizeMol(*mol);
RDKit::MolOps::assignStereochemistry(*mol);
std::cout << RDKit::MolToSmiles(*mol, true) << std::endl;
std::string cipCode;
mol->getAtomWithIdx(1)->getProp(RDKit::common_properties::_CIPCode,
                                cipCode);
std::cout << cipCode << std::endl;

returns

CC(N)(O)F
C[C@](N)(O)F
S


How could I do the reverse?  Given the absolute chirality in terms of the
CIP code, how can I assign the chirality such that it propagates to the
smiles string?  If I replace the line with "setChiralTag" with

mol->getAtomWithIdx(1)->setProp(RDKit::common_properties::_ChiralityPossible,
1);
mol->getAtomWithIdx(1)->setProp(RDKit::common_properties::_CIPCode, "S");

then I see no chirality in the returned smiles string.


Jason