Re: [Rdkit-discuss] Sanitizing SD file

2017-12-13 Thread Greg Landrum
On Thu, Dec 14, 2017 at 6:35 AM, Francois BERENGER <
beren...@bioreg.kyushu-u.ac.jp> wrote:

> On 12/14/2017 02:10 PM, Greg Landrum wrote:
> >
> > On Thu, Dec 14, 2017 at 4:22 AM, Francois BERENGER wrote:
> >
> > On 12/14/2017 05:15 AM, Sundar wrote:
> > > Hi RDkit users,
> > >
> > > I encounter following sanitize issue while I was trying to load an SD
> > > file using
> > > Chem.SDMolSupplier('lig.sdf')
> > >
> > > Explicit valence for atom # 16 N, 4, is greater than permitted
> > > ERROR: Could not sanitize molecule ending on line 3145
> >
> > I also encounter this exact error sometimes.
> >
> > Is there a way to tell rdkit to automatically correct this atom type?
> >
> >
> > The code currently only automatically corrects cases where it's really,
> > really obvious what the correction should be, like C-N(=O)=O ->
> > C-[N+](=O)[O-].
>
> Where is this in the code?
> I might have a look one day.
>

It's here:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/MolOps.cpp#L194

>
> > The philosophy taken in the RDKit is that it's better to have a bad
> > structure be rejected than it is to try and learn from it.
> > If you disagree with this, it is pretty easy to switch off the
> > sanitization checks and keep the bad molecules.
>
> I understand. I also guess unsanitized molecules would make some things
> crash, just later.


That depends. You can turn off the strict property checking:

In [2]: m = Chem.MolFromSmiles('C1CCN1(C)C')
[08:09:23] Explicit valence for atom # 3 N, 4, is greater than permitted

In [3]: m = Chem.MolFromSmiles('C1CCN1(C)C',sanitize=False)

In [6]: m.UpdatePropertyCache(strict=False)

In [7]: Chem.SanitizeMol(m,sanitizeOps=Chem.SANITIZE_ALL^Chem.SANITIZE_PROPERTIES)
Out[7]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

In [8]: Chem.MolToSmiles(m)
Out[8]: 'CN1(C)CCC1'


or if you want to be more aggressive you can also turn off the cleanup that
"fixes" those odd structures:

In [9]: m = Chem.MolFromSmiles('CCCN(=O)=O',sanitize=False)

In [10]: m.UpdatePropertyCache(strict=False)

In [11]: Chem.SanitizeMol(m,sanitizeOps=Chem.SANITIZE_ALL^Chem.SANITIZE_PROPERTIES^Chem.SANITIZE_CLEANUP)
Out[11]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

In [12]: Chem.MolToSmiles(m)
Out[12]: 'CCCN(=O)=O'

In either case, many standard molecular operations should still work,
you'll just be operating on molecules with atoms in unreasonable valence
states.
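
A note on the sanitizeOps expressions, since the ^ can look cryptic: the flags are bit masks, and XOR-ing a single flag against a mask that has every bit set clears that one bit. The sketch below shows the idea with a stand-in IntFlag. ToyFlags, its members, its values and the without() helper are all made up for illustration; they are not the RDKit's actual SanitizeFlags values.

```python
from enum import IntFlag


class ToyFlags(IntFlag):
    """Hypothetical stand-in flags; names and values are NOT the RDKit's."""
    NONE = 0
    CLEANUP = 1
    PROPERTIES = 2
    KEKULIZE = 4
    ALL = 7  # every bit above is set


def without(mask, *flags):
    """Clear the given flags from mask, whether or not they are currently set."""
    for flag in flags:
        mask ^= (mask & flag)  # only toggles bits that are actually on
    return mask


# XOR against a mask with every bit set switches the named bit off ...
ops = ToyFlags.ALL ^ ToyFlags.PROPERTIES
# ... and chaining switches off several bits.
ops_no_cleanup = ToyFlags.ALL ^ ToyFlags.PROPERTIES ^ ToyFlags.CLEANUP
# XOR toggles, though: applied a second time it switches the bit back on.
toggled_back = ops ^ ToyFlags.PROPERTIES

print(int(ops), int(ops_no_cleanup), int(toggled_back))
```

Because XOR toggles rather than clears, it only switches an operation off when the starting mask has that operation on, which is what the SANITIZE_ALL expressions above rely on. A helper like without() avoids the surprise when the starting mask is something else.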

-greg
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Sanitizing SD file

2017-12-13 Thread Francois BERENGER
On 12/14/2017 02:10 PM, Greg Landrum wrote:
> 
> On Thu, Dec 14, 2017 at 4:22 AM, Francois BERENGER wrote:
> 
> On 12/14/2017 05:15 AM, Sundar wrote:
> > Hi RDkit users,
> >
> > I encounter following sanitize issue while I was trying to load an SD
> > file using
> > Chem.SDMolSupplier('lig.sdf')
> >
> > Explicit valence for atom # 16 N, 4, is greater than permitted
> > ERROR: Could not sanitize molecule ending on line 3145
> 
> I also encounter this exact error sometimes.
> 
> Is there a way to tell rdkit to automatically correct this atom type?
> 
> 
> The code currently only automatically corrects cases where it's really,
> really obvious what the correction should be, like C-N(=O)=O ->
> C-[N+](=O)[O-].

Where is this in the code?
I might have a look one day.

> If there are additional commonly "misdrawn" functional groups, it would
> be straightforward to add them
>
> I guess that sanitization failure means the molecule
> goes to the trash, which is terrible when there are so few molecules to
> learn from.
> 
> The philosophy taken in the RDKit is that it's better to have a bad
> structure be rejected than it is to try and learn from it.
> If you disagree with this, it is pretty easy to switch off the
> sanitization checks and keep the bad molecules.

I understand. I also guess unsanitized molecules would make some things
crash, just later.

> -greg
> 
>  
> 
> 
> > The molecule RDkit complains about has a charged N atom.
> > How do I sanitize it to fix these errors without losing its charge and
> > 3D coordinates?
> > Or how to disregard all these errors and get all the molecules read with
> > nothing missing?
> >
> > Thanks,
> > Sundar


Re: [Rdkit-discuss] SanitizeMol changing drawing

2017-12-13 Thread Greg Landrum
Hi Jason,

This is a nice one.

Here's what's going on:
The depiction code (the piece that generates 2D coordinates) attempts to
generate "canonical" coordinates: it tries to generate the same
coordinates for a molecule no matter what the input atom ordering is.
In order to do that it needs a canonical numbering of the atoms (or at
least something approximating one).
The current code uses the calculated CIP ranks of the atoms as this
canonical ordering. These ranks are generated as part of the standard
stereochemistry assignment that is done on molecule construction and are
stored as computed properties on the atoms. If the CIP ranks are not there
it more or less gives up and just uses the atomic number.
The call to SanitizeMol() clears the computed properties on atoms, thus
blowing out the CIP rank information that the depiction code uses.

If you want to resolve this, you can call
Chem.AssignStereochemistry(m2,cleanIt=True, force=True) after you sanitize
the molecule. Note that this can be a computationally expensive call, so
you may not want to make a habit out of it.

I'll create an issue to explore updating the depiction code and replacing
the use of CIP ranks with the atom ranking generated by Nadine's
canonicalization code.

-greg


On Wed, Dec 13, 2017 at 10:38 PM, Jason Biggs wrote:

> using the recent release,
>
>
> m = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
> m2 = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
> Chem.rdmolops.SanitizeMol(m2)
>
>
>
> The two molecules above seem identical - MolFromSmiles already performs a
> sanitization so why wouldn't they be?  They produce the same pickle,
>
> pickle.dumps(m) == pickle.dumps(m2)
>
> True
>
>
> So why do they get treated differently by the drawing code? The only way
> to return m2 to its original state is to run AssignStereoChemistry with
> force = True.  What variable is being thrown off by SanitizeMol?
>
> [image: Inline image 1]
>
> Jason Biggs


Re: [Rdkit-discuss] Sanitizing SD file

2017-12-13 Thread Greg Landrum
On Thu, Dec 14, 2017 at 4:22 AM, Francois BERENGER <
beren...@bioreg.kyushu-u.ac.jp> wrote:

> On 12/14/2017 05:15 AM, Sundar wrote:
> > Hi RDkit users,
> >
> > I encounter following sanitize issue while I was trying to load an SD
> > file using
> > Chem.SDMolSupplier('lig.sdf')
> >
> > Explicit valence for atom # 16 N, 4, is greater than permitted
> > ERROR: Could not sanitize molecule ending on line 3145
>
> I also encounter this exact error sometimes.
>
> Is there a way to tell rdkit to automatically correct this atom type?
>

The code currently only automatically corrects cases where it's really,
really obvious what the correction should be, like C-N(=O)=O ->
C-[N+](=O)[O-].
If there are additional commonly "misdrawn" functional groups, it would be
straightforward to add them.


> I guess that sanitization failure means the molecule
> goes to the trash, which is terrible when there are so few molecules to
> learn from.
>

The philosophy taken in the RDKit is that it's better to have a bad
structure be rejected than it is to try and learn from it.
If you disagree with this, it is pretty easy to switch off the sanitization
checks and keep the bad molecules.

-greg



>
> > The molecule RDkit complains about has a charged N atom.
> > How do I sanitize it to fix these errors without losing its charge and
> > 3D coordinates?
> > Or how to disregard all these errors and get all the molecules read with
> > nothing missing?
> >
> > Thanks,
> > Sundar


Re: [Rdkit-discuss] Sanitizing SD file

2017-12-13 Thread Francois BERENGER
On 12/14/2017 05:15 AM, Sundar wrote:
> Hi RDkit users,
> 
> I encounter following sanitize issue while I was trying to load an SD
> file using
> Chem.SDMolSupplier('lig.sdf')
> 
> Explicit valence for atom # 16 N, 4, is greater than permitted
> ERROR: Could not sanitize molecule ending on line 3145

I also encounter this exact error sometimes.

Is there a way to tell rdkit to automatically correct this atom type?

I guess that sanitization failure means the molecule
goes to the trash, which is terrible when there are so few molecules to
learn from.

> The molecule RDkit complains about has a charged N atom.
> How do I sanitize it to fix these errors without losing its charge and
> 3D coordinates?
> Or how to disregard all these errors and get all the molecules read with
> nothing missing?
> 
> Thanks,
> Sundar


Re: [Rdkit-discuss] Sanitizing SD file

2017-12-13 Thread Ling Chan
Hello Sundar,

Without access to your SD file I cannot be sure, but it is likely that you
have a nitrogen with a valence of 4, which exceeds what the RDKit permits
for it. You may need to, e.g., change the N=O form into [N+][O-]. You may
take a look at some of the following threads.

https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg06450.html
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg06774.html
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg06796.html
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg06798.html

Ling


On Wed, Dec 13, 2017 at 12:15 PM, Sundar wrote:

> Hi RDkit users,
>
> I encounter following sanitize issue while I was trying to load an SD file
> using
> Chem.SDMolSupplier('lig.sdf')
>
> Explicit valence for atom # 16 N, 4, is greater than permitted
> ERROR: Could not sanitize molecule ending on line 3145
>
> The molecule RDkit complains about has a charged N atom.
> How do I sanitize it to fix these errors without losing its charge and 3D
> coordinates?
> Or how to disregard all these errors and get all the molecules read with
> nothing missing?
>
> Thanks,
> Sundar


[Rdkit-discuss] SanitizeMol changing drawing

2017-12-13 Thread Jason Biggs
using the recent release,


m = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
m2 = Chem.MolFromSmiles("N[C@@H](C)C(=O)O")
Chem.rdmolops.SanitizeMol(m2)



The two molecules above seem identical - MolFromSmiles already performs a
sanitization so why wouldn't they be?  They produce the same pickle,

pickle.dumps(m) == pickle.dumps(m2)

True


So why do they get treated differently by the drawing code? The only way to
return m2 to its original state is to run AssignStereoChemistry with force
= True.  What variable is being thrown off by SanitizeMol?

[image: Inline image 1]

Jason Biggs


Re: [Rdkit-discuss] Stereo information lost

2017-12-13 Thread Paolo Tosco

Hi Marina,

If you wish to assign stereochemistry from the atom block parity flag 
rather than from 3D coordinates you might try the following:


#include <iostream>
#include <string>
#include <GraphMol/GraphMol.h>
#include <GraphMol/FileParsers/MolSupplier.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>

int main(int argc, char **argv) {
    std::string in_file_name = "zinc_3833800.sdf";

    RDKit::SDMolSupplier mol_supplier(in_file_name, true);

    for (unsigned int i = 0; i < mol_supplier.length(); ++i) {
        RDKit::ROMOL_SPTR mol(mol_supplier[i]);

        if (!mol) {
            std::cout << "Error, molecule not found!" << std::endl;
            continue;
        }

        for (unsigned int j = 0; j < mol->getNumAtoms(); j++) {
            RDKit::Atom* atom = mol->getAtomWithIdx(j);
            // molParity holds the parity flag read from the atom block
            if (atom->hasProp(RDKit::common_properties::molParity)) {
                int parity;
                atom->getProp(RDKit::common_properties::molParity, parity);
                // translate the parity flag into a chiral tag
                switch (parity) {
                    case 1:
                        atom->setChiralTag(RDKit::Atom::CHI_TETRAHEDRAL_CW);
                        break;
                    case 2:
                        atom->setChiralTag(RDKit::Atom::CHI_TETRAHEDRAL_CCW);
                        break;
                    default:
                        atom->setChiralTag(RDKit::Atom::CHI_UNSPECIFIED);
                        break;
                }
            }
            std::cout << "Chirality atom " << j+1 << ": "
                      << atom->getChiralTag() << std::endl;
        }

        // second argument: include stereochemistry in the SMILES
        std::string usmiles = RDKit::MolToSmiles(*mol, true);
        std::cout << "Mol: " << usmiles << std::endl;
    }
}
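
For anyone wondering where that parity flag lives: in a V2000 mol block it is the 7th field of each atom line (after x, y, z, the symbol, the mass difference and the charge), three characters wide. A minimal stdlib-only Python sketch of pulling it out by hand follows. atom_parity() and make_atom_line() are hypothetical helpers written for this illustration, not RDKit functions, and the column positions assume a well-formed fixed-width V2000 line.

```python
# Meaning of the parity values in a V2000 atom block
PARITY_MEANING = {0: "not stereo", 1: "odd", 2: "even", 3: "either or unmarked"}


def atom_parity(atom_line):
    """Return the stereo parity field of a V2000 atom-block line as an int."""
    # x, y, z take 10 characters each, then a space, a 3-character symbol,
    # a 2-character mass difference and a 3-character charge field;
    # the parity field is the next 3 characters.
    field = atom_line[39:42].strip()
    return int(field) if field else 0


def make_atom_line(x, y, z, symbol, parity):
    """Build a fixed-width V2000 atom line (illustration only)."""
    head = f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}{0:>2}{0:>3}{parity:>3}"
    return head + "  0" * 9  # remaining fields left at zero


line = make_atom_line(1.0, -2.5, 0.0, "C", 2)
print(atom_parity(line), PARITY_MEANING[atom_parity(line)])
```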

Cheers,
p.

On 13/12/2017 09:39, Marina Garcia de Lomana wrote:

Thanks for your answer Paolo!
But I would need to read in the stereochemistry information from the 
SD file (7th column in the atom block) and not calculate the chirality 
from the coordinates, since this way the information about racemates 
would get lost.


Is there a way to do that?


On 13 Dec 2017, at 10:27, Paolo Tosco wrote:


Hi Marina,

I assume you are using this SDF file:

http://zinc11.docking.org/fget.pl?l=0=57393683=d

which contains 3D coordinates and no wedge bond information. If this
is the case, you will need to call MolOps::assignStereochemistryFrom3D()
to assign the chiral tags before accessing them.


If you are also interested in E/Z double bond stereochemistry you 
will need the following three calls before accessing stereochemistry 
descriptors:


MolOps::detectBondStereochemistry(*mol);
MolOps::assignStereochemistryFrom3D(*mol);
MolOps::assignStereochemistry(*mol);


Hope that helps, cheers
p.

On 12/12/17 18:18, Marina Garcia de Lomana wrote:


Hi,
I am using the following C++ script to read a molecule from a SD 
file (with defined stereochemistry), but the information about the 
stereochemistry gets lost. The file I am using is from glucose 
(ZINC03833800; http://zinc.docking.org/substance/3833800)
When I check the chiral information of the atoms, all of them have 
the chiral tag 0.


How can I keep the stereo information?

SCRIPT:

std::fstream infilestr;
infilestr.open(in_file_name.c_str());
RDKit::SDMolSupplier mol_supplier(in_file_name, true);
for (unsigned i = 0; i < mol_supplier.length(); ++i) {
    RDKit::ROMOL_SPTR mol(mol_supplier[i]);
    if (!mol) {
        std::cout << "Error, molecule not found!" << std::endl;
        continue;
    }
    for (unsigned int j = 0; j < mol->getNumAtoms(); j++) {
        RDKit::Atom* atom = mol->getAtomWithIdx(j);
        std::cout << "Chirality atom " << j+1 << ": " << atom->getChiralTag() << std::endl;
    }
    std::string usmiles = RDKit::MolToSmiles(*mol);
    std::cout << "Mol: " << usmiles << std::endl;
}



OUTPUT FOR GLUCOSE (ZINC03833800):

Chirality atom 1: 0
Chirality atom 2: 0
Chirality atom 3: 0
Chirality atom 4: 0
Chirality atom 5: 0
Chirality atom 6: 0
Chirality atom 7: 0
Chirality atom 8: 0
Chirality atom 9: 0
Chirality atom 10: 0
Chirality atom 11: 0
Chirality atom 12: 0
Mol: OCC1OC(O)C(O)C(O)C1O




Re: [Rdkit-discuss] Stereo information lost

2017-12-13 Thread Marina Garcia de Lomana
Ok thank you for the help!

Marina

> On 13 Dec 2017, at 11:22, Greg Landrum wrote:
> 
> 
> On Wed, Dec 13, 2017 at 10:39 AM, Marina Garcia de Lomana wrote:
> Thanks for your answer Paolo!
> But I would need to read in the stereochemistry information from the SD file 
> (7th column in the atom block) and not calculate the chirality from the 
> coordinates, since this way the information about racemates would get lost.
> 
> Is there a way to do that?
> 
> There is not. In this case the RDKit follows the recommendation of the 
> documentation for Mol files and ignores that field when reading structures.
> A standard way to indicate that you don't know the stereochemistry of a 
> center despite there being 3D coordinates is to make one of the bonds from it 
> a "squiggly" bond (i.e. bond stereo value 4 in the bond block). The RDKit will 
> recognize this and avoid assigning stereochemistry there.
> 
> A request: please do not post questions both to the mailing list and to 
> github. It's ok to use either one, but if you pick both it just makes more 
> work and confusion for those of us who are trying to answer things.
> 
> Thanks,
> greg
> 
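
To make the "squiggly" bond concrete: in a V2000 bond block each line holds the two atom indices, the bond type and the bond stereo field, each three characters wide, and a single bond whose stereo field is 4 is the "either" (wavy) bond. The Python sketch below just reads those fields. parse_bond_line() and is_either_bond() are hypothetical names for this illustration, and well-formed fixed-width lines are assumed.

```python
def parse_bond_line(bond_line):
    """Split a V2000 bond-block line into (atom1, atom2, bond_type, bond_stereo)."""
    # the first four fields are each 3 characters wide
    fields = [bond_line[i:i + 3].strip() for i in range(0, 12, 3)]
    return tuple(int(f) if f else 0 for f in fields)


def is_either_bond(bond_line):
    """True for a single bond marked with stereo value 4 ("either")."""
    _, _, bond_type, bond_stereo = parse_bond_line(bond_line)
    return bond_type == 1 and bond_stereo == 4


wedge = "  1  2  1  1"     # single bond, stereo "up"
squiggle = "  1  2  1  4"  # single bond, stereo "either"
print(is_either_bond(wedge), is_either_bond(squiggle))
```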



Re: [Rdkit-discuss] Stereo information lost

2017-12-13 Thread Paolo Tosco

Hi Marina,

I assume you are using this SDF file:

http://zinc11.docking.org/fget.pl?l=0=57393683=d

which contains 3D coordinates and no wedge bond information. If this is
the case, you will need to call MolOps::assignStereochemistryFrom3D()
to assign the chiral tags before accessing them.


If you are also interested in E/Z double bond stereochemistry you will 
need the following three calls before accessing stereochemistry descriptors:


MolOps::detectBondStereochemistry(*mol);
MolOps::assignStereochemistryFrom3D(*mol);
MolOps::assignStereochemistry(*mol);


Hope that helps, cheers
p.

On 12/12/17 18:18, Marina Garcia de Lomana wrote:


Hi,
I am using the following C++ script to read a molecule from a SD file 
(with defined stereochemistry), but the information about the 
stereochemistry gets lost. The file I am using is from glucose 
(ZINC03833800; http://zinc.docking.org/substance/3833800)
When I check the chiral information of the atoms, all of them have the 
chiral tag 0.


How can I keep the stereo information?

SCRIPT:

std::fstream infilestr;
infilestr.open(in_file_name.c_str());
RDKit::SDMolSupplier mol_supplier(in_file_name, true);
for (unsigned i = 0; i < mol_supplier.length(); ++i) {
    RDKit::ROMOL_SPTR mol(mol_supplier[i]);
    if (!mol) {
        std::cout << "Error, molecule not found!" << std::endl;
        continue;
    }
    for (unsigned int j = 0; j < mol->getNumAtoms(); j++) {
        RDKit::Atom* atom = mol->getAtomWithIdx(j);
        std::cout << "Chirality atom " << j+1 << ": " << atom->getChiralTag() << std::endl;
    }
    std::string usmiles = RDKit::MolToSmiles(*mol);
    std::cout << "Mol: " << usmiles << std::endl;
}



OUTPUT FOR GLUCOSE (ZINC03833800):

Chirality atom 1: 0
Chirality atom 2: 0
Chirality atom 3: 0
Chirality atom 4: 0
Chirality atom 5: 0
Chirality atom 6: 0
Chirality atom 7: 0
Chirality atom 8: 0
Chirality atom 9: 0
Chirality atom 10: 0
Chirality atom 11: 0
Chirality atom 12: 0
Mol: OCC1OC(O)C(O)C(O)C1O





Re: [Rdkit-discuss] Similarity maps using machine learning

2017-12-13 Thread gregori

Hi,

Getting the similarity maps for multiple structures should be straightforward:
just call the function inside your loop over the molecules.
To export the generated images to a web service, I did the following
(assuming the figure returned by SimilarityMaps.GetSimilarityMapForModel()
is called "fig"):

import io
from matplotlib import pyplot as plt

buf = io.BytesIO()    # in-memory buffer instead of a file on disk
DPI = fig.get_dpi()
# write the figure as PNG at half its native DPI
fig.savefig(buf, format='png', bbox_inches='tight', pad_inches=0, dpi=DPI/2)
plt.close(fig)        # release the figure once it has been written
img = buf.getvalue()  # raw PNG bytes
buf.close()

I remember having to play around with the picture size, the coordScale and the
dpi in order to get an image of the proper size (as requested by the user, in
pixels) with an adequate font size:

fig = Draw.MolToMPL(mol, coordScale=1.5, size=(int(width/1.3),int(height/1.3)))

I can't remember the details exactly, but these parameters are worth looking at
if the image you get is not OK.

I use Flask for the web server, and I send the picture through the server this
way:

flask.send_file(io.BytesIO(img), mimetype='image/png')
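
The buffer handling above can be wrapped in a small helper so that it is easy to call once per molecule in a loop. This is only a sketch: figure_to_png_bytes() and render_all() are hypothetical names, make_figure stands for whatever you use to build the figure (for example a wrapper around SimilarityMaps.GetSimilarityMapForModel()), and the only thing assumed about the figure is that it has a matplotlib-style savefig() method.

```python
import io


def figure_to_png_bytes(fig, dpi=None):
    """Render a figure to PNG and return the raw bytes, without touching disk."""
    kwargs = {"format": "png", "bbox_inches": "tight", "pad_inches": 0}
    if dpi is not None:
        kwargs["dpi"] = dpi  # otherwise savefig() uses its own default
    with io.BytesIO() as buf:
        fig.savefig(buf, **kwargs)
        return buf.getvalue()


def render_all(items, make_figure):
    """Map each (name, mol) pair to PNG bytes.

    make_figure is a caller-supplied function that takes a molecule and
    returns a figure object.
    """
    return {name: figure_to_png_bytes(make_figure(mol)) for name, mol in items}
```

Each value in the returned dictionary can then be handed to flask.send_file() wrapped in io.BytesIO(), as in the line above. Remember to close the figures (plt.close) if you generate many of them in one process.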

Best,

Grégori

On Wednesday, December 13, 2017 06:39 CET, Greg Landrum wrote:

I know that Michal has done some work with this as part of the beta for the
new ChEMBL interface.
@Michal: do you have a bit of time to explain what you did in order to get
images that you could serve via the web?

On Tue, Dec 12, 2017 at 1:50 PM, Bruno Neves wrote:

Dear colleagues,

I want to develop mechanistically interpretable machine learning models (i.e.,
using similarity maps) and implement them in web services. I've already managed
to generate a map from a SMILES. However, I cannot generate maps for multiple
molecules in a data set (CSV or SDF). I'm also having some difficulty trying to
save the new similarity map images. The scripts available in the RDKit tutorial
do not provide detailed information to solve this problem. Do you have any
idea how I can solve this?

# Use the random forest to predict a new molecule (SMILES)
>>> m = Chem.MolFromSmiles('FC(F)(F)C1=CC=C(OC(CCNC2=CC=CC=C2)C2=CC=CC=C2)C=C1')
>>> fp = np.zeros((1,))
>>> DataStructs.ConvertToNumpyArray(AllChem.GetMorganFingerprintAsBitVect(m, radius, nBits, useFeatures), fp)
>>> print(rf.predict((fp,)))
>>> print(rf.predict_proba((fp,)))

# Get predicted probability map
>>> def getProba(fp, predictionFunction):
>>>     return predictionFunction((fp,))[0][1]
>>> fig, maxweight = SimilarityMaps.GetSimilarityMapForModel(m, SimilarityMaps.GetMorganFingerprint, lambda x: getProba(x, rf.predict_proba))

# Open CSV file with multiple molecules
>>> m = pd.read_csv('C:\\Users\\bruno\\Desktop\\maps\\data\\logBB_S.csv', delimiter=',')
>>> mols = []
>>> y = []
>>> for mol in Chem.SDMolSupplier(fname):
>>>     if mol is not None:
>>>         mols.append(mol)
>>> fps = [AllChem.GetMorganFingerprintAsBitVect(m, radius, nBits,useFeatures) for m in mols]
>>> def rdkit_np_convert(fp):
>>>     output = []
>>>     for f in fp:
>>>         arr = np.zeros((1,))
>>>         DataStructs.ConvertToNumpyArray(f, arr)
>>>         output.append(arr)
>>>     return np.asarray(output)
>>> x = rdkit_np_convert(fps)
>>> x.shape
>>> print(fp)
>>> print(rf.predict(x))
>>> print(rf.predict_proba((x)))

# Get predicted probability maps for multiple structures??

Best regards,

Prof. Dr. Bruno Junior Neves
Laboratório de Quimioinformática
Centro Universitário de Anápolis - UniEVANGÉLICA