Re: [Rdkit-discuss] Code efficiency improvement

2019-12-20 Thread Dimitri Maziuk via Rdkit-discuss
On 12/19/19 7:27 PM, Francois Berenger wrote:
> 
> You should parallelize the processing of molecules, since each can be
> worked at independently.
> 

Well, for "a lot" of conformers on "a lot" of molecules that'll work if
you have access to a compute cluster and/or are willing to pay for
spinning up a bunch of VMs on amazon etc. Otherwise the best you can
hope for is to run maybe two per CPU core.
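
A minimal sketch of the single-machine case, assuming Python's standard concurrent.futures module. The names worker_count and process_all are hypothetical helpers, and func stands in for whatever per-molecule work is being done; it has to be a picklable top-level function.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def worker_count(per_core=1):
    """Number of worker processes: per_core jobs for every CPU core found."""
    cores = os.cpu_count() or 1      # cpu_count() may return None
    return max(1, cores * per_core)


def process_all(func, items, per_core=1):
    """Apply func to every item in a pool of worker processes.

    Results come back in input order. Each item is independent, so no
    state is shared between workers.
    """
    with ProcessPoolExecutor(max_workers=worker_count(per_core)) as pool:
        return list(pool.map(func, items, chunksize=16))
```

Passing per_core=2 corresponds to the "maybe two per CPU core" ceiling; whether that helps depends on how CPU-bound the per-molecule work is.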

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-16 Thread Dimitri Maziuk via Rdkit-discuss
On 12/16/19 10:35 AM, Illimar Hugo Rekand wrote:
> Fair point.
> 
> But when working in the 100s and 1000s range of PDB-files it would be nice to 
> have some fewer steps when designing a pipeline.

But what are the selection criteria? NMR structures are usually deposited
with 20 models: do you want the ligand from every one? Only from the
representative one? There's at least one PDB ID (I forget which) with 3
stable conformers, i.e. model 1 is not the representative structure.

Structures annotated by PDB will have HETATM instead of ATOM for
non-standards and ligands, but if your files haven't been processed by
them, all bets are off.

And so on
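
For files that do follow the HETATM convention, pulling out ligand records is a few lines of column slicing. This is a sketch only: ligand_lines is a hypothetical helper, the column positions are those of the fixed-width PDB coordinate records, and skipping HOH (water) is an assumption about what counts as "not a ligand".

```python
from collections import defaultdict


def ligand_lines(pdb_text, skip=("HOH",)):
    """Group HETATM records by (residue name, chain ID, residue number)."""
    groups = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        resname = line[17:20].strip()    # columns 18-20
        if resname in skip:
            continue
        chain = line[21:22]              # column 22
        resseq = line[22:26].strip()     # columns 23-26
        groups[(resname, chain, resseq)].append(line)
    return dict(groups)
```

It says nothing about which model to take, and it finds nothing at all in files where ligands were written as ATOM records.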
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

2019-12-16 Thread Dimitri Maziuk via Rdkit-discuss

On 12/16/2019 10:07 AM, Illimar Hugo Rekand wrote:


Would it be viable to create a function where you could create a mol object 
from specific lines within a pdb-file?


PDB file is simple text. There's any number of utilities to extract the 
lines you want, incl. a plain text editor, why spend time on reinventing 
the wheel?


Dima




Re: [Rdkit-discuss] Saving chains from PDB file

2019-10-05 Thread Dimitri Maziuk via Rdkit-discuss

On 10/5/2019 10:34 AM, Maciek Wójcikowski wrote:

Paolo and Chris,

There actually is Rdkit function to do this very task: SplitMolByPDBChainId


Why, though? -- It's a punch-card format with the chain ID in a specific 
column: you just read the lines and sort them into buckets on line[X]. 
Unless you have NMR multi-model files, where you also need to keep track 
of MODEL/ENDMDL.
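
The bucket sort can be sketched as below. It assumes the standard fixed-width layout (chain ID in column 22, i.e. index 21) and a hypothetical helper name, split_by_chain; the model number is tracked so that multi-model files do not pile every model into one bucket.

```python
from collections import defaultdict


def split_by_chain(pdb_text):
    """Return {(model number, chain ID): [ATOM/HETATM lines]}."""
    buckets = defaultdict(list)
    model = 1    # files without MODEL records are treated as model 1
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # ENDMDL needs no handling here: the next MODEL resets the number
            model = int(line[6:].split()[0])
        elif record in ("ATOM", "HETATM"):
            buckets[(model, line[21:22])].append(line)
    return dict(buckets)
```

TER and other non-coordinate records are ignored in this sketch.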


Dima






Re: [Rdkit-discuss] drawing code

2019-08-14 Thread Dimitri Maziuk via Rdkit-discuss
On 8/14/19 3:42 PM, Nicola Zonta wrote:
> Hi Greg,
> yeah, coordgen issues are probably the best way to keep track of where we 
> perform poorly.

So... is it "schrodinger/coordgenlibs"? I can open an issue and upload
the sdf.

(And yes, since we actually need all the protons with labels, I am
painfully aware of how crowded the image becomes. Perhaps some day I'll
manage to persuade my spectroscopist to use a 3D image instead -- her
argument is that having to move the mouse to the other screen to rotate
that thing all the time is too distracting.)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] drawing code

2019-08-13 Thread Dimitri Maziuk via Rdkit-discuss
PS I played with it a bit: the least ugly version is if you
MMFF94-optimize it after rdkit.Chem.rdCoordGen.AddCoords()

It's still far from perfect.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[Rdkit-discuss] drawing code

2019-08-13 Thread Dimitri Maziuk via Rdkit-discuss
Hi all,

for our workflow we need molecule drawings with all atoms (incl. Hs)
explicitly labeled. And every once in a while we run into molecules that
don't look so good.

I wonder if it's worth collecting them somewhere, maybe another github
repo under rdkit? -- for future developers of 2D layout algorithms.

Here's our latest one, for example. The thing about this one is, the
molecule itself is not that bad, so it's not clear why the picture isn't
any better.

Enjoy. (Try it in OB if you think RDKit's pix is bad. ;)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


CID_10955174_alatised.sdf
Description: application/vnd.kinar




Re: [Rdkit-discuss] High-quality matplotlib drawing?

2019-08-09 Thread Dimitri Maziuk via Rdkit-discuss
On 8/9/19 12:42 PM, Wout Bittremieux wrote:

> Alternatively I could export both the spectrum plot and the molecule to
> SVG files and then combine them afterwards. But in that case it's not
> possible to manipulate both elements in a single matplotlib figure.

Yeah, that's what I meant but you're right: getting that to work in an
interactive display in a notebook would be a hassle.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] High-quality matplotlib drawing?

2019-08-08 Thread Dimitri Maziuk via Rdkit-discuss

On 8/7/2019 7:20 PM, Wout Bittremieux wrote:
...
Unfortunately the quality of the molecule drawing is rather poor (see 
attachment; nonsensical spectrum and molecule). This seems to be true 
for non-SVG drawing in general, and unfortunately it's not really 
possible to combine SVG output with Matplotlib functionality.


Hmm... have you tried wrapping the molecule svg in a `transform="translate(x,y)"` or something along those lines?
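
Roughly what I have in mind, as a sketch rather than a tested recipe for matplotlib output: strip the XML prolog from each SVG and nest it inside a translated group in a new outer document. The helper names strip_prolog, translate and combine are made up for illustration.

```python
import re


def strip_prolog(svg):
    """Drop a leading XML declaration and DOCTYPE, if present."""
    return re.sub(r"^\s*(<\?xml[^>]*\?>\s*)?(<!DOCTYPE[^>]*>\s*)?", "", svg)


def translate(svg, x, y):
    """Wrap an SVG document in a group shifted by (x, y)."""
    return '<g transform="translate(%g,%g)">%s</g>' % (x, y, strip_prolog(svg))


def combine(width, height, parts):
    """parts: iterable of (svg_text, x, y); returns one SVG document."""
    body = "\n".join(translate(svg, x, y) for svg, x, y in parts)
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">\n'
            '%s\n</svg>' % (width, height, body))
```

Nested svg elements are legal SVG, so the inner documents can stay whole. This gives a static combined file; it does not make the molecule a matplotlib artist.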


Dima




Re: [Rdkit-discuss] Read only first model of a pdb-file

2019-05-29 Thread Dimitri Maziuk via Rdkit-discuss
On 5/29/19 3:31 PM, David Cosgrove wrote:
> Biopython is excellent for extracting particular models from a PDB file. As
> Dimitri suggests, you can then pass the result into your processing script.
> It is quite straightforward to write the relevant PDB model to a string in
> PDB format and parse with RDKit’s PDB reader, for example.

Just to add more confusion, if you are working with PDB entries, you may
also want to look at
"""
REMARK 210 CONFORMERS, NUMBER CALCULATED   :
REMARK 210 CONFORMERS, NUMBER SUBMITTED:
REMARK 210 CONFORMERS, SELECTION CRITERIA  :
REMARK 210
REMARK 210

REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE :
"""
(the last one is the one I mentioned earlier)

You would typically have "lowest energy" as the selection criterion, and
"best representative" is the minimized average of those submitted.
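
Picking the last of those remarks out of a file could look like the sketch below. The function name is hypothetical, and it assumes the value after the colon is a plain model number; anything else (blank, NULL, free text) comes back as None.

```python
def best_representative_model(pdb_text):
    """Return the model number given in REMARK 210, or None if unavailable."""
    for line in pdb_text.splitlines():
        if (line.startswith("REMARK 210")
                and "BEST REPRESENTATIVE CONFORMER" in line):
            value = line.split(":", 1)[-1].strip()
            return int(value) if value.isdigit() else None
    return None
```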

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Read only first model of a pdb-file

2019-05-29 Thread Dimitri Maziuk via Rdkit-discuss
On 5/29/19 8:19 AM, Illimar Hugo Rekand wrote:
> Hey, RDKitters!
> 
> 
> I am currently trying to figure out how to only read in the first model of a 
> pdb-file. I've designed a script that performs calculations on a per-atom 
> basis, and this is very slow when it tries to account for multiple models, 
> for example with a NMR-structure.

Pre-process the PDB file to cut out the model you want. In the files
annotated by PDB it should be the first model, and I believe there is a
REMARK something-or-other "best model in this ensemble".

However, this fails for multiple conformers in one file; there is at
least one such entry in PDB.

(It's been a while since I did this so I don't remember the remark
number, nor the multi-conformer entry ID off the top of my head.)
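
The pre-processing step can be as small as this sketch. extract_model is a hypothetical name; it keeps everything outside MODEL/ENDMDL pairs plus the coordinates of the one model asked for, and drops the MODEL/ENDMDL lines themselves so the result reads as a single-model file.

```python
def extract_model(pdb_text, wanted=1):
    """Keep header/trailer records plus the records of one MODEL."""
    kept = []
    current = None    # None while outside any MODEL/ENDMDL pair
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line[6:].split()[0])
            continue
        if record == "ENDMDL":
            current = None
            continue
        if current is None or current == wanted:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

Files without MODEL records pass through unchanged. The returned string can then go to a PDB parser that accepts text, for example RDKit's Chem.MolFromPDBBlock.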
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] RDKit Release 2018.09.2 available

2019-02-22 Thread Dimitri Maziuk via Rdkit-discuss
On 2/22/19 5:01 PM, Markus Sitzmann wrote:

> It is odd, but one thing I learned from using conda is, sometimes it helps
> to ignore problems and wait for a bit and they might go away ... well, I
> have similar experiences with maven :-) ... but most likely I do something
> stupid which I don't see right now :-)

Simple test is to make a clean one and install only rdkit and nothing
else and see what happens. It's pretty common for packagers to do
something-that-may-or-may-not-be-stupid and have a dependency on a
specific version of some other package that depends on a specific
version of another package that depends on... turtles all the way down.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Warning as error

2019-01-21 Thread Dimitri Maziuk via Rdkit-discuss
On 1/21/19 1:42 PM, Jean-Marc Nuzillard wrote:

>             sys.stderr.write("Bad: %s\n" % (mol.GetProp("_Name"),))

> I know which bond has a problem but I still do not know in which molecule.

Are you sure they all have _Name's? I'd just print the count outside of
the try/catch block and ignore ones not followed by the warning message.
(And run with #!/usr/bin/python -u and/or flush sys.stdout/stderr on
every iteration for good measure.)
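
Something along these lines, as a generic sketch: process_with_progress and func are placeholder names, and func stands in for whatever call triggers the warning.

```python
import sys


def process_with_progress(items, func, log=sys.stderr):
    """Call func on each item, logging a 1-based counter before each call."""
    failed = []
    for count, item in enumerate(items, start=1):
        # Written and flushed before the call, so a warning printed by the
        # library itself shows up right after the counter it belongs to.
        log.write("record %d\n" % count)
        log.flush()
        try:
            func(item)
        except Exception as err:
            log.write("Bad: record %d: %s\n" % (count, err))
            log.flush()
            failed.append(count)
    return failed
```

The counter does not depend on the molecule having a _Name property, which is the point. If the library writes its warnings through a separately buffered stream, the ordering is still not guaranteed.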

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] InChI to Mol to InChi

2018-12-18 Thread Dimitri Maziuk via Rdkit-discuss
On 12/18/18 1:57 PM, JEAN-MARC NUZILLARD wrote:

> Dimitri, how can alatis help me to find a first draft of 3D structure
> for a few ten thousands of compounds from InChI strings?

It won't, you have to feed it a 3D structure. However its InChI string
and/or MOL block will give you the same 3D structure with the same atom
labels on round-trip, *as long as you don't removeH/addH/recalculate
conformers etc.*

(At least on all molecules they tried and I think that includes the
entire PubChem.)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] InChI to Mol to InChi

2018-12-18 Thread Dimitri Maziuk via Rdkit-discuss
On 12/18/18 11:34 AM, JEAN-MARC NUZILLARD wrote:

> Molecules m1 and m2 have identical SMILES representations
> but different InChI representations, which I find odd.

*shrug* this is precisely why they came up with alatis: take a molecule
in any input format, round-trip it through any cheminformatics program,
there's 50% chance you'll get a different molecule out.

That's how chemistry works when it meets compsci.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] InChI to Mol to InChi

2018-12-17 Thread Dimitri Maziuk via Rdkit-discuss
On 12/17/18 4:50 PM, JEAN-MARC NUZILLARD wrote:
> Is there any more deterministic procedure than the one of trying until
> success is obtained?
> 
> How do I determine the InChI string of a conformer obtained after
> multiple embedding?

This representation keeps 3D config: http://alatis.nmrfam.wisc.edu/

Generally speaking the problem with InChI is that the only *required*
layer is the formula. Therefore *an* InChI string cannot be used to
differentiate conformers, you need the InChI string with all the
relevant layers and all the protons.
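
To see which layers a given string actually carries, it can be split on "/". This is a sketch with hypothetical helper names, and the alanine strings in the usage note are given for illustration only.

```python
def inchi_layers(inchi):
    """Split an InChI string into (version, formula, [(prefix, body), ...])."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    version = parts[0]
    formula = parts[1] if len(parts) > 1 else ""
    # Every later layer starts with a one-letter prefix (c, h, q, t, m, s...).
    layers = [(p[0], p[1:]) for p in parts[2:] if p]
    return version, formula, layers


def has_stereo_layers(inchi):
    """True if any of the /t, /m or /s stereo layers is present."""
    return any(prefix in "tms" for prefix, _ in inchi_layers(inchi)[2])
```

For example, "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1" has stereo layers, while the same string cut off after the /h layer is still valid InChI and does not.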

https://www.nature.com/articles/sdata201773

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Re: Re: Help: How to set timeout for the function namedRunReactants

2018-11-15 Thread Dimitri Maziuk via Rdkit-discuss
On 11/15/2018 11:41 AM, Francis Atkinson wrote:

>     products = rxn.RunReactants([mol], maxProducts=1)
> Boost.Python.ArgumentError: Python argument types in
>     ChemicalReaction.RunReactants(ChemicalReaction, list)
> did not match C++ signature:
>     RunReactants(RDKit::ChemicalReaction*, boost::python::list)
>     RunReactants(RDKit::ChemicalReaction*, boost::python::tuple)
> 
>     I presume I am missing something, but what?!

It doesn't list a candidate w/ the 3rd parameter so I'd say maxProducts
is not exposed to python in your version. ICBW, though: c++ - boost -
swig - python is not something I'd want to ever become familiar with...

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] svg: next question

2018-11-02 Thread Dimitri Maziuk via Rdkit-discuss
On 11/02/2018 12:19 AM, Greg Landrum wrote:
> On Fri, Nov 2, 2018 at 12:32 AM Dimitri Maziuk via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
> 
>> Does anyone know where TH does
>>
>> 
>>
>> come from? --
> 
> 
> assuming you're using the RDKit's MolDraw2DSVG class, that comes from here:
> https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/MolDraw2D/MolDraw2DSVG.cpp#L53

Should it be changed to utf-8? I suspect any system where RDKit builds
at this point is using that, and I believe technically  element
can contain unicode.

E.g. you should be able to render your amino-acids with atoms labeled w/
Greek alphas, betas, etc. as per IUPAC.

>> I have two SVGs generated by the same container running on
>> the same linux host and one has the above, the other has
>>
>> 
>>
> 
> No idea where that might have come from, but it's not MolDraw2DSVG

Weird. I'll see if I get any more of those...

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Plotting values next to atoms

2018-11-02 Thread Dimitri Maziuk via Rdkit-discuss
On 11/02/2018 07:59 AM, Eric Jonas wrote:
> Hello! I'm trying to figure out if there's any known or sane way to
> automatically plot numerical values adjacent to atoms using the rdkit
> drawing machinery. Ideally I'd like to annotate certain atoms
> programmatically with values.

This draws atom labels:

op = dr.drawOptions()
for i in range( self._mol.GetNumAtoms() ) :
    # label = element symbol + 1-based atom number, e.g. "C1"
    op.atomLabels[i] = self._mol.GetAtomWithIdx( i ).GetSymbol() + str( i + 1 )

HTH,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[Rdkit-discuss] svg: next question

2018-11-01 Thread Dimitri Maziuk via Rdkit-discuss
Does anyone know where TH does



come from? -- I have two SVGs generated by the same container running on
the same linux host and one has the above, the other has



The host is on utf-8, of course, and I can double-check the container,
though I don't see it using anything else... certainly not cp1252.

Any ideas?
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[Rdkit-discuss] svg transparent background

2018-11-01 Thread Dimitri Maziuk via Rdkit-discuss
Hi all,

I finally got around to playing w/ drawing code in the current version
and I like it, but how do I set background colour to transparent?


 dr = rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DSVG( 1000, 1000 )

results in

 

in the output, which for now I'll just post-process away.
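
The post-processing is a one-regex job. This sketch assumes the background is a rect element with a white fill (fill:#FFFFFF) in its markup; the exact markup is an assumption, and drop_white_background is a hypothetical helper name.

```python
import re

# First <rect> whose attributes mention a white fill, either self-closing
# or followed by an (empty) closing tag.
_BACKGROUND = re.compile(
    r"<rect\b[^>]*fill:\s*#FFFFFF[^>]*?(/>|>\s*</rect>)\s*",
    re.IGNORECASE,
)


def drop_white_background(svg):
    """Remove the first white-filled <rect> from an SVG string."""
    return _BACKGROUND.sub("", svg, count=1)
```

Only the first match is removed, on the assumption that the background is drawn before anything else.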

Thx,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Compilation Errors on RHEL7

2018-10-24 Thread Dimitri Maziuk via Rdkit-discuss
On 10/24/2018 12:10 PM, Dimitri Maziuk via Rdkit-discuss wrote:
> Yes. I once spent a couple of hours trying and ended up installing docer

docker

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Compilation Errors on RHEL7

2018-10-24 Thread Dimitri Maziuk via Rdkit-discuss
On 10/24/2018 11:32 AM, Oellien, Frank wrote:
> Hi,
> 
> I am trying to compile RDKit on a RHEL7 system using Python 2.7 and Boost 
> 1.68 
...
> Has somebody already seen this error?

Yes. I once spent a couple of hours trying and ended up installing docer
and pulling a conda/rdkit container instead. I strongly recommend doing
that, or finding a singularity version of the same.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Are atom and bond indexes deterministic?

2018-10-03 Thread Dimitri Maziuk via Rdkit-discuss
On 10/03/2018 03:23 PM, Peter St. John wrote:
> Ah, well I suppose the follow up question is then does 'AddHs' add
> hydrogens in a deterministic fashion?

It should, what's not guaranteed is that it will be the right order.
Obviously, if (using my previous example) L- and D-alanine is the "same
molecule" for your purposes, then it doesn't matter.

If it does matter, then alatis (the link I sent earlier) is the best
option that I know of.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Are atom and bond indexes deterministic?

2018-10-02 Thread Dimitri Maziuk via Rdkit-discuss
On 10/02/2018 03:32 PM, Peter St. John wrote:

> I.e., if I create a new rdkit Molecule with rdkit.Chem.MolFromSmiles(xxx),
> will the bond ordering always be the same? If not, does anyone know a a
> robust way of specifying a bond within a molecule as a string-based
> representation?

https://www.nature.com/articles/sdata201773


-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Creating Mol Object From SD File

2018-08-29 Thread Dimitri Maziuk via Rdkit-discuss
On 08/29/2018 01:54 PM, Chris Murphy wrote:
> Hi,
> 
> I finally realized that when passing an sdf string to Chem.MolFromMolBlock,
> the Mol object will not retain the properties from the sdf.

Ugh. You're right.

+1 for a MolFromSdfBlock() that doesn't lose the properties.

> Also, it seems that SDMolSupplier.next() does not work anymore? 

if sys.version_info[0] == 2 : next()
elif sys.version_info[0] == 3 : __next__()
else : raise Exception( "Go! is looking better every day" )
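
Less flippantly: the built-in next() calls whichever method the running interpreter expects, so the version check is not needed. The sketch below uses a made-up stand-in class (FakeSupplier) rather than a real supplier, just to show the mechanics.

```python
class FakeSupplier(object):
    """Hypothetical stand-in for a molecule supplier; yields strings."""

    def __init__(self, records):
        self._it = iter(records)

    def __iter__(self):
        return self

    def __next__(self):          # the Python 3 method name
        return next(self._it)

    next = __next__              # the Python 2 method name


suppl = FakeSupplier(["mol-1", "mol-2"])
first = next(suppl)              # same call under either interpreter
rest = list(suppl)               # iteration picks up where next() stopped
```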

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Can't import Chem from rdkit in Anaconda Python 3.6.5

2018-06-13 Thread Dimitri Maziuk via Rdkit-discuss
On 06/13/2018 01:10 PM, Greg Landrum wrote:

> You don't excerpt the earlier message where I explained how to get things
> working without needing X installed. Was that not clear? If you don't want
> to have X installed but would still like to use the conda builds, you can
> just install the two packages from the RDKit channel.

There's more to it than that. Conda as packaged for a given distro has
itself a set of dependencies. As I said before, *for example* installing
conda on a centos 6 server will take it off the network at the next
maintenance reboot, unless the installer knows what they're doing and
watches the whole thing very carefully.

No, it's not your problem, you're doing the best you can, and thank you
for that. But the end result is that ready-made builds are getting
increasingly too bloated to be of use, and custom builds are too
"non-trivial" to attempt.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Can't import Chem from rdkit in Anaconda Python 3.6.5

2018-06-13 Thread Dimitri Maziuk via Rdkit-discuss
On 06/13/2018 11:44 AM, Geoffrey Hutchison wrote:
> 
> No, you can compile RDKit yourself if you don't want to use X11 features. You 
> wanted to install through conda, which has a set of packages for 'most use' - 
> YMMV.

Sadly, MM does indeed V. On my box I can't, not without also compiling
boost myself -- and I haven't looked further. Wouldn't be surprised if
it takes me all the way to compiling GNU Compiler Collection myself too.
Some day I'll get a round tuit to set up an alpine build VM and see if
it compiles there... so I can roll it into a reasonable-sized docker
container, but compiling it on my desktop is simply not worth my time.

(And our compute nodes are the same or older as my desktop, so if it
doesn't work on my box, we can't deploy it anywhere.)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Can't import Chem from rdkit in Anaconda Python 3.6.5

2018-06-13 Thread Dimitri Maziuk via Rdkit-discuss

On 6/13/2018 10:06 AM, Greg Landrum wrote:
Note that my answer assumes that there is a reason that you don't have 
X11 installed on your linux box. If that's not the case, you should be 
able to fix things "more easily" by installing X


Quite frankly, this is rapidly becoming unusable as a software platform. 
I need to install X11 to UFF-optimize a MOL? Seriously?


E.g. on centos anaconda installs NetworkManager (why?) which comes in 
"enabled at boot" but not configured, so next time you reboot, perhaps 
weeks later, tada! -- you've lost the network. And don't get me started 
on having several versions of boost coexist...


Dima



Re: [Rdkit-discuss] convert a smiles file to a xyz file

2018-05-23 Thread Dimitri Maziuk via Rdkit-discuss

On 5/23/2018 10:23 AM, Chenyang Shi wrote:

A separate 
question is that is the converted molecular structure from SMILES the 
same as that taken from a crystal structure?


Provided there's no undefined/different stereochemistry on SMILES side, 
no quirks with added protons, and so on and so forth... for a small 
simple molecule... maybe.


Dima




Re: [Rdkit-discuss] Atom mapping

2018-05-09 Thread Dimitri Maziuk
On 05/09/2018 10:27 AM, carlo del moro wrote:
> Dear All,
> 
> we would like to know if it is possible to map the atom's ID of a SMILES
> represented substructure to the atom sequence of a ligand contained in a
> pdb file. This in order to get the spatial coordinates related to such
> substructure.

http://alatis.nmrfam.wisc.edu/ will generate unique stable IDs from a 3D
structure, and output the old->new ID map. It'll take a PDB; you'll
have to convert your SMILES into a 3D .mol. ALATIS atom IDs should be
the same in the two maps, *provided both inputs describe the exact same
ligand*.

(It's the *substructure* bit that I'm not entirely sure about.)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] mol file parsing, 3D or 2D

2018-01-17 Thread Dimitri Maziuk

On 2018-01-17 10:25, Jason Biggs wrote:

For the case in question, I find that if I read in a mol file containing 
2D coordinates, and I skip the sanitization step altogether, then the 3D 
embedding algorithms fail.


Well, yes, as I mentioned in the other thread: the only way you can get 
it to work reliably is if you start with 3D coordinates to begin with. 
Otherwise your users have to get in there every once in a while and 
decide which way to slice that cake they don't get to eat. ;)


Dima



Re: [Rdkit-discuss] mol file parsing, 3D or 2D

2018-01-17 Thread Dimitri Maziuk

On 2018-01-16 22:46, Greg Landrum wrote:

It might be worth thinking about adding an option to the aromaticity 
perception code to maintain the original bond types and just set the 
"isAromatic" flag on the bonds.


This is how it's modeled in mmCIF chem. comp. It may or may not come 
from the OpenEye toolkit they originally used to process their 
ligands/chem comps.


From a programming perspective it's pretty annoying, since you have to 
remember to add an extra if stanza to all your code, queries, etc.


What's wrong with keeping a copy of the original molecule around? -- I'm 
not sure I get the "I want to sanitize and keep the original bonds too", 
it sounds too much like the proverbial cake.


Dima



Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018

2018-01-15 Thread Dimitri Maziuk
On 01/15/2018 02:43 PM, Tim Dudgeon wrote:

> Could there be something in a more general project to bridge the
> compound (mol/smiles), sequence (protein/nucleotide seq + alignments)
> and structure (pdb/mmcif/mmtf) worlds?

FWIW PDB builds everything up from structure because they can derive
bonds from the coordinates and that's the only way you can do it in the
code. Without bonds, trying to link compounds in a sequence doesn't
really work even if you have two cysteines in a bog-standard protein
sequence; with generic compounds it gets too hard fast.
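
To illustrate what "derive bonds from the coordinates" means in general, here is a generic distance heuristic. It is not a description of PDB's actual procedure: the radii are approximate single-bond covalent radii, the 0.45 angstrom tolerance is an assumed value, and guess_bonds is a hypothetical name.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii, in angstroms.
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def guess_bonds(atoms, tolerance=0.45):
    """atoms: list of (element, (x, y, z)). Returns bonded index pairs.

    Two atoms count as bonded when they are closer than the sum of their
    covalent radii plus the tolerance.
    """
    bonds = []
    for (i, (el_a, a)), (j, (el_b, b)) in combinations(enumerate(atoms), 2):
        limit = COVALENT_RADIUS[el_a] + COVALENT_RADIUS[el_b] + tolerance
        dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
        if dist <= limit:
            bonds.append((i, j))
    return bonds
```

With coordinates, two cysteine sulfurs about 2.05 angstroms apart come out as a disulfide; from the sequence alone there is nothing to test.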

PDB has in the mmCIF chem. comp. model "leaving atom flag" that marks
*a* bonding site but it doesn't tell you what kind of bond can form
there, nor what to do if there's more than one. You need a whole lot of
other code to figure out how to link two compounds into a sequence.

And then there's structure calculation: I don't know if there's
anything that works on non-proteins, or that can predict disordered
regions well, etc.

If anyone's counting votes, pretty 2D depictions get mine.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Issue with the latest RDKit DB build

2017-12-29 Thread Dimitri Maziuk
PS the real question is why you're trying to run psql built with a newer
toolset when there's 2 perfectly good ones available: one from the
distro vendor and one from postgres repos.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu







Re: [Rdkit-discuss] Issue with the latest RDKit DB build

2017-12-29 Thread Dimitri Maziuk
On 12/29/2017 01:57 PM, Paul Emsley wrote:
> On 29/12/2017 19:01, Drew Gibson via Rdkit-discuss wrote:

>> psql: error while loading shared libraries: libncursesw.so.6: cannot
>> open shared object file: No such file or directory
> 
> install the ncurses-libs package (I have
> ncurses-libs-6.0-8.20170212.fc26.x86_64 on fedora)

On centos 7 that's not gonna get you libncursesw.so.6:

$ rpm -q -l ncurses-libs
/usr/lib64/libform.so.5
/usr/lib64/libform.so.5.9
/usr/lib64/libformw.so.5
/usr/lib64/libformw.so.5.9
/usr/lib64/libmenu.so.5
/usr/lib64/libmenu.so.5.9
/usr/lib64/libmenuw.so.5
/usr/lib64/libmenuw.so.5.9
/usr/lib64/libncurses++.so.5
/usr/lib64/libncurses++.so.5.9
/usr/lib64/libncurses++w.so.5
/usr/lib64/libncurses++w.so.5.9
/usr/lib64/libncurses.so.5
/usr/lib64/libncurses.so.5.9
/usr/lib64/libncursesw.so.5
/usr/lib64/libncursesw.so.5.9
/usr/lib64/libpanel.so.5
/usr/lib64/libpanel.so.5.9
/usr/lib64/libpanelw.so.5
/usr/lib64/libpanelw.so.5.9
/usr/lib64/libtic.so.5
/usr/lib64/libtic.so.5.9
/usr/lib64/libtinfo.so.5
/usr/lib64/libtinfo.so.5.9


-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] RDkit and Pubchem

2017-12-01 Thread Dimitri Maziuk
On 12/01/2017 11:55 AM, Tim Dudgeon wrote:
> In what way? Given a single PubChem compound or substance ID you just
> want to pull the smiles or molfile into RDKit?

Furthermore what's your definition of "a compound"? If it includes
stereochemistry, pubchem usually has 3d mol files, except where it doesn't.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Transparent background for 2D molecule images

2017-11-20 Thread Dimitri Maziuk
On 11/20/2017 04:45 PM, Markus Metz wrote:

> opts.clearBackground=False
> 
> or
> 
> opts.setBackgroundColour((1,1,0))
> 
> are not working for me.

What's your output format?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] ImportError: No module named rdkit

2017-09-14 Thread Dimitri Maziuk
On 09/14/2017 03:04 PM, Markus Sitzmann wrote:
> Not on Centos 6 - Docker requires Centos 7 for the host system.

You can't win... :(

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] ImportError: No module named rdkit

2017-09-14 Thread Dimitri Maziuk
On 09/14/2017 02:58 PM, Andrew Dalke wrote:

> If only Greg got as much money for long term RDKit support as Red Hat
> gets for long term RHEL support. :)

Yep. But an rdkit docker container might be feasible.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] ImportError: No module named rdkit

2017-09-14 Thread Dimitri Maziuk
On 09/14/2017 01:41 PM, Riccardo Vianello wrote:

> True, but there shouldn't be any strong need for using the system python
> for running application software. Python 2.7 (together with python 3) has
> been available to RHEL6 subscribers since almost five years, as part of the
> Red Hat Software Collections (also available to non-subscribers from the
> upstream CentOS/Fedora repositories). A detailed discussion is available
> from this post
> http://www.curiousefficiency.org/posts/2015/04/stop-supporting-python26.html
> 
> And the anaconda python distribution of course provides another alternative.

All great when it's one computer and that one's your own personal laptop.

> # yum ls \*python27\*
> ...
> python27.x86_64                    2.7.13-2.ius.el6   @salt-2015.8
> python27-babel.noarch              0.9.4-5.2.el6      @salt-2015.8
> python27-chardet.noarch            2.2.1-3.el6        @salt-2015.8
> python27-crypto.x86_64             2.6.1-4.el6        @salt-2015.8
> python27-futures.noarch            3.0.3-2.el6        @salt-2015.8
> python27-jinja2.noarch             2.8.1-2.el6        @salt-2015.8
> python27-libs.x86_64               2.7.13-2.ius.el6   @salt-2015.8
> python27-markupsafe.x86_64         0.11-11.el6        @salt-2015.8
> python27-msgpack.x86_64            0.4.6-2.el6        @salt-2015.8
> python27-psutil.x86_64             5.2.2-1.ius.el6    @salt-2015.8
> python27-pycurl.x86_64             7.19.0-10.el6      @salt-2015.8
> python27-requests.noarch           2.6.0-4.el6        @salt-2015.8
> python27-six.noarch                1.9.0-3.el6        @salt-2015.8
> python27-tornado.x86_64            4.2.1-3.el6        @salt-2015.8
> python27-urllib3.noarch            1.10.2-2.el6       @salt-2015.8
> python27-zmq.x86_64                14.5.0-3.el6       @salt-2015.8
...
> python27-babel.noarch              0.9.6-7.sc1.el6    centos-sclo-rh
> python27-pip.noarch                9.0.1-1.ius.el6    salt-2015.8
...
> python27-python.x86_64             2.7.13-3.el6       centos-sclo-rh
> python27-python-babel.noarch       0.9.6-7.sc1.el6    centos-sclo-rh
> python27-python-jinja2.noarch      2.6-10.sc1.el6     centos-sclo-rh
> python27-python-libs.x86_64        2.7.13-3.el6       centos-sclo-rh
> python27-python-markupsafe.x86_64  0.11-11.sc1.el6    centos-sclo-rh
> python27-python-pip.noarch         8.1.2-1.el6        centos-sclo-rh
...

Any guesses as to how many things will break in my infrastructure
manglement setup (saltstack) if I enable Software Collections and some
of those get updated from SCL and some from Salt?

And don't get me started on PIP.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] ImportError: No module named rdkit

2017-09-14 Thread Dimitri Maziuk
On 09/14/2017 10:43 AM, Greg Landrum wrote:

> Just to do some expectation management: python 2.6 is pretty ancient and
> there's no guarantee that all of the RDKit code will work with it. Python
> 2.7 is the minimum version that we "officially" support. It's a very good
> idea to update.

Just FYI: python 2.6 is the system python on (at least) RHEL-6 family of
linux distros that will be officially with us until June 30, 2024.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-13 Thread Dimitri Maziuk
On 09/13/2017 11:46 AM, Markus Sitzmann wrote:
> The case that you have 3D information available for a molecule dataset is 
> rare, if you want it trustworthy it gets even worse than that. And what is 
> the point then to generate the configuration of a molecule first if you can 
> not trust that either?

Veering further off topic, do you even care in the first place? E.g. if
your molecule always exists as a mixture of isomers, except in some
megabuck-per-microgram painstakingly created reference samples, a
3D-based system will represent it as two distinct molecules. Whereas you
want it represented as one.

Last I looked PDB Ligand Expo had two different benzenes. Their software
doesn't (didn't?) do the circle version so they don't have the third one.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-13 Thread Dimitri Maziuk

On 2017-09-13 10:17, Markus Sitzmann wrote:
Canonical SMILES are only a very rough approximation for "unique 
molecule" as they usually don't work well for tautomeric forms of compound.

InChI or Standard InChI is much better although also not perfect.


ALATIS I linked to above does impose a stable consistent ordering for 
everything including hydrogens. The downside is it's garbage in - 
garbage out: you need to start with a 3D structure, otherwise it has an 
option to addHs and gen3D but no guarantee it'll generate the one you want.


Dima



Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-13 Thread Dimitri Maziuk

On 2017-09-13 09:56, TJ O'Donnell wrote:

Let the database do the work for you.  Create a canonical SMILES column
and/or InChI column and declare them to be unique.  As you insert new
rows, postgres will let  you know if there is already a row with the same
SMILES or InChI.
Here's some help on how to handle that.
https://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT


One of the problems with this is it normally fails on the first conflict 
whereas users very often want a list of all conflicts to look at and see 
what's up. The above mentions a "special excluded table" in passing but 
I don't see anything about accessing it or what it actually contains.


If you don't care what molecules get dropped or why, "on conflict 
do nothing" should work very nicely.
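
For what it's worth, here is a minimal sketch of getting the full list of conflicts instead of stopping at the first one. It uses Python's built-in sqlite3 as a stand-in for postgres (an assumption, purely so it runs anywhere); the `molecules` table, its columns, and `insert_collecting_conflicts` are made-up names for illustration.

```python
import sqlite3

def insert_collecting_conflicts(conn, rows):
    """Insert (name, smiles) rows; return the rows rejected by the UNIQUE constraint."""
    conflicts = []
    for name, smiles in rows:
        try:
            conn.execute(
                "INSERT INTO molecules (name, smiles) VALUES (?, ?)",
                (name, smiles),
            )
        except sqlite3.IntegrityError:
            # Duplicate SMILES: remember it and keep going.
            conflicts.append((name, smiles))
    conn.commit()
    return conflicts

# Hypothetical table: the canonical SMILES column is declared unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecules (name TEXT, smiles TEXT UNIQUE)")

rows = [
    ("ethanol", "CCO"),
    ("benzene", "c1ccccc1"),
    ("EtOH", "CCO"),
    ("ethyl alcohol", "CCO"),
]
conflicts = insert_collecting_conflicts(conn, rows)
print(conflicts)
```

On postgres a failed insert aborts the enclosing transaction, so the same loop would need autocommit or a savepoint per row.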


Dima



Re: [Rdkit-discuss] Non-redundant database of molecules

2017-09-13 Thread Dimitri Maziuk

On 2017-09-13 05:13, Wandré wrote:

Compare if the SMILES as already inserted is easy (text compare), but, 
compare fingerprint of molecule...


Here's one option: http://alatis.nmrfam.wisc.edu/ -- you can use string 
comparison on the resulting inchi string.


Dima



Re: [Rdkit-discuss] Is there a Ubuntu ppa or some repository with the latest rdkit release as .deb ?

2017-06-22 Thread Dimitri Maziuk

On 2017-06-22 01:36, Francois BERENGER wrote:


make deb # in rdkit source tree

Some people might ask for a make rpm target also.


You'd have to track any changes that redhat, canonical, suse, and 
whoever else is out there might make to e.g. filesystem layout, linked 
libraries, python and so on.


There are snap packages and AppImage packages. I've no idea if either is 
suitable for shared libraries with python libraries etc.


I'd just grab the newer source .deb and try to build it on your system.

Dima




Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Dimitri Maziuk
On 06/15/2017 01:14 PM, Brian Kelley wrote:
> Sorry to hear about the flooding.

>> Unfortunately we got flooded day before yesterday and the servers doing
>> the crunching are currently down.

I should have mentioned that the server (URL is in the article), which
I'll hopefully get back up today, will output a MOL file with atoms
ordered as per the article.

The downside is it only works on 3D MOLs.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] atom indexes and order of atoms in the input file

2017-06-15 Thread Dimitri Maziuk
On 06/15/2017 10:13 AM, Maciek Wójcikowski wrote:
> Hi,
> 
> If you really want to rely on the order of atom you can renumber them
> anyhow you like with Chem.RenumberAtoms()
> http://rdkit.org/Python_Docs/rdkit.Chem.rdmolops-module.html#RenumberAtoms
> There is also a function which returns canonical order of atoms for
> you: Chem.CanonicalRankAtoms() As I remember correctly the order may differ
> from the canonical smiles, although that might have changed.

https://www.nature.com/articles/sdata201773

Unfortunately we got flooded day before yesterday and the servers doing
the crunching are currently down.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Memory issue when storing more than 300K mol in a list

2017-06-10 Thread Dimitri Maziuk

On 2017-06-10 07:42, Chris Swain wrote:
This sounds like the situation where a database might be a better 
option, tuned to store fingerprints in RAM?


The issue is how much programming time it will take, how much that time 
is worth, and how many times the solution will be reused. A clever 
coding solution could be preferable for other reasons, e.g. as a 
programming exercise. If it's a one-off and you just need to get it done 
and move on, throwing more hardware at it is often the most 
cost-effective solution.


Dima





Re: [Rdkit-discuss] Memory issue when storing more than 300K mol in a list

2017-06-09 Thread Dimitri Maziuk

On 2017-06-09 08:12, Alexis Parenty wrote:

Dear Greg and Brian,
Many thanks for your response. I was also thinking of your streaming 
approach! I think the RAM of most machine would deal with lists of 100K 
mol so we could put the threshold higher than 1000. Actually, I was 
thinking to monitor the available RAM and only start processing the 
matrix and clearing the list when less than 20% of RAM is left. This 
way, the best machines could skip the clearing process and gain time. 
What do you think?


Take $100, buy a 200GB SSD, set it up as the swap space, don't worry 
about the RAM.
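
If you do go the code route, the streaming approach from the quoted message can be sketched roughly like this. It is only an illustration under assumptions: `batches` and `process_stream` are made-up names, and the batch handler stands in for whatever builds the matrix.

```python
from itertools import islice

def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def process_stream(items, size, handle_batch):
    """Feed `items` to `handle_batch` one chunk at a time; return the chunk count."""
    count = 0
    for chunk in batches(items, size):
        handle_batch(chunk)  # only one chunk is held in memory at a time
        count += 1
    return count

# Stand-in workload: record the size of each chunk instead of processing molecules.
seen = []
n = process_stream(range(10), 4, lambda chunk: seen.append(len(chunk)))
print(n, seen)
```

The batch size is the only knob; it can be fixed instead of watching free RAM.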


Dima





Re: [Rdkit-discuss] Molecule representation

2017-03-08 Thread Dimitri Maziuk
On 03/07/2017 05:42 PM, Markus Metz wrote:
> Dear Stephane:
> Thank you very much.
> I will give it a try.

An alternative:

import os
import sys
import time
import threading

# Adjust to wherever pymol's python modules are installed.
PYMOL_PATH = "/SOME/PLACE/lib64/python"
sys.path.append( PYMOL_PATH )
import pymol

def make_image( infile, outfile ) :

    # -q: quiet, -c: command line only (no GUI)
    pymol.pymol_argv = ['pymol','-qc']
    pymol.finish_launching()
    cmd = pymol.cmd

    cmd.load( infile )
    cmd.hide( "everything" )
    cmd.show( "sticks" )

    # colour by atom, white carbons
    cmd.util.cbaw()

    cmd.set( "cartoon_discrete_colors", 1 )
    cmd.set( "ray_opaque_background", "off" )
    cmd.set( "ray_trace_mode",  1 )
    cmd.set( "antialias", 2 )
    cmd.set( "ray_trace_color", "grey" )
    cmd.set( "cartoon_fancy_helices", 1 )
    cmd.set( "cartoon_side_chain_helper", "on" )
    cmd.png( outfile, width = 800, dpi = 300, ray = 1 )

    # give pymol's background threads time to finish before quitting
    while threading.active_count() > 2 :
        time.sleep( 2 )
    cmd.quit()


HTH,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] PBF precision is to high to determine good planarity

2017-03-02 Thread Dimitri Maziuk
On 2017-03-02 04:37, Guillaume GODIN wrote:

> Based on the precision of the coordinates (in rdkit sdf files it's 4
> digits) can we infer the precision on the PBF value based on that ?

Only if you *know* the values are actually accurate to 4 digits and not 
e.g. were printed as "%.4f" just because the programmer thought it was a 
"reasonable" mask.
:(
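
A tiny illustration of the point, with a made-up measurement: the number of printed digits comes from the format mask, not from the data. The value `1.2` and the helper name `printed` are assumptions for the example only.

```python
def printed(value, mask="%.4f"):
    """Format a value the way a file writer with a fixed mask would."""
    return mask % value

measured = 1.2  # hypothetical value, known to one decimal place only
text = printed(measured)
print(text)            # four decimals on paper
print(printed(1 / 3))  # the mask truncates just as happily as it pads
```

Reading the file back gives four digits either way, so the digit count alone says nothing about accuracy.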

Dimitri




Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Dimitri Maziuk
On 12/29/2016 02:35 PM, Peter S. Shenkin wrote:
> Dimitri,
> 
> You were the one who suggested that all the structural depictions be
> generated.
> 
> I, in contrast, suggested that only the ones users need to look at need be
> generated. I further suggested that these would only constitute a small
> fraction of those in a large DB.

My objection was to using numbers like

> ... for 92877507
> structures (current size PubChem Compound):
> 1s per structure = 1074 days (~3 years)
> 100 ms per structure = 107 days
> 1ms per structure = 25 hours

as if they actually mean something.

I responded that *if* the requirement is to generate all 100M
depictions, making the code faster on a single CPU core is rarely the
cost-effective solution. That was a purely academic "if" because I don't
believe that regenerating all the depictions at once on a regular basis
is a realistic use case, either.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Dimitri Maziuk
On 12/29/2016 12:43 PM, Peter S. Shenkin wrote:

> Of the
> billion structures, only a fraction will ever be visualized, so a
> memoization strategy sounds reasonable, which in turn implies that you want
> rapid response when an unstored structure has to be generated.

:)

Now I have a mental picture of a phd student tied to a chair with his
eyes taped open, forced to look at a billion depictions for 10ms each.

Pictures are only useful if you have a human looking at them. Looking is
only useful if you do it long enough for the brain to process it. The
whole "what if we need a billion depictions all at once" implies that
you have a billion users looking at them all at once. If you don't, then
rapid response is a very interesting academic exercise but its practical
usefulness might be somewhat questionable.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Dimitri Maziuk
On 12/29/2016 12:43 PM, Peter S. Shenkin wrote:
> Look, it all boils down to (CPU) time, and time is money.

It's very hard to say how much a single cpu core actually costs 'cause
they don't make them anymore. Similarly, our small molecule SVGs average
around 4K each; storing 10M of those will require about 40GB, and they
don't make disks that small anymore either. A 64GB USB stick is twenty bucks.

I've no idea how much I actually cost our funding agency per hour, nor
how many hours it would take me to even figure out if a piece of code of
any kind of complexity can be optimized. But I can guarantee you that a)
it's much more than $20, and b) hiring a competent programmer will cost
you more than buying a "better computer" and is not guaranteed to result
in any appreciable speed-up.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] drawing code take 3

2016-12-29 Thread Dimitri Maziuk
On 2016-12-29 07:19, John M wrote:

> For why you need sub-second depiction consider these times for 92877507
> structures (current size PubChem Compound):
>
> 1s per structure = 1074 days (~3 years)
> 100 ms per structure = 107 days
> 1ms per structure = 25 hours

The Dilbert answer is buy a better computer. The serious answer is if 
you run millions of jobs sequentially on a single core, your problem is 
not how long a single job takes: no matter how fast you can make it, it 
will only scale linearly. There will be 1B compounds in PubChem two 
years from now and your painstakingly crafted 1ms/structure code will 
still take 3 years, the only difference is you get garbage depictions.

Condor can be persuaded to fire up 92877507 EC2 VMs and run all of those in 
parallel -- provided you're willing to pay Amazon for it of course. If 
you can code the algorithm into GPGPU/SIMD parallel flow, you can 
probably push it into an FPGA and then get that baked into ASICs in 
China -- they'll give you a discount if you order more than ten thousand. 
That gets you a $20 USB dongle that will run them at umpteen K/second. 
And so on.

If you don't want quality depictions because bad ones will work just 
fine for your needs, that's a perfectly good argument. If you don't want 
them because generating 10M sequentially on a single core will take a 
long time, that's a BS argument.
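
To put numbers on it, here is a back-of-the-envelope sketch. It assumes the jobs are independent and scale perfectly across workers, which is an idealization; the worker count of 64 and the function name are made up for the example.

```python
SECONDS_PER_DAY = 86400

def wall_clock_days(n_jobs, seconds_per_job, workers=1):
    """Idealized estimate: independent jobs, perfect scaling across workers."""
    return n_jobs * seconds_per_job / workers / SECONDS_PER_DAY

N = 92877507  # structure count quoted above

for per_job in (1.0, 0.1, 0.001):
    single = wall_clock_days(N, per_job)
    spread = wall_clock_days(N, per_job, workers=64)  # 64 is an arbitrary choice
    print("%gs/structure: %.2f days on one core, %.2f days on 64" % (per_job, single, spread))
```

Both knobs divide the total the same way, which is the point: adding workers is usually the cheaper one to turn.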

Dima




Re: [Rdkit-discuss] drawing code take 3

2016-12-15 Thread Dimitri Maziuk
On 12/15/2016 04:23 PM, Peter S. Shenkin wrote:

> Obviously, it doesn't matter if you're rendering just few structures, but
> in a scenario where you might be downloading a hundred SMILES from a DB and
> displaying them on a grid in a browser, computing the 2D depictions on the
> fly, waiting 5 sec for a page refresh wouldn't be great.

Maybe not, but depending how the browser lays out the grid, it may take
5 seconds anyway.

My recommendation for that use case would be to pre-generate the images
and store the URLs in that database. Which is what we do here.
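
A rough sketch of the pre-generate-and-look-up idea, not our actual setup: images are cached on disk under a hash of the SMILES and rendered only on a miss. `fake_render` is a hypothetical placeholder for a real depiction call, and the file naming scheme is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path

def image_path(cache_dir, smiles):
    """Map a SMILES string to a stable file name inside the cache directory."""
    digest = hashlib.sha1(smiles.encode("utf-8")).hexdigest()
    return Path(cache_dir) / (digest + ".svg")

def get_image(cache_dir, smiles, render):
    """Return the cached image path, calling `render` only on a cache miss."""
    path = image_path(cache_dir, smiles)
    if not path.exists():
        path.write_text(render(smiles))
    return path

calls = []

def fake_render(smiles):
    # Hypothetical placeholder: a real renderer would return an actual depiction.
    calls.append(smiles)
    return "<svg><!-- %s --></svg>" % smiles

cache_dir = tempfile.mkdtemp()
first = get_image(cache_dir, "CCO", fake_render)
second = get_image(cache_dir, "CCO", fake_render)  # served from disk
print(first == second, calls)
```

Keying on the raw string means two different SMILES for the same molecule get two files, so canonicalize first if that matters.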

;)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] drawing code take 3

2016-12-15 Thread Dimitri Maziuk
On 12/15/2016 02:53 PM, Peter S. Shenkin wrote:
> Looks good, but maybe too slow for production use... (?)

I wonder what kind of production use would require sub-second wall clock
time for this.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Extracting SMILES from text

2016-12-02 Thread Dimitri Maziuk
On 12/02/2016 03:12 PM, George Papadatos wrote:
> Here's a pragmatic idea:
> ... would it not be safe to
> assume that *any *word containing more than 4 'C' or 'c' characters would
> only be a SMILES string?

pneumonoultramicroscopicsilicovolcanoconiosis
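
Spelled out as code, with the proposed rule taken literally (the function name and the default threshold argument are just for illustration):

```python
def looks_like_smiles_naive(word, threshold=4):
    """The proposed heuristic: more than `threshold` 'C' or 'c' characters."""
    return sum(1 for ch in word if ch in "Cc") > threshold

word = "pneumonoultramicroscopicsilicovolcanoconiosis"
c_count = sum(1 for ch in word if ch in "Cc")
flagged = looks_like_smiles_naive(word)
print(c_count, flagged)
```

The rule also misses short SMILES such as `CCO`, so it fails in both directions.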


-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] comparing two or more tables of molecules

2016-11-29 Thread Dimitri Maziuk
On 11/29/2016 11:56 AM, Chris Swain wrote:

> However I’ve found that the success is very much dependent on the
> fact 1 described by Greg, get all the structures standardised then comparison
> using canonical SMILES or InChI seems to work fine.

+1. Essentially you need to get standardized representation of all the
properties you consider relevant and produce a unique hash of that.
Doesn't matter if it's a SHA-1 string or some graph-based magic or a
matrix voodoo. (String comparison is of course easier.)
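
A minimal sketch of the string-hash variant, assuming the standardization has already been done elsewhere. The property names are invented for the example; the only point is that the serialization must be stable before it is hashed.

```python
import hashlib
import json

def molecule_key(properties):
    """SHA-1 over a stable serialization of already-standardized properties."""
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

# Hypothetical records: same properties in a different order, then a different charge.
a = {"smiles": "CCO", "charge": 0}
b = {"charge": 0, "smiles": "CCO"}
c = {"smiles": "CCO", "charge": 1}

print(molecule_key(a) == molecule_key(b))  # key order does not matter
print(molecule_key(a) == molecule_key(c))  # a relevant property does
```

Whatever you leave out of the dictionary is, by construction, a property you have declared irrelevant.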

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] comparing two or more tables of molecules

2016-11-28 Thread Dimitri Maziuk
On 11/28/2016 10:25 AM, Stephen O'hagan wrote:
> Has anyone come up with fool-proof way of matching structurally equivalent 
> molecules?

This is somewhat convoluted and there is no proof that it's fool-proof.

A few years ago we had good results from running graphpowerhash()
function here:
http://madgik.github.io/madis/aggregate.html#module-functions.aggregate.graph
on the PDB ligand database.

The parameters were

- atom1, atom2 IDs (names) as node1, node2.

- Atom stereo (R, S, N), aromatic (y/n), and "leaving atom" (y/n) for
the atoms as node1_details, node2_details (packed into single string
with jpack() function: see http://madgik.github.io/madis/row.html).

Looking at it now, I don't think nodeN_details parameter needs to
include atom's "aromatic" flag.

- Massaged bond type and bond stereo (E, Z, N) as edge_details. Also
packed into a string as above.

PDB chem comp model has bond type as SING or DOUB with a separate yes/no
"aromatic" column. We changed it to AROM for the ones where that was a yes.

The basic model is a list of bonds with atom1, atom2, and type, and a
list of atoms with stereo, aromatic, and "leaving" flags -- the last one
is "Y" for atoms that "go away" when forming a bond.

The algorithm itself, as far as I know (I am not the author), takes the
two "matrices" representing the molecule "graphs", computes their
largest eigenvalue/eigenvectors, and compares those. We have no proof
that it's 100% correct, but all duplicates it found in the PDB ligand
expo at the time were genuine.
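
For illustration only, here is roughly how the input rows described above could be assembled in Python. This is not the madis code: `json.dumps` stands in for jpack(), the "Y"/"N" flag values and the tuple layouts are assumptions, and the aromatic flag is left out of the atom details as suggested above.

```python
import json

def bond_label(bond_type, aromatic):
    """Collapse SING/DOUB plus a separate aromatic flag into a single value."""
    return "AROM" if aromatic == "Y" else bond_type

def edge_rows(atoms, bonds):
    """atoms: {name: (stereo, leaving)}; bonds: [(atom1, atom2, type, aromatic, stereo)].

    Returns (node1, node2, node1_details, node2_details, edge_details) tuples.
    """
    rows = []
    for a1, a2, btype, arom, bstereo in bonds:
        rows.append((
            a1,
            a2,
            json.dumps(list(atoms[a1])),  # stand-in for jpack()
            json.dumps(list(atoms[a2])),
            json.dumps([bond_label(btype, arom), bstereo]),
        ))
    return rows

# Made-up three-atom fragment, not a real chem comp entry.
atoms = {"C1": ("N", "N"), "C2": ("N", "N"), "O1": ("N", "Y")}
bonds = [("C1", "C2", "DOUB", "Y", "N"), ("C2", "O1", "SING", "N", "N")]
rows = edge_rows(atoms, bonds)
for row in rows:
    print(row)
```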

Enjoy,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] GenerateDepictionMatching[23]DStructure (a bit off-topic)

2016-11-17 Thread Dimitri Maziuk
On 11/17/2016 02:41 PM, Peter S. Shenkin wrote:

...

> I have to say that Marvin displays the connectivity of the structures much 
> more 
> clearly than RDKit.

Philosophically speaking, there must exist molecules for which a legible
2D projection is simply not possible. PubChem CID 2537 comes close.
Marvin doesn't do much better on this one even if you don't turn on all
the labels.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Rdkit-discuss] SVG BUG (Re: Fwd: 2D drawing with atoms labeled by index)

2016-10-27 Thread Dimitri Maziuk
On 2016-10-26 23:39, Peter S. Shenkin wrote:
> Hey, by the way, my agenda is trying to understand all this.

(Using python syntax instead of ML)

Recommended by TFM:

from "http://www.w3.org/2000/svg" import *

All svg names should work with or without package qualifier: point(), 
line(), etc., as well as svg.point(), svg.line(), ...

Rdkit way:

import "http://www.w3.org/2000/svg" as svg

All svg names must be prefixed: svg.point(), svg.line(). Using 
unqualified point() should throw an error. (Unless there's another 
'point' in the name resolution chain, yadda, yadda, yadda.)

Unfortunately I find the fact that a lot of software out there doesn't 
get it right entirely unsurprising. :(
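
The same thing with a namespace-aware parser, as a small sketch using Python's standard xml.etree (the three one-line documents are made up for the example). Both spellings resolve to the same qualified names; the mixed one shows what happens when the prefix is stripped from only some of the tags.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Default namespace: unprefixed tags belong to SVG.
default_form = '<svg xmlns="%s"><line/></svg>' % SVG_NS
# Prefixed namespace: every tag carries the svg: prefix.
prefixed_form = '<svg:svg xmlns:svg="%s"><svg:line/></svg:svg>' % SVG_NS
# Prefix bound, but one tag left unprefixed: that tag is in no namespace.
mixed_form = '<svg:svg xmlns:svg="%s"><line/></svg:svg>' % SVG_NS

def qualified_tags(text):
    """Return the namespace-qualified tag of every element in the document."""
    return [el.tag for el in ET.fromstring(text).iter()]

print(qualified_tags(default_form))
print(qualified_tags(prefixed_form))
print(qualified_tags(mixed_form))
```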

Dima




[Rdkit-discuss] SVG BUG (Re: Fwd: 2D drawing with atoms labeled by index)

2016-10-25 Thread Dimitri Maziuk
On 10/25/2016 11:21 AM, Peter S. Shenkin wrote:
> Hi, Hongbin,
> 
> Thanks. Indeed. svg2.svg, when renamed to svg2.html, shows the correct
> image in Chrome. svg.html shows garbage.
> 
> Still, it would be good to be able to create a real .svg file from RDKit.

OK, you made me look and I learned something today.

Mozilla claims valid SVG must include the namespace declarations
(https://developer.mozilla.org/en-US/docs/Web/SVG/FAQ) citing this
document: https://jwatt.org/svg/authoring/#namespace-binding

There it states
"""
<svg xmlns="http://www.w3.org/2000/svg">
...
Be careful not to type xmlns:svg instead of just xmlns when you bind the
SVG namespace. This is an easy mistake to make, but one that can break
everything. Instead of making SVG the default namespace, it binds it to
the namespace prefix 'svg', and this is almost certainly not what you
want to do in an SVG file. A standards compliant browser will then fail
to recognise any tags and attributes that don't have an explicit
namespace prefix (probably most if not all of them) and fail to render
your document as SVG.
"""

Sure enough, rdkit's files start with
"""
http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Fwd: 2D drawing with atoms labeled by index

2016-10-24 Thread Dimitri Maziuk
On 2016-10-24 19:04, Peter S. Shenkin wrote:

> My second conclusion (based on the .svg-file experiments) is that it's
> not an iPython problem and, since you see the same thing on Firefox,
> it's unlikely to be a Chrome problem.

Well, what I got from (Greg's, I think) tutorial is that if you don't 
strip off the svg:'s the image won't show up in whatever he's using. In 
my case (firefox) the image won't show up if you do.

Programs that read XML correctly (the gimp, inkscape, eog to name a 
couple) will show the image either way.
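
For what it's worth, the "either way" part can be checked with any namespace-aware parser. A minimal sketch using Python's standard xml.etree.ElementTree; the two tiny documents are made up for illustration and are not RDKit output:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# The same document spelled two ways: SVG as the default namespace,
# and SVG bound to the explicit prefix "svg".
default_form = ('<svg xmlns="%s">'
                '<line x1="0" y1="0" x2="5" y2="5"/></svg>' % SVG_NS)
prefixed_form = ('<svg:svg xmlns:svg="%s">'
                 '<svg:line x1="0" y1="0" x2="5" y2="5"/></svg:svg>' % SVG_NS)

def qualified_tags(xml_text):
    """Return the namespace-qualified tag of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

tags_default = qualified_tags(default_form)
tags_prefixed = qualified_tags(prefixed_form)
print(tags_default)
print(tags_default == tags_prefixed)
```

Once the prefix is resolved, both spellings name the same elements, so a reader that resolves namespaces has no reason to treat them differently.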

So to your original question: no rhyme or reason that I know of to when 
you should or should not strip off the svg:'s.

Dima




Re: [Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-24 Thread Dimitri Maziuk
On 10/24/2016 04:39 PM, Peter S. Shenkin wrote:

> Or is it
> rather because chemists in your target audience will be thinking of the
> first atom in, say, a structure from an sd file as atom #1?

That

> 2. Regarding the last line, most of the RDKit code I've seen in past
> examples displays the molecule using code like the following. When is it
> necessary/not necessary to remove the "svg" string from the results of
> GetDrawingText()?

Not sure: it's a namespace, I'm assuming ipython can't deal with xml
namespaces. Properly written programs should show it either way,
unfortunately my target viewer is firefox (it's a web application and
the user's default browser is firefox) and firefox isn't one of them.
Without svg:'s it'll show the file as xml text instead of the image.
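
One way around plain string stripping is to re-serialize the file with SVG as the default namespace. This is only a sketch, assuming the input binds the namespace to the svg prefix (xmlns:svg); the function name rebind_svg_as_default and the sample input are mine, not anything from RDKit:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def rebind_svg_as_default(svg_text):
    """Re-serialize svg_text so that SVG is the default namespace."""
    # An empty prefix makes ElementTree write xmlns="..." instead of
    # xmlns:svg="..."; note that this registration is process-wide.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    return ET.tostring(root, encoding="unicode")

# Hypothetical input in the prefixed style, not actual RDKit output.
prefixed = ('<svg:svg xmlns:svg="%s" width="10" height="10">'
            '<svg:line x1="0" y1="0" x2="5" y2="5"/></svg:svg>' % SVG_NS)
rebound = rebind_svg_as_default(prefixed)
print(rebound)
```

The elements stay in the SVG namespace, only the spelling changes, so this should suit viewers that want unprefixed tags and viewers that insist on the namespace declaration.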

HTH
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] 2D drawing with atoms labeled by index

2016-10-24 Thread Dimitri Maziuk
Since you already got your answer I'll just post this for posterity:


import sys
import rdkit
import rdkit.Chem
import rdkit.Chem.AllChem
import rdkit.Chem.Draw
import rdkit.Chem.Draw.rdMolDraw2D

# first molecule in the file, keeping explicit hydrogens
mol=next(rdkit.Chem.SupplierFromFilename(sys.argv[1],removeHs=False))
dr=rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DSVG(800,800)
dr.SetFontSize(0.3)
op = dr.drawOptions()
# label every atom with its element symbol and 1-based index
for i in range(mol.GetNumAtoms()) :
    op.atomLabels[i]=mol.GetAtomWithIdx(i).GetSymbol() + str((i+1))
rdkit.Chem.AllChem.Compute2DCoords(mol)
dr.DrawMolecule(mol)
dr.FinishDrawing()
svg=dr.GetDrawingText()


-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] How to find the idx of hydrogens bonded to a specific atom

2016-10-13 Thread Dimitri Maziuk
On 10/13/2016 12:12 PM, Paul Emsley wrote:

> Are you sure? I use HasSubstrMatch to match hydrogens.

Perhaps this
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg05897.html
may be useful?

If OP's looking at crystal structures, they're likely dealing with pdb
data model where all hydrogens are explicitly present and indexed. I
wonder if they stay that way throughout the steps leading to (and past)
the smarts match.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] The RDKit and modern C++

2016-09-29 Thread Dimitri Maziuk
On 09/29/2016 10:16 AM, Greg Landrum wrote:

> My hope is that all of those people will be able to keep happily
> using a reasonably up-to-date version of the RDKit.

Well, that's kinda the point: there are no RPMs that let you run
binaries linked to GLIBC_2.17 on GLIBC_2.5, nor compile c++-14 code with
c++-03 compilers.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] The RDKit and modern C++

2016-09-29 Thread Dimitri Maziuk
On 2016-09-29 00:57, Markus Sitzmann wrote:
> I get the feeling, RH/Centos 6 becomes the next XP kind of story - too
> many legacies that make the update impossible or very hard. Also docker,
> a great technology that could mitigate this problem, is very painful
> under RH/Centos 6.

systemd, corosync/pacemaker, apache 2.4, gnome.whichever are some of 
RH7's "exciting new technologies" a lot of us don't want.

Dimitri





Re: [Rdkit-discuss] drawing code take 3

2016-09-27 Thread Dimitri Maziuk
On 2016-09-26 18:19, Peter S. Shenkin wrote:

> 2D drawing code is tough. The 90/10 rule applies: the last 10% of
> I think for the present purposes what we need is something correct,
> robust and legible, and of course the example shown does not exhibit
> that. (But I don't know what the starting SMILES is, so I don't know
> whether the 7-bonded C is due to a bad SMILES, in which case all bets
> are off.)

That was actually a "kudos to RDKit" post.

I have an application where I need a drawing with all Hs and all atom 
labels, and molecule description in mmCIF(-ish) format. I use RDKit for 
the latter because of OpenBabel's stereochemistry "model", and OpenBabel 
for the drawings because 90% of the time it generates better layouts.

The comment is that RDKit's layout algorithm appears to be more stable: 
for this molecule OB generated a "better" picture from the original SDF 
downloaded from PubChem, and that complete mess when we re-ordered the 
atoms. RDKit generated the same picture in both cases, except that one 
is a mirror image of the other.

Dima




Re: [Rdkit-discuss] drawing code take 3

2016-09-26 Thread Dimitri Maziuk
On 09/26/2016 04:42 PM, Peter S. Shenkin wrote:
> Also, the C attached to H44 has an extra H (its own or someone else's?)
> superimposed upon it.

I wonder if 2D drawing code should really work the same way as the 3D
conformer generation: generate a bunch of candidate layouts and pick the
one(s) with least clashes/overlaps.
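
To make the idea concrete, here is a rough sketch of the "pick the least crowded candidate" step only. It is not how RDKit or OB lay out molecules; the function names, the 0.5 distance threshold and the toy coordinates are all assumptions for illustration, and generating the candidate layouts is left out entirely:

```python
import itertools
import math

def count_overlaps(coords, min_dist=0.5):
    """Count atom pairs drawn closer together than min_dist."""
    return sum(
        1
        for (ax, ay), (bx, by) in itertools.combinations(coords, 2)
        if math.hypot(ax - bx, ay - by) < min_dist
    )

def pick_layout(candidates, min_dist=0.5):
    """Return the candidate layout with the fewest overlapping pairs."""
    return min(candidates, key=lambda coords: count_overlaps(coords, min_dist))

# Two hypothetical 2D layouts of the same four atoms.
crowded = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 0.2)]
spread = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

best = pick_layout([crowded, spread])
print(count_overlaps(crowded), count_overlaps(spread))
```

A real scoring function would also have to look at label boxes and bond crossings, not just atom centres.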

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[Rdkit-discuss] drawing code take 3

2016-09-26 Thread Dimitri Maziuk

On the plus side, when drawing PubChem CID 5057 from a 3D SDF before and
after our canonicalization, RDKit draws a mirror image, but otherwise
the same 2D structure. OB's "after" version is attached: enjoy the
7-bond carbon in the ring.

;)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Rdkit-discuss] The RDKit and modern C++

2016-09-24 Thread Dimitri Maziuk
On 2016-09-24 01:25, Greg Landrum wrote:

> https://medium.com/@greg.landrum_t5/the-rdkit-and-modern-c-48206b966218?source=linkShare-d698b3fa9f7-1474698147
>
> This is a big and important change and I'd love to hear whatever
> feedback members of the community may have. Please comment either on the
> blog post or here.

What are the concrete benefits -14 will bring to the toolkit?

C++ committee has long been criticized for attempting to solve the wrong 
problems every time and every time coming up with solutions that are 
reasonable, logical, and wrong. We've been forced to update our code 
several times due to g++ updates being incompatible with the "language 
formerly known as C++" and if that's the case with RDKit, then you don't 
have much choice. However, if I were rewriting the code for the sake of 
making it "cleaner" or "more maintainable", I'd be seriously considering 
go or objective c or maybe gnat even.

At this point I can only recommend C++ to Comp. Sci. students in the 
Programming languages unit; as an object example of where good 
intentions usually end up. I certainly wouldn't recommend it to chemists 
as a "modern tool", or even a good tool.

Just my $.02 as an "IT professional".
Dimitri




Re: [Rdkit-discuss] FindChiralCenters in MOL/SDF files howto

2016-09-14 Thread Dimitri Maziuk
On 09/14/2016 02:23 PM, Dimitri Maziuk wrote:
> lbl=mol.GetAtomWithIdx(s[0]).GetSymbol() + str(s[0]+1)
> print label, ":", s[1]
^^^
Should be lbl


-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[Rdkit-discuss] FindChiralCenters in MOL/SDF files howto

2016-09-14 Thread Dimitri Maziuk
I don't know if this fits in the Getting Started, or Cookbook, or if
TPTB decide to wikify the docs and it should go there, but anyway, here
goes. With thanks to everyone responsible, of course.

It does need corrections/clarifications.

-
# MOL/SDF FindChiralCenters howto

```python
# list chiral centers in molecule

sts=rdkit.Chem.FindMolChiralCenters(mol,includeUnassigned=True,force=False)
for s in sts:
    # s is (atom index, label); show a 1-based atom number
    lbl=mol.GetAtomWithIdx(s[0]).GetSymbol() + str(s[0]+1)
    print("%s : %s" % (lbl, s[1]))
```

The values for `s[1]` are 'R', 'S', or '?' for unknown/unassigned. So
how do you get rid of question marks?

## Best: use coordinates

3D coordinates are stored in `rdkit.Chem.rdchem.Conformer` and every
`rdkit.Chem.rdchem.Mol` has at least one -- **question** is that always
the case?

```python
# get molecule's list of conformers,
# if the 1st one has 3D coordinates,
# flag chiral centers based on those

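c=mol.GetConformers()
# a molecule built without coordinates (e.g. from SMILES) has no conformers
if mol.GetNumConformers() > 0 and c[0].Is3D():
    rdkit.Chem.rdmolops.AssignAtomChiralTagsFromStructure(mol)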
```

## CTFiles parity flags

CTFile (MOL, SDF) can include stereo flags as either bond annotation
("up" or "down"), or atom parity annotation ("cw", "ccw", or undefined),
or both. According to the specification, atom parity should be ignored
when reading the file, so bond annotation is the only one that actually
matters.

RDKit will do the right thing and read the flags from the bond block
(and write them out when exporting MOL blocks). The problem is there's
plenty of broken software that does not do that and populates atom
parity flags instead.

RDKit will read the atom parity flags also, and store them in the
`molParity` property of the `rdkit.Chem.rdchem.Atom` (but not turn them
into chirality tags). The values are

* 1 for clockwise,
* 2 for counter-clockwise, and
* 3 for unspecified.

**Note** that the winding is relative to atom order in the atom list. If
you did anything to the molecule that changed the original atom order,
the flags are most likely no longer valid.
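
To see why renumbering invalidates the flags, here is a small pure-Python sketch of the parity rule as I read it in the CTfile spec: order the four neighbours by atom number, look from the side opposite the highest-numbered one, and check which way the other three wind. The function name and the toy coordinates are made up, and the sketch only handles four explicit neighbours (the spec treats an implicit hydrogen as the highest-numbered atom, which is not covered here):

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def parity_from_coords(neighbors, tol=1e-6):
    """neighbors: four (atom_number, (x, y, z)) pairs around one centre.

    Returns 1 (clockwise), 2 (counter-clockwise), or None when the four
    points are coplanar and the winding cannot be decided.
    """
    p1, p2, p3, p4 = [xyz for _, xyz in sorted(neighbors, key=lambda n: n[0])]
    # The normal of plane (1, 2, 3) points at the viewer when 1 -> 2 -> 3
    # looks counter-clockwise; the viewer sits on the side away from atom 4.
    normal = _cross(_sub(p2, p1), _sub(p3, p1))
    side = _dot(normal, _sub(p4, p1))
    if side > tol:
        return 1
    if side < -tol:
        return 2
    return None

# Toy geometry: same coordinates, atom numbers 1 and 2 swapped.
original = [(1, (1.0, 0.0, 0.0)), (2, (0.0, 1.0, 0.0)),
            (3, (-1.0, -1.0, 0.0)), (4, (0.0, 0.0, -1.0))]
renumbered = [(2, (1.0, 0.0, 0.0)), (1, (0.0, 1.0, 0.0)),
              (3, (-1.0, -1.0, 0.0)), (4, (0.0, 0.0, -1.0))]

print(parity_from_coords(original), parity_from_coords(renumbered))
```

Nothing about the geometry changed between the two calls, only the numbering, yet the flag flips.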

```python
# assign chiral tags from atom parity flags

for a in mol.GetAtoms():
    if a.HasProp("molParity"):
        try:
            parity=int(a.GetProp("molParity"))
        except ValueError:
            parity=None
        if parity==1:
            a.SetChiralTag(rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CW)
        elif parity==2:
            a.SetChiralTag(rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CCW)
        elif parity==3:
            a.SetChiralTag(rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED)
```

## Still unassigned

That may be deliberate. For example, PubChem CID 602 (as of the time of
this writing) is for "DL-Alanine" describing *either* D- or L-Alanine.
In this case "unspecified" is the correct value for chirality tag.

(And in the case of "2D" SDF it will be; unfortunately PubChem software
will generate a "3D" SDF for CID 602 and it will have a single
conformer: L-Alanine.)

-

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] Has3D?

2016-09-13 Thread Dimitri Maziuk
On 09/13/2016 05:24 PM, Rich Lewis wrote:
> Hi Dimitri,
> 
> 3D geometry information for rdkit `Mol`s is stored as `Conformer`s.
> These can be accessed with the `GetConformer` method, which takes a
> confId as an argument. If you have loaded the molecule from a mol/sdf
> file, there should be a single conformer with ID 0, with the coordinates
> from the file. `Conformer`s have a `Is3D` method, which *should* do what
> you want.

It does. "There's conformer[0]" is the bit I was missing. It seems to be
there for 2D MOLs as well with Is3D() -> False.
Thank you.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[Rdkit-discuss] Has3D?

2016-09-13 Thread Dimitri Maziuk
Hi all,

a quick one hopefully: is there something like Mol.Has3D()?

I'm looking at "2D" vs "3D" MOL files and the best I can tell in the 2D
ones Z coord is always 0 whereas in 3D there may (should?) be a non-zero
one. Is there a quick way find out after reading in a MOL if it's one or
the other?
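
For reference, the "is any Z non-zero" heuristic can be written directly against the text of a V2000 MOL block. This is a sketch only: the function names are mine, the little block builder exists just to make test input, and a genuinely planar 3D molecule would be reported as flat:

```python
def molblock_has_nonzero_z(molblock, tol=1e-4):
    """Return True if any atom in a V2000 MOL block has a non-zero Z."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])            # counts line, first 3 columns
    for line in lines[4:4 + n_atoms]:
        if abs(float(line[20:30])) > tol:   # z occupies columns 21-30
            return True
    return False

def make_molblock(atoms):
    """Build a minimal bond-less V2000 block from (symbol, x, y, z) tuples."""
    header = ["example", "  hand-made", ""]
    counts = "%3d%3d  0  0  0  0  0  0  0  0999 V2000" % (len(atoms), 0)
    body = [
        "%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0"
        % (x, y, z, symbol)
        for symbol, x, y, z in atoms
    ]
    return "\n".join(header + [counts] + body + ["M  END"])

flat = make_molblock([("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.5, 0.0)])
bent = make_molblock([("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.5, -0.3)])

print(molblock_has_nonzero_z(flat), molblock_has_nonzero_z(bent))
```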

TIA,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] AddHs()

2016-09-13 Thread Dimitri Maziuk
On 09/10/2016 12:08 PM, David Cosgrove wrote:

> On the subject of the documentation, I would encourage you to find the
> GettingStartedWithRDKit.rst in the Docs directory, find somewhere where
> this discussion fits, add it, and send the new version to Greg. If everyone
> did this every time they spent time working out how to do something, the
> documentation would grow very rapidly and by definition grow fastest in
> areas that people are actively using. We don't need to wait for Greg to do
> it all!  He's busy enough as it is, and let's face it, writing docs is dull
> and I'm sure he would appreciate the help.

GitHub does have a wiki. One has to become a "collaborator" to get edit
permissions, AFAIK it doesn't do fine-grained, but what it does should
be good enough.

The wiki is a git repo itself so it could be pulled and integrated into
release builds etc.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] SDF and FindMolChiralCenters()

2016-09-10 Thread Dimitri Maziuk
On 09/10/2016 05:20 PM, Paolo Tosco wrote:

> https://gist.github.com/ptosco/ab668ad5c35875d8c47e0e6be9e37e79#file-set_chirality_from_atom_parity_flags-ipynb

Nice. I do have 3D SDFs, that is part of the reason I'm going through
this exercise, but looking at 2D SDF for PubChem's L-alanine, they do
have atom parity set even though there's no 3D coordinates. So I'll
probably go with your solution instead of TagsFromStructure b/c it'll
work for both 2D and 3D MOL files.

(elif p == 3 -> rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED, of course)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] SDF and FindMolChiralCenters()

2016-09-10 Thread Dimitri Maziuk
On 09/10/2016 04:34 PM, David Cosgrove wrote:
...
> Also, the atoms in a molecule should have the property _CIPRank set, you
> might be able to do something with that.

Possibly, but since the non-typo'ed function seems to do the trick,
that's good enough for me.

Thanks
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] SDF and FindMolChiralCenters()

2016-09-10 Thread Dimitri Maziuk
Oops. AssignAtomChiralTagsFromStructure() does indeed work.

>> If your file has 3D coordinates, AssignAtomChrialTagsFromStructure 

Good to see I'm not the only one with lysdexic fnigers. Apologies for
the noise:

;)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu







Re: [Rdkit-discuss] SDF and FindMolChiralCenters()

2016-09-10 Thread Dimitri Maziuk
On 09/09/2016 10:42 PM, Ling Chan wrote:
> If your file has 3D coordinates, AssignAtomChrialTagsFromStructure may help
> you.

Maybe, if I wasn't getting

rdkit.Chem.rdmolops.AssignAtomChrialTagsFromStructure( self._mol )
AttributeError: 'module' object has no attribute
'AssignAtomChrialTagsFromStructure'

-- same when calling from rdkit.Chem without rdmolops.
(centos 7, python2-rdkit-2016.03.1-1.el7.centos.x86_64,
rdkit-2016.03.1-1.el7.centos.x86_64)

>> the MOL reader perceives chirality based on the bond stereo field of the
>> bond block. Instead the atom stereo parity value of the atom block is read
>> and stored in the "molParity" atom property, but it is ignored for the
>> purpose of chirality perception, as per the MOL file specs:
>>
>> http://c4.cabrillo.edu/404/ctfile.pdf (see in particular Figure 4)
>>
>> Therefore, if the MOL file lacks the bond stereo information chirality
>> won't be set.

GetProp( "molParity" ) does work, thank you, but as I understand it's
based on atom ordering in the CTAB and not on CIP rules. So it's just as
good as OB's stereo "feature" for my purposes: either way I'd have to
roll my own CIP ordering code to arrive at R/S.

Thanks.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[Rdkit-discuss] SDF and FindMolChiralCenters()

2016-09-09 Thread Dimitri Maziuk
Hi everyone,

m = next( rdkit.Chem.SupplierFromFilename( filename, removeHs = False ) )
sts = rdkit.Chem.FindMolChiralCenters( m, includeUnassigned = True )
for s in sts :
    lbl = m.GetAtomWithIdx( s[0] ).GetSymbol() + str( s[0] + 1 )
    print( "%s : %s" % ( lbl, s[1] ) )

For L-ALA 3D SDF the output is
 C4 : ?
For D-ALA 3D SDF the output is also
 C4 : ?
And for ALA 2D SDF the output is also
 C4 : ?
If I change includeUnassigned to False, the list returned by
FindMolChiralCenters() is empty.

The SDFs have 1, 2, and 3 in the 7th column in the atom block.
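
For anyone wanting to look at that column without a toolkit, here is a sketch that pulls the atom stereo parity field straight out of a V2000 atom block (fixed columns 40-42, which is also the 7th whitespace-separated field). The function name and the three-atom test block are made up for illustration:

```python
def atom_parities(molblock):
    """Return (atom number, symbol, parity flag) for each V2000 atom line."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])             # counts line, first 3 columns
    result = []
    for number, line in enumerate(lines[4:4 + n_atoms], start=1):
        symbol = line[31:34].strip()         # columns 32-34
        field = line[39:42].strip()          # columns 40-42: stereo parity
        result.append((number, symbol, int(field) if field else 0))
    return result

def atom_line(symbol, x, y, z, parity):
    """Format one V2000 atom line with the given parity flag."""
    return ("%10.4f%10.4f%10.4f %-3s%2d%3d%3d" % (x, y, z, symbol, 0, 0, parity)
            + "  0" * 9)

# Hypothetical three-atom block; coordinates are placeholders.
example_block = "\n".join([
    "example", "  hand-made", "",
    "  3  0  0  0  0  0  0  0  0  0999 V2000",
    atom_line("N", 0.0, 0.0, 0.0, 0),
    atom_line("C", 1.4, 0.0, 0.0, 1),
    atom_line("C", 2.1, 1.2, 0.0, 0),
    "M  END",
])

parities = atom_parities(example_block)
print(parities)
```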

If I use
m = rdkit.Chem.MolFromSmiles( 'C[C@@H](C(=O)O)N' )
instead, the output is
 C2 : S
(this is L-ALA from the same PubChem record as the SDF).

So it looks like MOL reader ignores chirality, is that the case?

Thanks,
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] AddHs()

2016-09-09 Thread Dimitri Maziuk
On 09/09/2016 12:56 AM, Greg Landrum wrote:

> This is absolutely correct: if you remove the Hs and then later re-add them
> it is extremely unlikely that you will end up with the same H indices
> before and after the change. It makes much more sense to just use
> removeHs=False

That's what I expected and "removeHs = False" works, thanks. And I was
kidding about histidine of course.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] AddHs()

2016-09-08 Thread Dimitri Maziuk
On 09/08/2016 02:26 PM, Brian Kelley wrote:
> Dimitri, Hs are removed.
> 
> There is a removeHs argument in MolFromMolBlock (python) that defaults to
> true.
> 
> There is a corollary in SDMolSupplier if you are using that.
> 
> supplier = SDMolSupplier(filename, removeHs=False)
> 
> if this helps.

Thank you, it does:
rdkit.Chem.SupplierFromFilename(sys.argv[1], removeHs = False ).next()
returns a molecule with -- presumably "physical" -- hydrogens.

The reason I ask is if they're removed and re-added, I'd worry about
their indexes matching what's in the source file. Which might matter in
the case of e.g. stereospecifically assigned methylene protons. (Or so
they tell me ;)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] drawing code take 2

2016-09-08 Thread Dimitri Maziuk
On 09/08/2016 10:25 AM, Greg Landrum wrote:
> Dimitri,

> Why do you want 2D drawings that include H atoms?

I have an NMR spectroscopist doing peak assignments, proton spectra are
the most commonly used kind for metabolites & such.

As you're well aware, atom nomenclature is an "interesting" issue. We
need stable atom indexing, including protons, and we need indexes on the
picture.

Most of the pipeline is automated, I have a webpage where the
spectroscopist hits a button and gets a usable 2D picture and an atom
table where she can fill in the numbers. So it's not like someone will
sit in front of Marvin and play with options until they get a perfect
picture for their paper.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





[Rdkit-discuss] drawing code take 2

2016-09-07 Thread Dimitri Maziuk
:(

Here's one where AddHs() really breaks things. It is an unpleasant
molecule to draw.

So it looks like the issue is that
- for CID 112084 call to AddHs() changed the layout (arguably not for
the better), whereas
- for CID 260719 it didn't change the layout/shorten the bonds when it
really should have.

For reference, CID260719.ob.svg is the other toolkit's rendering of the
same file (with atom indexes changed to green from OB's default red).

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


alatis_output_Structure3D_CID_260719.sdf
Description: application/vnd.kinar




Re: [Rdkit-discuss] rdMolDraw2D drawing code

2016-09-06 Thread Dimitri Maziuk
PS. looking at this, it may have come out a little confusing, so

I start with a 3D mol file and run it through
rdkit.Chem.AllChem.Compute2DCoords()

CID112084.svg is what comes out.

Running rdkit.Chem.AddHs() before Compute2DCoords() generates the layout
seen in the other 3 pictures.

CID112084.allh.svg is what comes out after AddHs().

CID112084.nonum.svg is what comes out after
op = dr.drawOptions()
for i in range( mh.GetNumAtoms() ) :
    op.atomLabels[i] = mh.GetAtomWithIdx( i ).GetSymbol()

CID112084.all.svg is with the above loop changed to
    op.atomLabels[i] = mh.GetAtomWithIdx( i ).GetSymbol() + str ((i+1))

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] rdMolDraw2D drawing code

2016-09-06 Thread Dimitri Maziuk
On 09/05/2016 03:47 AM, Greg Landrum wrote:
> Dave's right about the font size: it's expressed in whatever coordinates
> the molecule is being drawn in.

I'd put Dave's explanation in the doc++ (?) comment on FontSize
get/setter in MolDraw2D:
http://www.rdkit.org/Python_Docs/rdkit.Chem.Draw.rdMolDraw2D.MolDraw2D-class.html

Right now they just say "float". If you're dealing with SVG output, font
sizes are px, em, and points, and you need a couple of tries to figure
them out. It's a minor thing.

> I suspect the other part of Dmitri's question is about the way bonds are
> shortened so that they don't draw all the way through the atom labels.

Yes, I think. It looks like there's more padding on the top and the
right, and less padding on the left and bottom. E.g. H39 in the attached
"all" version is the worst, but it is consistent in all 3.

The other one on the "all" picture is that the bond to O21 isn't
shortened enough and there isn't much room left between O21 and O24.

OTOH all three labels: H38, O21, H39 could be drawn without even
shortening the bonds if we could just move them up a little. (Well, H39
could be just drawn without shortening the bond at all.)

Last but not least, my starting point is a 3D MOL file and I call
rdkit.Chem.AllChem.Compute2DCoords( mh ) -- last thing before
DrawMolecule(). Attached CID112084.svg is the one generated without
AddHs(), notice how layout is very different on that one.

I've a suspicion that that layout might look better with all the Hs and
numbers added, than the one I get (the other 3 pictures).

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




[Rdkit-discuss] rdMolDraw2D drawing code

2016-09-02 Thread Dimitri Maziuk
Hi all,

I finally got a round tuit for playing with the drawing code and I like
it -- great job, thank you Greg and Dave and everyone who contributed.

One question though: is it possible to add padding around atom labels?
Or use some other trick to make the attached look less crowded? (Yes, I
do want all Hs and all atom labels with numbers.)

The best I can come up with is reduce the font size a little, that works
fine. I think it'd be nice if the fine manual for MolDraw2D said what
the units used by FontSize()/SetFontSize() are.

So, any better ideas than just slightly smaller labels?

TIA
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-21 Thread Dimitri Maziuk
On 01/20/2016 08:30 PM, Peter S. Shenkin wrote:

> ... the problem that I thought we were trying to
> address is rather the lack of extensibility, the lack of lower-case, the
> fact that different users (even for deposited structures, IIRC) and
> different software products overload the available fields differently (like
> putting partial charge in the Temperature Factor field) and have violated
> the standard by doing necessary but formally disallowed things ...

PDB has a format, with API and everything, that takes care of all of
that. It's called mmCIF. After 25 years (or however long it's been
around) nobody uses it outside of PDB.

I've seen this discussion countless times. It always does this exact
circle. Everybody wants to *have* a better format. Nobody wants to *use*
it because it's "too complex" and "too difficult".

In the meantime we are left trying to guess whether a given "CA" stands
for C-alpha or calcium.
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-20 Thread Dimitri Maziuk
On 01/20/2016 10:06 AM, Peter Shenkin wrote:

... the terrible old PDB file format ...

> As for those who would write that format, fight it! :-)
> 
> The above, in my view, represents the voice of reason, and is therefore
> unlikely to be generally adopted

The long story is that most applications actually using the data need
only the table of coordinates, and that's pretty much what a PDB file is.
PDB's replacement, mmCIF, includes everything and the kitchen sink
wrapped in a subset of STAR-98 syntax. All of that is excess baggage
nobody wants. As much as PDB wants the old busted PDB format gone, they
are not offering a usable alternative that I know of.

That's exactly what we've been doing at BMRB, too, and then complaining
about low rate of adoption of NMR-STAR by the NMR community.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] The Chlorine molfile question

2016-01-20 Thread Dimitri Maziuk
On 01/20/2016 04:57 PM, Peter S. Shenkin wrote:
> On Wed, Jan 20, 2016 at 5:33 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu>
> wrote:

>> JSON encodes a single string. That is a problem for sending larger files
>> over the net, say, an NMR structure of a larger molecule with 100 models
>> in the file.
>>
> 
> That's not a problem, conceptually, because you can have an array of
> structures.

No, my point was that streaming isn't a part of JSON specification and
common implementations do not offer it.

https://en.wikipedia.org/wiki/JSON_Streaming
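
As an illustration of the workaround described on that page, here is a minimal sketch of the line-delimited convention: one JSON document per line. This is a convention layered on top of JSON, not part of the JSON specification, and the record layout (a "model" number plus an "atoms" list) is made up for the example.

```python
import io
import json


def write_records(stream, records):
    """Write one JSON document per line so each record parses on its own."""
    for item in records:
        # json.dumps without indent never emits a raw newline.
        stream.write(json.dumps(item) + "\n")


def read_records(stream):
    """Yield records one at a time without loading the whole stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Three hypothetical "models", each with a single atom.
models = [{"model": i, "atoms": [["CA", 0.0, 0.0, float(i)]]} for i in range(1, 4)]

buffer = io.StringIO()
write_records(buffer, models)
buffer.seek(0)
first = next(read_records(buffer))  # only the first line has been parsed
```

Both ends have to agree on the convention, since a stock JSON parser handed the whole stream would reject it.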

You can cut one model out of a PDB file (or one structure out of an
SDF) and the result is a valid file.
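
A minimal sketch of that cut for the PDB case, assuming models are delimited by MODEL and ENDMDL records. The function name extract_model and the sample text are hypothetical, and whether a strict validator accepts the result without the header records is not checked here.

```python
def extract_model(pdb_text: str, wanted: int) -> str:
    """Return the MODEL ... ENDMDL block numbered `wanted`, or '' if absent."""
    kept = []
    inside = False
    for line in pdb_text.splitlines(keepends=True):
        fields = line.split()
        if len(fields) > 1 and fields[0] == "MODEL" and fields[1] == str(wanted):
            inside = True
        if inside:
            kept.append(line)
            if fields and fields[0] == "ENDMDL":
                break
    return "".join(kept)


# Made-up two-model file, truncated to the coordinate columns.
sample = (
    "REMARK   demo\n"
    "MODEL        1\n"
    "ATOM      1  CA  ALA A   1       0.000   0.000   0.000\n"
    "ENDMDL\n"
    "MODEL        2\n"
    "ATOM      1  CA  ALA A   1       1.000   0.000   0.000\n"
    "ENDMDL\n"
    "END\n"
)

second = extract_model(sample, 2)
```

The reader stops at the first matching ENDMDL, so nothing after the wanted model is examined.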

In ASN.1 the length of the value is at the front. If you define your
array as sequence, a single structure pulled out of the middle should be
OK, but the entire sequence is invalid until you read it to the end. I
think in practice you wouldn't define your array as a sequence and
instead have a file full of "disjoint" single structures, possibly with
some kind of metadata header. (I haven't touched ASN.1 since school, so
don't quote me on this.)

Oh wait, that sounds exactly like PDB with its REMARKs and MODELs.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Clustering 1M molecules

2015-08-23 Thread Dimitri Maziuk
On 08/23/2015 11:38 AM, Jing Lu wrote:
> Thanks, Andrew!
>
> Yes, I was thinking about using scikit-learn also. But I guess I need to
> use a data structure for sparse matrix and define a function for
> connectivity. I hope the memory issue won't be a problem.
> Most AgglomerativeClustering algorithms have time complexity with N^2. Will
> that be a problem?

Usual programming solutions are
- if you don't need the whole matrix in RAM at once, cache it to disk.
Otherwise try to split the job into smaller batches.
- Big-Oh notation is relative complexity. In absolute terms, if it
finishes overnight and you only intend to run it a handful of times, N^2
is not worth worrying about. Otherwise try to split into smaller batches
that you can run in parallel on a cluster of computers.
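
A rough sketch of both suggestions combined, using only the standard library: the N x N distance matrix is computed a few rows at a time and appended to a file, and single rows are read back by seeking. Everything here is an assumption made for illustration: fingerprints are stored as plain Python integers used as bit sets, the distance is 1 minus the Tanimoto similarity, and the function names are hypothetical.

```python
import array
import os
import tempfile


def tanimoto_distance(a: int, b: int) -> float:
    """1 - |a & b| / |a | b| for fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - bin(a & b).count("1") / union


def write_distance_rows(fps, path, batch_size=2):
    """Compute the matrix batch_size rows at a time, appending each batch to disk."""
    with open(path, "wb") as out:
        for start in range(0, len(fps), batch_size):
            batch = array.array("d")
            for a in fps[start:start + batch_size]:
                batch.extend(tanimoto_distance(a, b) for b in fps)
            batch.tofile(out)


def read_row(path, n, i):
    """Read row i of an n x n matrix without loading the rest."""
    row = array.array("d")
    with open(path, "rb") as src:
        src.seek(i * n * row.itemsize)
        row.fromfile(src, n)
    return list(row)


fps = [0b1111, 0b0111, 0b1000, 0b0000]  # four toy fingerprints
path = os.path.join(tempfile.mkdtemp(), "distances.bin")
write_distance_rows(fps, path)
row0 = read_row(path, len(fps), 0)
```

Each batch depends only on its own rows and the full fingerprint list, so separate batches could be handed to separate machines and the files concatenated afterwards.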

FWIW
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Two SMILES that (I think) should canonicalize to the same thing, but don't

2015-06-17 Thread Dimitri Maziuk
On 06/17/2015 08:36 AM, Peter Shenkin wrote:
>> We could consider some quantum-mechanical calculations
>
> Yes! for the question of the true nature of the molecule. But that
> need not affect the way canonicalization is done.

Again, define "canonical". If you insist on using the Kekulé form in a
binary computer, you'll have to have 2 distinctly different canonical
benzenes. That's just how a binary computer works.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMILES: Why are rings consisting of wildcards assumed to be aromatic?

2015-06-15 Thread Dimitri Maziuk
On 06/15/2015 12:58 PM, Nicholas Firth wrote:

[... RDKit canonicalizes ...]

> Is this not a case of Schrödinger's aromaticity? Until we know what the dummy
> atom is, it is both aromatic and not?

Well, according to pikiwedia, physicists have the "canonical ensemble";
that would fit, except you'd probably need to invent a notation other
than SMILES for that whole ensemble wave function quantum brain rocket
surgery. In math and comp. sci. "canonical" tends to imply "unique", which
together with wildcards is self-contradictory. Or you could take the
easy way out: "canonical" as in "gospel", in which case RDKit does it
because "Greg said so".

HTH
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SDF properties in case of error

2015-05-04 Thread Dimitri Maziuk
On 2015-05-03 15:06, Michael Reutlinger wrote:

> Well... I think my proposal should enable us to put more strict, robust
> QC in place, but I guess you are missing this point.

My definition of "strict and robust" is: if the input is bad, what comes
out is an out-of-band error signal, such that there is no way it
can possibly be mistaken for any kind of output other than the error signal.
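
In Python terms, one way to read this is "raise, don't return a placeholder". The sketch below is only an illustration of that idea: BadRecordError and parse_atom_count are hypothetical names, and the only format detail assumed is that a V2000 counts line carries the atom count in its first three columns.

```python
class BadRecordError(ValueError):
    """Raised instead of returning a partial or placeholder result."""


def parse_atom_count(counts_line: str) -> int:
    """Parse the atom count from columns 1-3 of a V2000 counts line."""
    field = counts_line[:3]
    if not field.strip().isdigit():
        # The failure travels out of band: no return value exists that a
        # caller could mistake for a real atom count.
        raise BadRecordError(f"bad atom count field: {field!r}")
    return int(field)


good = parse_atom_count("  6  6  0  0  0  0  0  0  0  0999 V2000")

try:
    parse_atom_count(" XX  6  0  0  0  0  0  0  0  0999 V2000")
    failed = False
except BadRecordError:
    failed = True
```

A caller that ignores the exception stops; a caller that ignores a None or an empty molecule keeps going with bad data.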

Dimitri



--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SDF properties in case of error

2015-05-01 Thread Dimitri Maziuk
On 04/30/2015 05:01 PM, Michael Reutlinger wrote:

> However, in some cases this does not help. E.g. when an unknown atom (most
> of the time this is X) is found in the MolBlock the import fails with an
> Post-condition Violation and None is yielded. This is fine to detect the
> problem BUT it is impossible to get any information about the molecule
> which failed.

I'd say the best you can do is skip over to the next molecule and report
"molecule in lines X to Y is corrupt". Cutting out a chunk of lines from
a file is trivial, and if you're reading from a stream rather than a
file then, well, don't. Without a valid mol block you don't have a
molecule and you shouldn't be making one up. As in "conservative in what
you produce".
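
A minimal sketch of that bookkeeping, assuming records are terminated by a line containing only $$$$. The validity check is a pluggable callback: has_end_marker below is a stand-in that merely looks for an "M  END" line, and a real parser could be passed in its place. All names are hypothetical.

```python
def scan_sdf(text, is_valid):
    """Yield (first_line, last_line, ok) per record, with 1-based line numbers."""
    start = 1
    block = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "$$$$":
            yield start, number, is_valid("\n".join(block))
            start = number + 1
            block = []
        else:
            block.append(line)
    if any(item.strip() for item in block):
        # Trailing data without a terminator is reported as corrupt.
        yield start, start + len(block) - 1, False


def has_end_marker(block):
    # Stand-in check only; it does not validate the mol block itself.
    return any(line.rstrip() == "M  END" for line in block.splitlines())


# Three toy records; the second one lacks its "M  END" line.
sample = (
    "mol1\n\n\nM  END\n$$$$\n"
    "mol2\n\n\n$$$$\n"
    "mol3\n\n\nM  END\n$$$$\n"
)

report = list(scan_sdf(sample, has_end_marker))
corrupt = [(first, last) for first, last, ok in report if not ok]
for first, last in corrupt:
    print(f"molecule in lines {first} to {last} is corrupt")
```

The reported range can then be cut out of the file with any line-oriented tool and inspected by hand.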

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



signature.asc
Description: OpenPGP digital signature
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

