Re: [Rdkit-discuss] Cheminformatics Graduate School Recommendations?

2021-07-20 Thread Stiefl, Nikolaus
Hi Patrick,
Sorry, yet another non-US-based lab, but I will still throw in Sereina Riniker's
group at ETH Zurich, which recently attracted quite some talent ;-)
Ciao
Nik

From: Markus Sitzmann 
Sent: Monday, July 19, 2021 10:34 PM
To: RDKit Discuss 
Subject: Re: [Rdkit-discuss] Cheminformatics Graduate School Recommendations?


Hi Patrick,

labs I would take a look at (in no particular order and well, a bit heavy on 
European labs):

Irwin Lab, UCSF: 
https://profiles.ucsf.edu/john.irwin
Bajorath Group, Bonn, Germany: 
https://www.limes-institut-bonn.de/forschung/arbeitsgruppen/unit-4/abteilung-bajorath/abt-bajorath-startseite/
Reymond Group, Bern, Switzerland: 
https://www.gdb.unibe.ch/
Rarey Group, Hamburg, Germany: 
https://www.zbh.uni-hamburg.de/personen/amd/mrarey.html
Leach Team, Cambridge, UK: 
https://www.ebi.ac.uk/about/people/andrew-leach
Czodrowski Lab, Dortmund, Germany: 
https://www.czodrowskilab.org/team

Best,
Markus


On Mon, Jul 19, 2021 at 6:17 PM Patrick Neal <prnma...@gmail.com> wrote:
Hi All,
I apologize if this is too far off topic, but I got a recommendation to ask 
here since this community is the most likely to know!
I'm about to graduate from my undergrad chemistry program and I'm looking for 
graduate schools. I started in traditional computational chemistry research, 
but have really loved the cheminformatics/datascience aspects of drug 
discovery. I'm hoping to ask the community if you all have any recommendations 
for academic labs (ideally US based) with interesting cheminformatics research?
I'm specifically interested in fingerprinting methods (encoding 
3D/conformational information), similarity search/clustering compounds at 
scale, and automation tools for QM calculations. But, I would be grateful to 
hear of any labs you think are doing great cheminformatics work!
All the best,
Patrick
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] CIx position at NIBR Cambridge, US

2020-06-08 Thread Stiefl, Nikolaus
Dear all,
I wanted to bring to your attention that our position for a cheminformatics 
expert in the CADD group in our global chemistry community at NIBR is open 
again.

https://www.novartis.com/careers/career-search/job-details/288340BR

If you feel like you want to apply your skill set to real-world drug discovery 
problems within a group of molecular modellers, data scientists and other CIx 
experts please go ahead :).
Looking forward to your applications.
Best
Nik (Stiefl)



Re: [Rdkit-discuss] RGroup matching in RGroup decomposition code

2018-12-13 Thread Stiefl, Nikolaus
Hi Paolo,
That's cool, thanks. This may also help me solve my problem of R-group label 
numbering not taking the actual R-group numbering into account (i.e. if a 
molecule has R8 and R5 as its sole R-group definitions, they get the labels R1 
and R2).
I was also in contact with Brian Kelley and he suggested fixing it in the 
underlying codebase, so I hope this will be fixed in the next version :)
Cheers
Nik


From: Paolo Tosco 
Sent: Thursday, December 13, 2018 11:09 AM
To: Stiefl, Nikolaus; RDKit Discuss
Subject: Re: [Rdkit-discuss] RGroup matching in RGroup decomposition code


Hi Nik,

There is a way to achieve what you describe, even though it is slightly 
cumbersome:

from rdkit import Chem
from rdkit.Chem import rdmolops
from rdkit.Chem.Draw import MolsToGridImage, IPythonConsole
from rdkit.Chem.rdRGroupDecomposition import (
    RGroupDecomposition, RGroupDecompositionParameters)

smis = ['Cc1ccnc(O)c1', 'Cc1cc(Cl)ccn1', 'Nc1n1',
        'Nc1ccc(Br)cn1', 'c1ccncc1']
mols = [Chem.MolFromSmiles(smi) for smi in smis]
MolsToGridImage(mols)
[image]

params = RGroupDecompositionParameters()
# rather than using the built-in flag we will manually
# adjust the query in two steps using AdjustQueryProperties()
params.onlyMatchAtRGroups = False

# just atom number the rgroups
core1 = Chem.MolFromSmiles('n1ccc([*:2])cc([*:1])1')

# make dummies queries
core1_params = rdmolops.AdjustQueryParameters()
core1_params.makeDummiesQueries = True
core1_params.adjustDegree = False
core1 = rdmolops.AdjustQueryProperties(core1, core1_params)

# change the atoms connected to the dummies into dummies
former_atomic_nums = {}
for b in core1.GetBonds():
    if (b.GetBeginAtom().GetAtomicNum() == 0):
        a = b.GetEndAtom()
    elif (b.GetEndAtom().GetAtomicNum() == 0):
        a = b.GetBeginAtom()
    else:
        continue
    former_atomic_nums[a.GetIdx()] = a.GetAtomicNum()
    a.SetAtomicNum(0)

# this has the same effect as setting onlyMatchAtRGroups to True
# but we can avoid applying it to the atoms connected to the R groups
core1_params.adjustHeavyDegreeFlags = Chem.ADJUST_IGNOREDUMMIES
core1_params.makeDummiesQueries = False
core1_params.adjustDegree = False
core1_params.adjustHeavyDegree = True
core1 = rdmolops.AdjustQueryProperties(core1, core1_params)

# restore the original atomic numbers
for i, an in former_atomic_nums.items():
    core1.GetAtomWithIdx(i).SetAtomicNum(an)

rg1 = RGroupDecomposition(core1, params)
failMols = []
for m in mols:
    res = rg1.Add(m)
    if res < 0:
        failMols.append(m)
rg1.Process()
True

print("FailedMols: %s"%" ".join([Chem.MolToSmiles(m) for m in failMols]))
FailedMols: Nc1ccc(Br)cn1

core1
[image]

d = rg1.GetRGroupsAsColumns(asSmiles=False)
MolsToGridImage(d['Core'])
[image]

MolsToGridImage(d['R1'])
[image]

MolsToGridImage(d['R2'])
[image]

Hope that helps, cheers
p.
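
For readers who want the bare Add/Process pattern without the query adjustments, here is a minimal sketch. The benzene core and the three input SMILES are illustrative assumptions, not taken from the thread. It relies only on Add() returning a negative value when a molecule does not match the core, as in the loop above.

```python
from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import (
    RGroupDecomposition, RGroupDecompositionParameters)

# Illustrative inputs (assumptions for this sketch, not from the thread).
core = Chem.MolFromSmarts('c1ccccc1')
smis = ['Cc1ccccc1', 'Clc1ccccc1', 'CCO']
mols = [Chem.MolFromSmiles(smi) for smi in smis]

rg = RGroupDecomposition(core, RGroupDecompositionParameters())

# Add() returns a negative number for molecules that do not match the core,
# so the failing inputs can be collected while adding.
failed = [smi for smi, m in zip(smis, mols) if rg.Add(m) < 0]

rg.Process()
columns = rg.GetRGroupsAsColumns(asSmiles=True)

print("Failed:", failed)
print("Columns:", sorted(columns))
```

Collecting the failed SMILES rather than the Mol objects keeps the report readable even when the input list is long.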

On 12/11/18 11:01, Stiefl, Nikolaus wrote:
Hi all,

I was playing around with the RGroup decomposition code and must say that I am 
pretty impressed by it. The fact that one can directly work with a MDL R-group 
file and that the output is a pandasDataFrame makes analysis really slick - 
well done !

However, one thing that irritates me is the fact that seemingly when I have 
R-groups defined in my core and enforce matching only at R-groups then 
molecules having hydrogen atoms in that position are ignored in the "Add" step. 
I would expect those to be included as long as the molecules don't have 
additional heavy atoms in positions that are not defined as R-groups in the 
core.

__ snip 

from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import (
    RGroupDecomposition, RGroupDecompositionParameters)


smis = ['Cc1ccnc(O)c1', 'Cc1cc(Cl)ccn1', 'Nc1n1', 'Nc1ccc(Br)cn1',
        'c1ccncc1']
mols = [Chem.MolFromSmiles(smi) for smi in smis]
params = RGroupDecompositionParameters()

params.onlyMatchAtRGroups = True

# just atom number the rgroups
core1 = Chem.MolFromSmiles('n1ccc([*:2])cc([*:1])1')
rg1 = RGroupDecomposition(core1, params)

failMols = []
for m in mols:
  res = rg1.Add(m)
  if res < 0:
    failMols.append(m)

rg1.Process()

print("FailedMols: %s"%" ".join([Chem.MolToSmiles(m) for m in failMols]))

 end snip 


the output shows that molecules 3-5 are not included at the "Add" step

>> FailedMols: Nc1n1 Nc1ccc(Br)cn1 c1ccncc1

For molecules 4 (the 5-bromo substituted aminopyridine) I agree, however I 
don't understand how I can make sure mols 3 and 5 are also included ... is 
there a magic parameter that I can set?

Cheers
Nik







[Rdkit-discuss] RGroup matching in RGroup decomposition code

2018-12-11 Thread Stiefl, Nikolaus
Hi all,

I was playing around with the RGroup decomposition code and must say that I am 
pretty impressed by it. The fact that one can directly work with a MDL R-group 
file and that the output is a pandasDataFrame makes analysis really slick - 
well done !

However, one thing that irritates me is the fact that seemingly when I have 
R-groups defined in my core and enforce matching only at R-groups then 
molecules having hydrogen atoms in that position are ignored in the "Add" step. 
I would expect those to be included as long as the molecules don't have 
additional heavy atoms in positions that are not defined as R-groups in the 
core.

__ snip 

from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import (
    RGroupDecomposition, RGroupDecompositionParameters)


smis = ['Cc1ccnc(O)c1', 'Cc1cc(Cl)ccn1', 'Nc1n1', 'Nc1ccc(Br)cn1',
        'c1ccncc1']
mols = [Chem.MolFromSmiles(smi) for smi in smis]
params = RGroupDecompositionParameters()

params.onlyMatchAtRGroups = True

# just atom number the rgroups
core1 = Chem.MolFromSmiles('n1ccc([*:2])cc([*:1])1')
rg1 = RGroupDecomposition(core1, params)

failMols = []
for m in mols:
  res = rg1.Add(m)
  if res < 0:
    failMols.append(m)

rg1.Process()

print("FailedMols: %s"%" ".join([Chem.MolToSmiles(m) for m in failMols]))

 end snip 


the output shows that molecules 3-5 are not included at the "Add" step

>> FailedMols: Nc1n1 Nc1ccc(Br)cn1 c1ccncc1

For molecules 4 (the 5-bromo substituted aminopyridine) I agree, however I 
don't understand how I can make sure mols 3 and 5 are also included ... is 
there a magic parameter that I can set?

Cheers
Nik



Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Thread Stiefl, Nikolaus
Even better ☺

From: Greg Landrum 
Date: Friday 29 June 2018 at 18:04
To: GMCProfile 
Cc: "rdkit-discuss@lists.sourceforge.net" 
Subject: Re: [Rdkit-discuss] elimination of small fragments

How about just GetLargestFragment()?

On Fri, 29 Jun 2018 at 16:45, Stiefl, Nikolaus <nikolaus.sti...@novartis.com> wrote:
Quick question – mostly to the core developers I guess:

I just checked and have that kind of thing in my code in at least 5 different 
places - wouldn’t it make sense to have that kind of functionality as a 
convenience function as part of the GetMolFrags method?

Something along the lines of

rdmolops.GetMolFrags(mol, asMols = True, largestFragmentOnly = True)

? Just a thought …
Cheers
Nik



Re: [Rdkit-discuss] elimination of small fragments

2018-06-29 Thread Stiefl, Nikolaus
Quick question – mostly to the core developers I guess:

I just checked and have that kind of thing in my code in at least 5 different 
places - wouldn’t it make sense to have that kind of functionality as a 
convenience function as part of the GetMolFrags method?

Something along the lines of

rdmolops.GetMolFrags(mol, asMols = True, largestFragmentOnly = True)

? Just a thought …
Cheers
Nik
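
Until something like this exists in the toolkit, the repeated snippet can live in one small helper. The sketch below is only an illustration: the name largest_fragment is hypothetical, and largestFragmentOnly above is the proposed keyword, not an existing GetMolFrags argument. The sodium acetate input is likewise just an example.

```python
from rdkit import Chem
from rdkit.Chem import rdmolops


def largest_fragment(mol):
    """Return the fragment of `mol` with the most atoms (hypothetical helper).

    Falls back to `mol` itself if GetMolFrags yields no fragments.
    """
    frags = rdmolops.GetMolFrags(mol, asMols=True)
    return max(frags, default=mol, key=lambda m: m.GetNumAtoms())


# Example input (an assumption for this sketch): sodium acetate.
salt = Chem.MolFromSmiles('CC(=O)[O-].[Na+]')
parent = largest_fragment(salt)
print(Chem.MolToSmiles(parent), parent.GetNumAtoms())
```

Note that "largest" here means most heavy atoms as counted by GetNumAtoms(); when two fragments tie, max() keeps the first one it sees.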


From: Alfredo Quevedo 
Date: Friday 29 June 2018 at 12:06
To: Andrew Dalke 
Cc: Stephen Roughley via Rdkit-discuss 
Subject: Re: [Rdkit-discuss] elimination of small fragments

Thank you very much, Andrew, for this detailed explanation.
Regards,
Alfredo

On 29 June 2018, at 07:02, Andrew Dalke <da...@dalkescientific.com> wrote:

On Jun 28, 2018, at 22:08, Paolo Tosco  wrote:

 if you wish to keep only the largest disconnected fragment you may try the 
following:

 mols = list(rdmolops.GetMolFrags(mol, asMols = True))
 if (mols):
     mols.sort(reverse = True, key = lambda m: m.GetNumAtoms())
     mol = mols[0]

A somewhat simpler .. or at least shorter ... version is:

mols = rdmolops.GetMolFrags(mol, asMols = True)
mol = max(mols, default=mol, key=lambda m: m.GetNumAtoms())

The max() function goes through the molecules that GetMolFrags returns.

If the list is empty, it returns the 'default' value, which is the original 
molecule. (This is what Paolo's code does. Another option is to use None as the 
default value.)

Otherwise, since 'key' is specified, its value is used as a function to 
determine a value for each molecule. That is, for each term 'm' in the list of 
'mols', it computes m.GetNumAtoms(), and uses that return value to select an 
object with the maximum value.

In this case, it selects a molfrag output molecule with the most atoms.
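
The behaviour of 'key' and 'default' described above can be tried without RDKit installed. The following sketch uses a stand-in Frag class (an assumption made purely for illustration; it only mimics GetNumAtoms()) in place of real molecules.

```python
from dataclasses import dataclass


@dataclass
class Frag:
    """Stand-in for an RDKit molecule; only GetNumAtoms() is modelled."""
    name: str
    n_atoms: int

    def GetNumAtoms(self):
        return self.n_atoms


frags = [Frag("sodium", 1), Frag("acetate", 4), Frag("water", 1)]

# 'key' is called once per item; the item with the largest key value wins.
largest = max(frags, key=lambda m: m.GetNumAtoms())
print(largest.name)  # acetate

# With an empty sequence, 'default' is returned instead of raising ValueError.
fallback = Frag("original", 0)
print(max([], default=fallback, key=lambda m: m.GetNumAtoms()).name)  # original
```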

I think I've just added a topic to cover for the upcoming Python/RDKit training 
session in September! :)

For those interested, remember to sign up soon.

Cheers,

Andrew
da...@dalkescientific.com






Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-19 Thread Stiefl, Nikolaus
Then I guess Greg’s solution is the better suited one since you don’t have to 
specify a list of isotopes (assuming that your input compounds will not have 
things like 12C specified). Minor modification:

In [23]: q = rdqueries.IsotopeGreaterQueryAtom(1)

In [24]: atomNums = [1,6,7,8,15,16]

In [25]: [x.GetIdx() for x in Chem.MolFromSmiles('CCC[13CH3]').GetAtomsMatchingQuery(q) if x.GetAtomicNum() in atomNums]
Out[25]: [3]

In [26]: [x.GetIdx() for x in Chem.MolFromSmiles('CCC[19F]').GetAtomsMatchingQuery(q) if x.GetAtomicNum() in atomNums]
Out[26]: []
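
If inputs with an explicit 12C (or similar) label should not be flagged, one possible variation is to compare each labelled atom against an expected mass number. This is a sketch under assumptions: the EXPECTED table and the function name are made up for illustration, following the 12C, 1H, 14N, 16O, 31P, 32S list from the thread, and atoms of other elements are ignored just as in the atomNums filter above.

```python
from rdkit import Chem

# Assumed "expected" isotope per element (atomic number -> mass number).
EXPECTED = {1: 1, 6: 12, 7: 14, 8: 16, 15: 31, 16: 32}


def unexpected_isotopes(mol):
    """Return indices of CHNOPS atoms whose isotope label differs from EXPECTED."""
    hits = []
    for atom in mol.GetAtoms():
        iso = atom.GetIsotope()  # 0 means no isotope was specified
        num = atom.GetAtomicNum()
        if iso and num in EXPECTED and iso != EXPECTED[num]:
            hits.append(atom.GetIdx())
    return hits


print(unexpected_isotopes(Chem.MolFromSmiles('CCC[13CH3]')))  # [3]
print(unexpected_isotopes(Chem.MolFromSmiles('[12CH3]CC')))   # []
print(unexpected_isotopes(Chem.MolFromSmiles('CCC[19F]')))    # []
```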





From: Milinda Samaraweera 
Date: Wednesday 18 January 2017 at 23:01
To: Bob Funchess 
Cc: RDKit Discuss, Greg Landrum
Subject: Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

Hi Bob,
I am trying to filter out any compound that is not in its most stable 
isotopic form (i.e. that contains anything other than 12C, 1H, 14N, 16O, 31P, 
32S), so that only monoisotopic compounds remain.
Thanks,
Milinda
​


Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

2017-01-18 Thread Stiefl, Nikolaus
Hi
Maybe this is much less efficient but I guess if you need it for specific 
isotopes then you could try using a smarts pattern and check for that?

In [20]: q = Chem.MolFromSmarts("[13C,14C,2H,3H,15N,24P,46P,33S,34S,36S]")

In [21]: m = Chem.MolFromSmiles('CC[15NH2]')

In [22]: m.HasSubstructMatch(q)
Out[22]: True


So you could loop over your molecules and then remove the ones that match the 
smarts.
Ciao
Nik
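
The loop could look roughly like the sketch below. It is an illustration only: the SMARTS is deliberately shortened to carbon and nitrogen, and the input SMILES are made up. Both aliphatic and aromatic symbols are listed because an upper-case element symbol in SMARTS matches aliphatic atoms only.

```python
from rdkit import Chem

# Shortened pattern (assumption for this sketch): labelled C and N only.
# Upper-case C/N match aliphatic atoms, lower-case c/n aromatic ones.
heavy = Chem.MolFromSmarts('[13C,13c,14C,14c,15N,15n]')

# Made-up inputs; in practice these would come from the SD file.
smiles = ['CCO', 'CC[15NH2]', 'c1cc[13cH]cc1', 'CCN']
mols = [Chem.MolFromSmiles(smi) for smi in smiles]

# Keep only the molecules that do not match the heavy-isotope pattern.
kept = [m for m in mols if not m.HasSubstructMatch(heavy)]
print([Chem.MolToSmiles(m) for m in kept])
```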


From: Milinda Samaraweera 
Date: Wednesday 18 January 2017 at 20:47
To: Greg Landrum 
Cc: RDKit Discuss 
Subject: Re: [Rdkit-discuss] Check for Heavy Isotopes using RdKit

Greg,
I am looking to remove entries that contain un-stable isotopes of elements 
CHNOPS (e.g. heavy_isotopes =['13C', '14C', '2H', '3H', '15N', '24P', '46P', 
'33S', '34S', '36S'] ). Is there a way to modify the above code to achieve that?
Thanks,
Milinda


On Wed, Jan 18, 2017 at 11:16 AM, Greg Landrum wrote:
Hi Milinda,

Here's an approach that finds all the atoms that have an isotope specified:

In [1]: from rdkit import Chem

In [2]: from rdkit.Chem import rdqueries

In [3]: q = rdqueries.IsotopeGreaterQueryAtom(1)

In [7]: list(x.GetIdx() for x in Chem.MolFromSmiles('CC[13CH3]').GetAtomsMatchingQuery(q))
Out[7]: [2]

In [8]: list(x.GetIdx() for x in Chem.MolFromSmiles('[12CH3]CC[13CH3]').GetAtomsMatchingQuery(q))
Out[8]: [0, 3]

Does that do what you want it to do?

-greg



On Wed, Jan 18, 2017 at 3:56 PM, Milinda Samaraweera wrote:
Dear Experts,
I am trying to figure out a way to exclude entries that contain heavy isotopes 
(13C, 2H, 3H, etc.) from an SD file (which has close to two thousand entries) 
and write an updated file with the remaining entries.

I do understand how to read/write SD files using RDKit.

What I do not understand is how to detect entries with heavy isotopes. Is there 
an efficient and correct way of achieving this using RDKit?

thanks,
--
Milinda Samaraweera



Re: [Rdkit-discuss] Corresponding Features to Mol2 file

2016-03-29 Thread Stiefl, Nikolaus
Hi Nick
What you are after is (based on your example below):

feat = feats[0]
feat.GetAtomIds()

(shown here for the first feature). To get a better understanding of what is 
what, you might also want to check:

feat.GetFamily()
feat.GetType()

Ciao
Nik
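
Put together, a self-contained version might look like the sketch below. It assumes the factory is built from the BaseFeatures.fdef file shipped with RDKit, and the SMILES is just an example molecule. The atom ids refer to the atom order of the RDKit molecule, so they follow the SMILES here; reading the mol2 file directly with Chem.MolFromMol2File should keep the file's atom order instead.

```python
import os

from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures

# Assumption: use the default feature definitions shipped with RDKit.
fdef = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef')
factory = ChemicalFeatures.BuildFeatureFactory(fdef)

# Example molecule (not from the thread).
m = Chem.MolFromSmiles('OCc1ccccc1CN')
feats = factory.GetFeaturesForMol(m)

# One line per feature: family, type, and the indices of the atoms involved.
for feat in feats:
    print(feat.GetFamily(), feat.GetType(), feat.GetAtomIds())
```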


From: Nicholas Michelarakis
Date: Tuesday 29 March 2016 15:52
To: "rdkit-discuss@lists.sourceforge.net"
Subject: [Rdkit-discuss] Corresponding Features to Mol2 file

Hello,

Possibly a stupid question. I have a mol2 file of a ligand (I can also use a 
pdb or mol file) which I translate into a SMILES which I then use in:

m = Chem.MolFromSmiles(ligand_smiles)
feats = factory.GetFeaturesForMol(m)

The ligand consists of 23 atoms and feats gives me 16 features. I was wondering 
if there is a way to know which atoms of this mol2 (or mol or pdb) file 
correspond to which features given by the GetFeaturesForMol function?

Thank you very much in advance.

Best,
Nick


Re: [Rdkit-discuss] creating new properties on a molecule

2016-03-14 Thread Stiefl, Nikolaus
Hi Chris
No - that should just work:

m = Chem.MolFromSmiles("CC")
list(m.GetPropNames())
[]

m.SetProp("NewProp","TheNewProp")
list(m.GetPropNames())
['NewProp']

m.GetProp("NewProp")
'TheNewProp'

Ciao
Nik
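
For calculated values it may be handier to store numbers rather than strings. A small sketch, assuming the typed setters SetDoubleProp and SetIntProp fit the use case; the property names are made up for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

m = Chem.MolFromSmiles("CC")

# Store calculated values under new, made-up property names.
m.SetDoubleProp("MolWt", Descriptors.MolWt(m))
m.SetIntProp("HeavyAtoms", m.GetNumHeavyAtoms())

print(sorted(m.GetPropNames()))
print(m.GetDoubleProp("MolWt"), m.GetIntProp("HeavyAtoms"))
```

Reading the value back with the matching typed getter avoids converting from a string by hand.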

From: chris dalton
Date: Monday 14 March 2016 21:19
To: "rdkit-discuss@lists.sourceforge.net"
Subject: [Rdkit-discuss] creating new properties on a molecule

I want to create some new properties for an RDKit molecule and add calculated 
values for them. I have tried using SetProp() to create the properties but this 
only appears to modify values of already-present properties. Is there something 
else I should be using to create them?

thanks,
Chris


Re: [Rdkit-discuss] Load mol2 file with partial charges

2015-11-20 Thread Stiefl, Nikolaus
Hi Gaetano
The properties of the mol2 file are stored as atom properties. Here is an
example (sorry, the only thing I have at hand right now is a benzene mol2
file created with MOE; note that the mol2 file parser was tested on Corina
mol2 files).

Here is the file
stiefni2@nrchbs-ldl30105:rdkitMol2>cat benzenePC.mol2
@<TRIPOS>MOLECULE
NoName
12 12 1 0 0
SMALL
USER_CHARGES


@<TRIPOS>ATOM
  1 C1  -0.0278   1.3844   0.0097 C.ar  1 <1>  -0.0618
  2 H1  -0.9760   1.9148   0.0213 H     1 <1>   0.0618
  3 C2   1.1708   2.0983   0.0020 C.ar  1 <1>  -0.0618
  4 H2   1.1561   3.1847   0.0074 H     1 <1>   0.0618
  5 C3   2.3883   1.4173  -0.0129 C.ar  1 <1>  -0.0618
  6 H3   3.3217   1.9733  -0.0188 H     1 <1>   0.0618
  7 C4   2.4072   0.0223  -0.0200 C.ar  1 <1>  -0.0618
  8 H4   3.3554  -0.5081  -0.0316 H     1 <1>   0.0618
  9 C5   1.2087  -0.6916  -0.0123 C.ar  1 <1>  -0.0618
 10 H5   1.2235  -1.7781  -0.0179 H     1 <1>   0.0618
 11 C6  -0.0088  -0.0106   0.0026 C.ar  1 <1>  -0.0618
 12 H6  -0.9423  -0.5667   0.0087 H     1 <1>   0.0618
@<TRIPOS>BOND
  1   1   2  1
  2   1   3  ar
  3   1  11  ar
  4   3   4  1
  5   3   5  ar
  6   5   6  1
  7   5   7  ar
  8   7   8  1
  9   7   9  ar
 10   9  10  1
 11   9  11  ar
 12  11  12  1
@<TRIPOS>SUBSTRUCTURE
  1    5 GROUP 4   0


# MOE 2014.09 (io_trps.svl 2012.10)

And this is the file parsed


In [21]: from __future__ import print_function
In [22]: mol2 = Chem.MolFromMol2File('benzenePC.mol2')
In [23]: atoms = mol2.GetAtoms()

In [24]: list(atoms[0].GetPropNames())
Out[24]: ['_TriposAtomName', '_TriposAtomType', '_TriposPartialCharge']

In [25]: for a in atoms:
   ....:     print(" ".join([a.GetProp(x) for x in a.GetPropNames()]))
   ....:
C1 C.ar -0.0618
C2 C.ar -0.0618
C3 C.ar -0.0618
C4 C.ar -0.0618
C5 C.ar -0.0618
C6 C.ar -0.0618

Hope this helps

Nik
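
Since GetProp() returns strings, the charges need a float() conversion before they can be used numerically. The sketch below is an illustration under assumptions: the tiny water mol2 block (geometry and charges) is made up so that the example is self-contained, and removeHs=False is passed so that the hydrogens, which are dropped by default as in the output above, are kept.

```python
from rdkit import Chem

# Made-up water mol2 block (illustrative geometry and charges).
MOL2 = """@<TRIPOS>MOLECULE
water
3 2 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 O1      0.0000    0.0000    0.0000 O.3   1 WAT  -0.8000
      2 H1      0.9572    0.0000    0.0000 H     1 WAT   0.4000
      3 H2     -0.2400    0.9266    0.0000 H     1 WAT   0.4000
@<TRIPOS>BOND
     1     1     2    1
     2     1     3    1
"""

# Keep the hydrogens so that every atom line of the block is represented.
mol = Chem.MolFromMol2Block(MOL2, removeHs=False)

# Map Tripos atom name -> partial charge as a float.
charges = {
    a.GetProp('_TriposAtomName'): float(a.GetProp('_TriposPartialCharge'))
    for a in mol.GetAtoms()
}
print(charges)
```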


On 20/11/15 00:28, "Gaetano Calabro"  wrote:

>Hi there,
>
>I would like to load a mol2 file with partial charge information in
>RDkit. How can I retrieve the atomic partial charge in RDkit? I haven't
>seen any function related to it.
>
>Cheers,
>
>Gaetano
>
>--
>
>___
>Rdkit-discuss mailing list
>Rdkit-discuss@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




Re: [Rdkit-discuss] Chemistry 101 question...

2013-10-22 Thread Stiefl, Nikolaus
Hi James,
Interesting. I wonder if this also depends on the transport phase that was 
used. Do you have any info on that? Was it a typical 10% MeOH, or something 
more like dichloromethane?
Cheers
Nik


From: James Davidson <j.david...@vernalis.com>
Date: Tuesday, October 22, 2013 1:13 PM
To: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Chemistry 101 question...

Hi JP, Nik, Greg, RDKitters

The question about the lipophilicity (or otherwise) of nitro groups was 
interesting to me…  I came from a CNS background, where there was, of course, a 
stricter requirement for molecules to be suitably lipophilic to cross the 
blood-brain barrier.  My recollection was that the observed lipophilicity of 
nitro groups was dependent on their local environments (ie electron rich / +m 
gave more polar character, and electron poor / -m gave more polar character)…  
But rather than rely on my hazy recollections, I decided to have a quick look 
back at some historical reverse-phase analytical LC data.

What I did was take all retention times (in min) under one well-used gradient 
method and generate the matched molecular pairs using George's KNIME node.  I 
was then only interested in *[H]>>[*][N+](=O)[O-] transformations, so I 
filtered down to just those changes involving 5 atoms in the transformation 
(because this was quicker than chemically searching!).  I then grouped across 
the examples of each transformation to give the average change in retention 
time, plus n, range and SD:

Transformation            Mean RT change (min)   RT range (min)   SD       n
*[H]>>[*]CCC              2.5                    3.3              0.999    28
*[H]>>[*]C(C)C            2.19                   5.47             1.09     37
*[H]>>[*]CCCl             1.91                   1.5              1.06     2
*[H]>>[*]C(F)F            1.22                   1.36             0.748    3
*[H]>>[*]C1CC1            1.18                   1.04             0.436    4
*[H]>>[*]N(C)C            1.08                   1.21             0.472    6
*[H]>>[*]CSC              0.67                   0                0        1
*[H]>>[*]OCC              0.479                  4.67             1.17     15
*[H]>>[*][N+](=O)[O-]     0.169                  2.82             0.645    35
*[H]>>[*]NCC              0.0625                 0.045            0.0318   2
*[H]>>[*]CCO              0.06                   0.04             0.0283   2
*[H]>>[*]COC              -0.001                 2.46             0.62     14
*[H]>>[*]CC=C             -0.357                 0                0        1
*[H]>>[*]C(C)=O           -0.397                 1.21             0.696    3
*[H]>>[*]C(=O)O           -0.848                 4.3              2.17     3
*[H]>>[*]CC#N             -1.3                   2.35             1.66     2
*[H]>>[*]C(N)=O           -2.72                  0                0        1
*[H]>>[*]CCN              -2.77                  0                0        1



So on average over the 35 examples of H -> NO2, the change made the molecules 
slightly more lipophilic (or, at least, they were retained slightly longer on a 
C18 column).
I expect there is much more data-digging that could be done – particularly with 
larger data sets, and (maybe) with proper logP / logD measurements; but for now 
I am going to stick to thinking NO2 groups can be lipophilic additions(!)
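
The grouping step (mean, range, SD and n per transformation) can be sketched in a few lines of plain Python. Everything below is illustrative: the pairs are hypothetical values, not the measured data, and the function name is made up. The sample standard deviation is used, which is consistent with the n = 2 rows of the table, where SD equals the range divided by the square root of 2.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (transformation, RT change in min) pairs; not the real data.
pairs = [
    ("*[H]>>[*]CCC", 2.1), ("*[H]>>[*]CCC", 2.9),
    ("*[H]>>[*][N+](=O)[O-]", -0.3), ("*[H]>>[*][N+](=O)[O-]", 0.1),
    ("*[H]>>[*][N+](=O)[O-]", 0.8),
    ("*[H]>>[*]CSC", 0.67),
]


def summarise(pairs):
    """Group RT changes by transformation -> (mean, range, sd, n)."""
    groups = defaultdict(list)
    for transformation, delta_rt in pairs:
        groups[transformation].append(delta_rt)
    summary = {}
    for transformation, deltas in groups.items():
        n = len(deltas)
        sd = stdev(deltas) if n > 1 else 0.0  # a single pair has no spread
        summary[transformation] = (mean(deltas), max(deltas) - min(deltas), sd, n)
    return summary


summary = summarise(pairs)
for transformation, (avg, rng, sd, n) in summary.items():
    print(f"{transformation:24s} mean={avg:6.3f} range={rng:5.2f} sd={sd:5.3f} n={n}")
```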

Cheers

James



Re: [Rdkit-discuss] RDkit user beginner.

2013-07-04 Thread Stiefl, Nikolaus
Hi - welcome to the community
You have a typo there

GenMACCSkeys


Should actually be 

GenMACCSKeys


(ie an uppercase K in Keys)

Try ipython (with the ipython notebook) then autocomplete will solve those
issues for you more or less. The notebook is cool and there is some info
in the mailing list on how to use it.
Ciao
Nik
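
For reference, a corrected version of the snippet might look like the sketch below. To keep it self-contained, two example SMILES stand in for the nat.sdf file (an assumption), and DataStructs is imported explicitly, since the similarity call needs it.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys

# Two example molecules instead of reading nat.sdf (assumption for this sketch).
mols = [Chem.MolFromSmiles(smi) for smi in ('CCO', 'CCN')]

# Note the upper-case K in GenMACCSKeys.
fps = [MACCSkeys.GenMACCSKeys(m) for m in mols]

sim = DataStructs.FingerprintSimilarity(fps[0], fps[1])
print(sim)
```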


On 7/4/13 9:16 AM, segie...@sanbi.ac.za wrote:

Hi all,

I just started using RDkit for chemoinformatics analysis.
I tried out this code in the tutorial but got an error(please advise):

The code:

from rdkit import Chem
from rdkit.Chem import MACCSkeys

nat = Chem.SDMolSupplier('nat.sdf')
fps = [MACCSkeys.GenMACCSkeys(x) for x in nat]
fp = DataStructs.FingerprintSimilarity(fps[0], fps[89])
print fp

The error message:

Traceback (most recent call last):
  File "rdkittest2.py", line 7, in <module>
    fps = [MACCSkeys.GenMACCSkeys(x) for x in nat]
AttributeError: 'module' object has no attribute 'GenMACCSkeys'





Re: [Rdkit-discuss] SmilesWriter

2013-07-03 Thread Stiefl, Nikolaus
Hmm,
I don't know if there is a predefined option in RDKit, but if you have a
list of properties (say propertyList) you want to pick, you could write
directly to a text file, something along these lines:

w.write("%s\t%s" % (Chem.MolToSmiles(m), "\t".join([m.GetProp(p) for p in
propertyList if m.HasProp(p)])))
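
A slightly fuller sketch of the same idea, under a few assumptions: the
function name write_smiles_with_props is made up for this example, a header
line and a trailing newline per record are added, and missing properties are
written as empty fields (the one-liner above skips them instead) so that the
columns stay aligned:

```python
import io

from rdkit import Chem


def write_smiles_with_props(mols, property_list, handle):
    """Write one tab-separated line per molecule: SMILES, then each property."""
    handle.write("SMILES\t" + "\t".join(property_list) + "\n")
    for m in mols:
        # Missing properties become empty fields so the columns stay aligned
        values = [m.GetProp(p) if m.HasProp(p) else "" for p in property_list]
        handle.write("%s\t%s\n" % (Chem.MolToSmiles(m), "\t".join(values)))


m = Chem.MolFromSmiles("CCO")
m.SetProp("ID", "mol-1")
m.SetProp("Source", "demo")

out = io.StringIO()
write_smiles_with_props([m], ["ID", "Source", "Missing"], out)
print(out.getvalue())
```

Any file-like object works in place of the StringIO buffer, for example a
file opened with open('out.smi', 'w').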

Ciao
Nik


On 7/3/13 7:59 AM, paul.czodrow...@merckgroup.com
paul.czodrow...@merckgroup.com wrote:

Dear RDKitter,

is there a way to include (all or some selected) SD tags when writing a
mol object into a SMILES file?

My experience is that only the name of the mol object is outputted to the
SMILES file.


Cheers,
Paul




Re: [Rdkit-discuss] Volume Overlap using RDKit

2013-01-22 Thread Stiefl, Nikolaus
Hi JP,
Do you want to do a shape align or just any sort of alignment?

There is an AlignMol function in AllChem (from rdMolAlign) which will give you an 
RMSD alignment. This works well if you have reasonably similar molecules (do a 
GetSubstructMatch beforehand to get the atom list).
I don't think there is a shape alignment for whole molecules - there is, however, 
the SubshapeAligner module in rdkit.Chem.Subshape, but I have never used it.
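
A minimal sketch of the RMSD-align-then-score route, assuming both molecules
share a common core. The two molecules, the core SMARTS and the random seeds
are illustrative choices, not anything from the original question, and this
is a substructure-based alignment, not a true shape alignment:

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdShapeHelpers

# Two similar example molecules with 3D coordinates (hypothetical inputs)
ref = Chem.AddHs(Chem.MolFromSmiles("c1ccccc1CCO"))
probe = Chem.AddHs(Chem.MolFromSmiles("c1ccccc1CCN"))
AllChem.EmbedMolecule(ref, randomSeed=42)
AllChem.EmbedMolecule(probe, randomSeed=42)

# Shared core used to pair up atoms between the two molecules
core = Chem.MolFromSmarts("c1ccccc1CC")
ref_match = ref.GetSubstructMatch(core)
probe_match = probe.GetSubstructMatch(core)

# atomMap is a list of (probe atom index, reference atom index) pairs;
# AlignMol moves the probe onto the reference and returns the RMSD
rmsd = AllChem.AlignMol(probe, ref, atomMap=list(zip(probe_match, ref_match)))

# With the alignment in place, the shape Tanimoto distance can be evaluated
dist = rdShapeHelpers.ShapeTanimotoDist(probe, ref)
print(rmsd, dist)
```

The distance depends on the conformers used, so in practice one would
probably repeat this over several conformers and keep the best value.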

Ciao
Nik



From: JP <jeanpaul.ebe...@inhibox.com>
Date: Tue, 22 Jan 2013 12:20:15 +
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] Volume Overlap using RDKit


RDKitters,

Long time no type, I've been busy with that little chestnut of my PhD...

I would like to align two molecules and calculate the shape tanimoto with 
ShapeTanimotoDist(...).  The issue is that this method requires a pre-defined 
alignment - which I do not have.

Is there a way to do a molecular volume overlap in RDKit?  I cannot seem to 
find it and the only related discussion I can find is here: 
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg00512.html
But the fourth slide here: 
http://www.slideshare.net/baoilleach/cinfony-bring-cheminformatics-toolkits-into-tune
clearly states that RDKit is able to do this.

If this is not RDKit-doable, has anyone else come across some publicly available 
tools to do this?  A quick search led me to Shape-it 
(http://silicos-it.com/software/shape-it/1.0.1/shape-it.html), from 
Hans (whom I met at the user group meeting) - has anyone used this before?

p.s. no one ever sent/made available the group photo we took at the 1st RDKit 
meeting :(

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] non-smallest rings

2013-01-21 Thread Stiefl, Nikolaus
Do you just have to check if an atom is in a 6-membered ring? If so, then


In [8]: m = Chem.MolFromSmiles('COc1ccc(cc1O[C@H]1C[C@@H]2CC[C@H]1C2)C1CNC(=O)NC1')

In [9]: [a.IsInRingSize(6) for a in m.GetAtoms()]
Out[9]: 
[False,
 False,
 True,
 True,
 True,
 True,
 True,
 True,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 True,
 True,
 True,
 True,
 False,
 True,
 True]

Should help.

Sorry - maybe I do not fully understand your question
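
A note on the output above: atoms 9 to 15 (the norbornane carbons) come back
False, which matches Paul's observation that RingInfo only holds the two
5-membered rings; IsInRingSize appears to consult only those perceived rings.
One possible workaround, offered as a sketch and not as an established RDKit
recipe, is to match a generic cycle SMARTS and collect the atoms it hits. The
helper name atoms_in_cycle_of_size is made up for this example, and plain
norbornane stands in for the full ligand:

```python
from rdkit import Chem


def atoms_in_cycle_of_size(mol, size):
    """Return indices of atoms lying on any simple cycle of `size` atoms.

    Hypothetical helper: builds a SMARTS such as *~1~*~*~*~*~*~1 for size 6.
    """
    smarts = "*~1" + "~*" * (size - 1) + "~1"
    pattern = Chem.MolFromSmarts(smarts)
    hits = set()
    for match in mol.GetSubstructMatches(pattern):
        hits.update(match)
    return hits


norbornane = Chem.MolFromSmiles("C1CC2CCC1C2")

# Ring membership according to the perceived (smallest) rings only
sssr_six = [a.GetIdx() for a in norbornane.GetAtoms() if a.IsInRingSize(6)]

# Ring membership according to any 6-membered cycle in the graph
any_six = sorted(atoms_in_cycle_of_size(norbornane, 6))
print(sssr_six, any_six)
```

Enumerating cycles by substructure matching can get expensive for large or
heavily fused ring systems, so this is probably best kept to small ligands.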

Ciao
Nik

On 1/21/13 4:13 PM, Paul Emsley pems...@mrc-lmb.cam.ac.uk wrote:


I am making heavy weather of the following problem - and am wondering if
I am missing something (such as a useful RDKit function).

I am working on this beasty (as an example):

http://www.rcsb.org/pdb/ligand/ligandsummary.do?hetId=0CP

COc1ccc(cc1O[C@H]1C[C@@H]2CC[C@H]1C2)C1CNC(=O)NC1

which has a norbornane substituent. I am trying to prepare input for a
downstream program that needs to know if the norbornane atoms are in a
6-membered ring [1].  RingInfo gives me the 2 5-membered rings.  I am
struggling to make use of that information to find 6-membered rings.  I
have been using makeRingNeighborMap() and pickFusedRings().  Am I
missing an RDKit function that finds all rings?

Cheers,

Paul.


[1] actually, all atoms but it is the norbornane atoms with which I
struggle.




Re: [Rdkit-discuss] fdef files

2012-10-09 Thread Stiefl, Nikolaus
Hi James,
Yes – I just checked the files I generated for the ph4 definitions and they 
definitely do not have the tautomer recognition (guess I am just not SMART 
enough for this ;-). I will follow up with Greg and see how quickly we can get 
the fdef file as a contribution so people can start to add/delete/change things 
in there.
I believe someone also mentioned that there might be better / other SMARTS 
definitions for ph4 properties around. A pointer to those would be appreciated 
on my end so maybe I can compare a bit before release?
Cheers
Nik


From: James Davidson <j.david...@vernalis.com>
Date: Tue, 9 Oct 2012 07:47:38 +0100
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] fdef files

Dear All,

At the UGM I promised to send a reminder (mainly for Greg, but possibly also 
Nik?) about the potential use of two fdef files – one with and one without 
explicit hydrogens.  During the demo of the PyMOL interactivity with 3D ligand 
+ pharmacophoric features (showfeats.py script), it was clear that explicitly 
‘protonated’ nitrogens (e.g. in a specific tautomer of pyrazole) were being 
flagged as possible hydrogen-bond acceptors, as well as donors, due to the fdef 
being used recognising the potential for a 1,2-H shift.  I made the comment at 
the time that in cases where the molecule had hydrogens specifically added 
(perhaps a docking result), then it would probably be best to flag ‘specific’ 
rather than ‘potential’ Ph4 features – hence the use of separate fdef files.
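
To illustrate the distinction, here is a minimal sketch of a 'specific'
feature definition. The two feature names and their SMARTS are made up for
this example and are not taken from any fdef file shipped with RDKit or used
at NIBR; they only show how a definition without tautomer awareness treats
one fixed tautomer of pyrazole:

```python
from rdkit import Chem
from rdkit.Chem import ChemicalFeatures

# Hypothetical "specific" definitions: a nitrogen is a donor only if it carries
# a hydrogen, and an aromatic nitrogen is an acceptor only if it carries none.
SPECIFIC_FDEF = """DefineFeature SpecificDonor [N,n;!H0]
  Family Donor
  Weights 1.0
EndFeature
DefineFeature SpecificAcceptor [n;H0;X2]
  Family Acceptor
  Weights 1.0
EndFeature
"""

factory = ChemicalFeatures.BuildFeatureFactoryFromString(SPECIFIC_FDEF)

# One specific tautomer of pyrazole, with hydrogens added explicitly
mol = Chem.AddHs(Chem.MolFromSmiles("c1cc[nH]n1"))

features = {}
for feat in factory.GetFeaturesForMol(mol):
    features.setdefault(feat.GetFamily(), []).append(tuple(feat.GetAtomIds()))
print(features)
```

A 'potential' variant would additionally flag the NH nitrogen as an acceptor
and the bare nitrogen as a donor, which is the behaviour described above.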

My recollection (from Nik?) was that this was perhaps already the case in NIBR's 
use of RDKit(?)

Kind regards

James

_
Dr James Davidson
Senior Team Leader,
Medicinal & Computational Chemistry
Vernalis (R&D) Ltd
Granta Park
Great Abington
Cambridgeshire
CB21 6GB
t: +441223 89
f: +441223 895556
d: +441223 895428
m: +447595 258005



Re: [Rdkit-discuss] matching substructures to molecules

2012-08-07 Thread Stiefl, Nikolaus
Hi Gonzalo,

SmilesToMol has a sanitize flag which you can set to false. However, I am not 
sure how well your molecule fingerprints will work with an unsanitized molecule. 
I would imagine that you will run into all sorts of funny problems with respect 
to aromaticity detection etc.

Not sure if this helps in your case – but if you write out the molecules as 
non-aromatic (ie alternating ring bonds) maybe this would help with your 
approach? How do you fragment your molecules? Do you need the vendor code or 
could you just do the same in RDKit – eg using BRICS or similar?
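
A minimal Python sketch of the sanitize flag (the question is about the C++
API, where SmilesToMol takes the corresponding argument). The fragment SMILES
"ccO" is a made-up example of the kind of partial aromatic fragment a
fragmenter might emit:

```python
from rdkit import Chem

# Aromatic atoms outside any ring, as a fragmentation tool might write them
fragment_smiles = "ccO"

# Default parsing sanitizes the molecule and is expected to fail here
strict = Chem.MolFromSmiles(fragment_smiles)

# Skipping sanitization keeps the graph exactly as written
fragment = Chem.MolFromSmiles(fragment_smiles, sanitize=False)

phenol = Chem.MolFromSmiles("c1ccccc1O")
print(strict is None, fragment.GetNumAtoms(), phenol.HasSubstructMatch(fragment))
```

Whether a given fingerprint can be computed on such an unsanitized fragment
depends on what that fingerprint needs (ring information, valences), so that
part would have to be checked case by case.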

Ciao
Nik


From: Gonzalo Colmenarejo-Sanchez <gonzalo.2.colmenar...@gsk.com>
Date: Tue, 7 Aug 2012 09:50:30 +
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] matching substructures to molecules

Hi,

I have a vendor fragmentation algorithm and I want to evaluate the presence of 
the fragments/substructures in a list of molecules with the RDKit C++ API. In 
order to avoid a slow SubstructMatch comparison of n fragments x m molecules I 
first SmilesToMol the fragment, generate a fingerprint, calculate the Tversky 
similarity with the molecule fingerprint, and only if the value is high a 
SubstructMatch is run. This makes the process extremely fast.

The problem I observe is that for many SmilesToMol of the substructures I’m 
getting exceptions like

[10:34:46] Can't kekulize mol
[10:34:46] non-ring atom 4 marked aromatic

Is there a way to “force” SmilesToMol to accept the fragment SMILES so that a 
fingerprint of the fragment graph can be generated (btw, I don’t understand why 
a kekulization is performed) even when the fragment is not a complete molecule?

Thanks a lot,
Gonzalo



Re: [Rdkit-discuss] Detecting rings and bond types from PDB HETATM record

2012-07-06 Thread Stiefl, Nikolaus
Hi JP,
Not sure if this is of any help. If it's a PDB file from RCSB or an
in-house one where you have a corresponding SMILES available, maybe you
could use this information to properly set up the bond types using bond
matches? I know the components.cif file still has quite a few errors -
however, maybe it could be of help.
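
A sketch of that idea in Python, using AllChem.AssignBondOrdersFromTemplate,
which (as far as I know) was added to RDKit after this thread. The all-single-
bond cyclohexanol skeleton is only a stand-in for a ligand read from a PDB
file, and phenol plays the role of the known SMILES:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Stand-in for a ligand from a PDB file: right connectivity, single bonds only
skeleton = Chem.MolFromSmiles("C1CCCCC1O")

# Reference structure with the correct bond orders, e.g. from a known SMILES
template = Chem.MolFromSmiles("c1ccccc1O")

# Matches the template onto the skeleton and copies the bond orders across
fixed = AllChem.AssignBondOrdersFromTemplate(template, skeleton)
print(Chem.MolToSmiles(fixed))
```

The function raises a ValueError when the two graphs cannot be matched, so an
incorrect template SMILES shows up immediately.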
Cheers
Nik



On 7/6/12 5:04 AM, Greg Landrum greg.land...@gmail.com wrote:

Hi JP,

On Thu, Jul 5, 2012 at 7:07 PM, JP jeanpaul.ebe...@inhibox.com wrote:


 I generate a RWMol instance from the HETATM portion of a PDB file.  My atoms
 are currently only joined by a single bond as defined in the connect portion
 of the pdb file, e.g.

 CONECT 2235 2234 2236
 CONECT 2236 2231 2235 2251
 CONECT 2237 2238 2242

ah, yes, the missing bond orders, one of the many reasons that I have
never done a PDB parser for the RDKit. :-S

I think you're doing this work in C++, so I'm going to answer the rest
of the questions accordingly.

 Are there any obvious rdkit ways how to detect :-

 0. rings

Sure.

If you just want to know if each atom/bond is in a ring you can use
MolOps::fastFindRings and then mol.getRingInfo().numAtomRings(idx) > 0
or mol.getRingInfo().numBondRings(idx) > 0

If you want to know what the SSSR rings are, then you should use
MolOps::symmetrizeSSSR(). You can pass that an extra argument where it
will return the rings as defined by atom indices. After calling this,
you can also get the set of atom rings using
mol.getRingInfo().atomRings() or the bond rings with
mol.getRingInfo().bondRings();
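
For readers working in Python, the same two steps can be sketched as
follows. The SMILES is an arbitrary example, parsed without sanitization to
mimic a molecule whose bond orders are not yet known:

```python
from rdkit import Chem

# Parse without sanitization, as one might for a molecule with unknown bond orders
mol = Chem.MolFromSmiles("C1CCCCC1CC", sanitize=False)

# Fast ring membership only (no SSSR)
Chem.FastFindRings(mol)
ring_info = mol.GetRingInfo()
in_ring = [ring_info.NumAtomRings(a.GetIdx()) > 0 for a in mol.GetAtoms()]

# Symmetrized SSSR, returned as sequences of atom indices
rings = [tuple(r) for r in Chem.GetSymmSSSR(mol)]
print(in_ring, rings)
```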

 1. aromatic rings/atoms
 2. double/triple bonds
 3. charges (if any)

Here's where the trouble starts.

I guess you want to perceive the bond types and atom hybridizations
from the geometry. From there you can get the charges. The RDKit does
not currently have anything to do this. There was a discussion on the
mailing list last year:
http://comments.gmane.org/gmane.science.chemistry.rdkit.user/85
where Geoff Hutchinson very kindly offered to donate the OpenBabel
bond perception code to the RDKit. He sent the code, but I've never
had the time to port it from OpenBabel to RDKit. If you're
interested in implementing this and were willing to do it in a way
that could be integrated into the main RDKit, I can send you the
donated code; it's about 300 lines of well-commented C++.


 I would like to set these properties on every atom instance contained in my
 RWMol - so I generate a correct molecule representation.
 I assume sanitize would not clean these up for me? Correct?

Correct. Sanitize uses the bond information that's there.

-greg



Re: [Rdkit-discuss] improving substructure search behavior with real molecules

2012-03-01 Thread Stiefl, Nikolaus
Hi Greg,

Personally I like your suggestion of behaviour similar to SMARTS. That way 
one can decide what level of granularity one wants. Obviously it also means 
that we have to think a bit more about our queries and database preparations - 
I am sure, though, that this will only improve our data pool.

My 2 pence
Nik


-Original Message-
From: Greg Landrum [mailto:greg.land...@gmail.com] 
Sent: Thursday, March 01, 2012 8:38 AM
To: RDKit Discuss
Subject: [Rdkit-discuss] improving substructure search behavior with real 
molecules

Dear all,

Jean-Paul has recently posted a couple of bugs related to the way the
RDKit handles substructure matches between molecules that are not
built from SMARTS.
A short, general summary of the problem is shown here:

In [2]: Chem.MolFromSmiles('CCC').HasSubstructMatch(Chem.MolFromSmiles('CC[14C]'))
Out[2]: True

In [3]: Chem.MolFromSmiles('CCO').HasSubstructMatch(Chem.MolFromSmiles('CC[O-]'))
Out[3]: True

The reason this happens is that the atom-atom matching code at the
moment only considers atomic number, so any O matches any other O.
Here's a table showing some examples of the current behavior (easier
to see in a fixed-width font):
| Molecule | Query   | Match |
| CCO      | CCO     | Yes   |
| CC[O-]   | CCO     | Yes   |
| CCO      | CC[O-]  | Yes   |
| CC[O-]   | CC[O-]  | Yes   |
| CC[O-]   | CC[OH]  | Yes   |
| CCOC     | CC[OH]  | Yes   |
| CCOC     | CCO     | Yes   |
| CCC      | CCC     | Yes   |
| CC[14C]  | CCC     | Yes   |
| CCC      | CC[14C] | Yes   |
| CC[14C]  | CC[14C] | Yes   |

It is quite simple to change this behavior so that it's somewhat more
intuitive, but doing so requires making some decisions about what the
semantics of these searches should be.

The easiest thing would be to go from the current overly general
definition to something that is very specific where all atomic
properties in the molecule and query must match. This gives the
following table:
| Molecule | Query   | Match |
| CCO      | CCO     | Yes   |
| CC[O-]   | CCO     | No    |
| CCO      | CC[O-]  | No    |
| CC[O-]   | CC[O-]  | Yes   |
| CC[O-]   | CC[OH]  | No    |
| CCOC     | CC[OH]  | No    |
| CCOC     | CCO     | Yes   |
| CCC      | CCC     | Yes   |
| CC[14C]  | CCC     | No    |
| CCC      | CC[14C] | No    |
| CC[14C]  | CC[14C] | Yes   |

I think this would also provide unexpected results. Particularly
things like this:
| CC[O-]   | CCO     | No    |

My proposal for a fix is to adopt semantics similar to SMARTS: if you
don't specify something in the query, then it's not used as part of
the matching criteria. This gives the following table:
| Molecule | Query   | Match |
| CCO      | CCO     | Yes   |
| CC[O-]   | CCO     | Yes   |
| CCO      | CC[O-]  | No    |
| CC[O-]   | CC[O-]  | Yes   |
| CC[O-]   | CC[OH]  | No    |
| CCOC     | CC[OH]  | No    |
| CCOC     | CCO     | Yes   |
| CCC      | CCC     | Yes   |
| CC[14C]  | CCC     | Yes   |
| CCC      | CC[14C] | No    |
| CC[14C]  | CC[14C] | Yes   |

This is easy to implement and should not have too much impact on
substructure search speeds.
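
A small script to print such a table for whatever RDKit version is
installed. Only the charge and isotope rows are included; the expectation
that a recent release follows the SMARTS-like table for these rows is based
on my reading of the current RDKit documentation, not on this thread, and
older releases will print the all-Yes table shown first:

```python
from rdkit import Chem

# (molecule, query) pairs taken from the charge and isotope rows of the tables
pairs = [
    ("CCO", "CCO"),
    ("CC[O-]", "CCO"),
    ("CCO", "CC[O-]"),
    ("CC[O-]", "CC[O-]"),
    ("CCC", "CCC"),
    ("CC[14C]", "CCC"),
    ("CCC", "CC[14C]"),
    ("CC[14C]", "CC[14C]"),
]

results = {}
print("| %-8s | %-7s | %-5s |" % ("Molecule", "Query", "Match"))
for mol_smiles, query_smiles in pairs:
    mol = Chem.MolFromSmiles(mol_smiles)
    query = Chem.MolFromSmiles(query_smiles)
    matched = mol.HasSubstructMatch(query)
    results[(mol_smiles, query_smiles)] = matched
    print("| %-8s | %-7s | %-5s |" % (mol_smiles, query_smiles, "Yes" if matched else "No"))
```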

Comments? Suggestions?

-greg



Re: [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol -- but with a twist.

2012-01-12 Thread Stiefl, Nikolaus
Well - Corina does make use of C.ar and N.ar - just not in this combination. 
The problem with having everything as ar (bonds and atoms) is that it can be 
non-specific.


Here is indole as retrieved from Corina:

@<TRIPOS>MOLECULE
NoName
  16   17    0    0    0
SMALL
NO_CHARGES


@<TRIPOS>ATOM
   1 C1    -0.0170    1.4025    0.0098 C.ar
   2 C2    -1.2389    2.0675    0.0301 C.ar
   3 C3    -2.4112    1.3443    0.0423 C.ar
   4 C4    -2.3870   -0.0438    0.0346 C.ar
   5 C5    -1.1984   -0.7171    0.0152 C.ar
   6 C6     0.0021   -0.0041    0.0020 C.ar
   7 C7     1.4152   -0.3895   -0.0184 C.2
   8 C8     2.1316    0.7457   -0.0225 C.2
   9 N9     1.2929    1.8266   -0.0005 N.pl3
  10 H10   -1.2681    3.1471    0.0365 H
  11 H11   -3.3587    1.8624    0.0584 H
  12 H12   -3.3153   -0.5957    0.0445 H
  13 H13   -1.1873   -1.7970    0.0097 H
  14 H14    1.8064   -1.3961   -0.0279 H
  15 H15    3.2102    0.7981   -0.0362 H
  16 H16    1.5785    2.7536    0.0013 H
@<TRIPOS>BOND
   1    1    6 ar
   2    1    9 1
   3    1    2 ar
   4    2    3 ar
   5    2   10 1
   6    3    4 ar
   7    3   11 1
   8    4    5 ar
   9    4   12 1
  10    5    6 ar
  11    5   13 1
  12    6    7 1
  13    7    8 2
  14    7   14 1
  15    8    9 1
  16    8   15 1
  17    9   16 1

#   End of record
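
As a quick check, the Corina-style record above can be fed to the RDKit Mol2
parser. This is only a sketch: the column spacing of the block is my
reconstruction of the record, and the comparison against a SMILES for indole
is added for illustration:

```python
from rdkit import Chem

# Corina-style indole: C.ar in the benzene ring, C.2 and N.pl3 in the pyrrole part
CORINA_INDOLE = """@<TRIPOS>MOLECULE
NoName
  16   17    0    0    0
SMALL
NO_CHARGES


@<TRIPOS>ATOM
   1 C1    -0.0170    1.4025    0.0098 C.ar
   2 C2    -1.2389    2.0675    0.0301 C.ar
   3 C3    -2.4112    1.3443    0.0423 C.ar
   4 C4    -2.3870   -0.0438    0.0346 C.ar
   5 C5    -1.1984   -0.7171    0.0152 C.ar
   6 C6     0.0021   -0.0041    0.0020 C.ar
   7 C7     1.4152   -0.3895   -0.0184 C.2
   8 C8     2.1316    0.7457   -0.0225 C.2
   9 N9     1.2929    1.8266   -0.0005 N.pl3
  10 H10   -1.2681    3.1471    0.0365 H
  11 H11   -3.3587    1.8624    0.0584 H
  12 H12   -3.3153   -0.5957    0.0445 H
  13 H13   -1.1873   -1.7970    0.0097 H
  14 H14    1.8064   -1.3961   -0.0279 H
  15 H15    3.2102    0.7981   -0.0362 H
  16 H16    1.5785    2.7536    0.0013 H
@<TRIPOS>BOND
   1    1    6 ar
   2    1    9 1
   3    1    2 ar
   4    2    3 ar
   5    2   10 1
   6    3    4 ar
   7    3   11 1
   8    4    5 ar
   9    4   12 1
  10    5    6 ar
  11    5   13 1
  12    6    7 1
  13    7    8 2
  14    7   14 1
  15    8    9 1
  16    8   15 1
  17    9   16 1
"""

# Hydrogens are removed by default; sanitization perceives the aromatic system
mol = Chem.MolFromMol2Block(CORINA_INDOLE)
print(Chem.MolToSmiles(mol))
```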

And yes - I would say that this is a Won't Fix. Unfortunately, the mol2 
file format (documentation) is such a pain in general, and the multiple 
different implementations don't make it any better.

Sorry
Nik



From: JP [mailto:jeanpaul.ebe...@inhibox.com]
Sent: Thursday, January 12, 2012 3:33 PM
To: Stiefl, Nikolaus
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol -- but 
with a twist.

Thanks for the explanation Stiefl.  File formats - what a pain.
So Corina does not make use of C.ar or N.ar?

This is a Won't Fix then ... right?  Maybe a note in the documentation listing 
the atom types from the spec (pg 53 in 
http://tripos.com/data/support/mol2.pdf) which are not supported would be useful 
then (as people like me have never used corina)?

Many thanks,

-
Jean-Paul Ebejer
Early Stage Researcher

On 12 January 2012 14:13, Stiefl, Nikolaus <nikolaus.sti...@novartis.com> wrote:
Dear JP,

When the Mol2 parser was implemented we had to take a decision at some point 
about which format to use. Given the unspecific Tripos specs this was 
actually quite tricky. If you write the same molecule using Sybyl, Tripos' db 
tools or other software like Corina, you will get different results from each 
(note that Tripos does not even give the same results across its own tools).

Hence, we decided on Corina, since this is one of the most widely used tools 
and also seems to give the most consistent results, judging by a largish 
set I converted and reviewed. As you can see, there is a Note in the 
documentation of the Mol2 parser (e.g. MolFromMol2File) that will tell you 
that it is optimized for the atom-typing scheme used by Corina.

Sorry I can't be of more help

Nik

From: JP [mailto:jeanpaul.ebe...@inhibox.com]
Sent: Thursday, January 12, 2012 2:57 PM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] Mol2 Format problem ? Can't kekulize mol -- but with a 
twist.

Hi there RDkitters,

Using RDKit 2011.09.1 on Ubuntu Linux 11.10 64 bit with a noisy fan.

I am trying to read a MOL2 file (which I think is in line with the Tripos spec 
http://tripos.com/data/support/mol2.pdf -- your favorite molecular format, I 
know).

The structure is a simple indole.  If the atom types in the mol atom block are 
C.ar or N.ar the sanitization fails (but I think this should be allowed - 
especially since the bonds are also defined as aromatic).  If I change the atom 
types to C.2 and N.2 respectively then everything works fine and the aromatic 
parts of the molecules are still correct (because of the aromatic bond 
definitions).

An example of this so you can just copy and paste it:

#!/usr/bin/env python

from rdkit import Chem

# the following is a valid molecule - why does it break?
indole_broken = """@<TRIPOS>MOLECULE
MVSketch_Indole
10 11 1
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1  C1  38.6029   -19.6265 0.  C.ar 1  noname
2  C2  38.6029   -21.1665 0.  C.ar 1  noname
3  C3  37.2692   -21.9365 0.  C.ar 1  noname
4  C4  35.9356   -21.1665 0.  C.ar 1  noname
5  C5  35.9356   -19.6265 0.  C.ar 1  noname
6  C6  37.2692   -18.8565 0.  C.ar 1  noname
7  C7  34.4709   -21.6424 0.  C.ar 1  noname
8  C8

Re: [Rdkit-discuss] multiprocessing rdkit

2011-10-11 Thread Stiefl, Nikolaus
Hi Paul,

When I look at your definition below and the one that worked there is a slight 
difference.

In fps_calc you are passing a molecule and then you try to iterate over it (in 
 fps = [GetMorganFingerprint(x,3) for x in m] ). Whereas in 
generateconformations(m) you also pass a single molecule but then just work on 
this single molecule - so you don't try to iterate over a molecule.

I don't know anything about the multiprocessing module - but looking at your code 
I would assume that you are trying to split a list of molecules across multiple 
processes, passing one element of the list to each call of the function 
... so your code should (in my opinion) look like 

from multiprocessing import Pool
from rdkit.Chem.rdMolDescriptors import GetMorganFingerprint

def fps_calc(m):
    # m is a single molecule, so return a single fingerprint
    return GetMorganFingerprint(m,3)

p4 = Pool(processes=4)
fps =  p4.map(fps_calc,ms)
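
A self-contained variant of the same pattern, written as a sketch under a
few assumptions of my own: the workers receive SMILES strings instead of
molecule objects and return plain lists of on-bit indices, so that only
simple Python objects cross the process boundary, and the function names are
made up for this example:

```python
from multiprocessing import Pool

from rdkit import Chem
from rdkit.Chem import AllChem


def fingerprint_on_bits(smiles):
    """Worker: one SMILES in, the on-bit indices of one fingerprint out."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 3, nBits=2048)
    return list(fp.GetOnBits())


def parallel_fingerprints(smiles_list, processes=2):
    # Pool.map hands each list element to the worker separately
    with Pool(processes=processes) as pool:
        return pool.map(fingerprint_on_bits, smiles_list)


if __name__ == "__main__":
    results = parallel_fingerprints(["CCO", "c1ccccc1", "CC(=O)O"])
    print([len(bits) for bits in results])
```

The __main__ guard matters on platforms where worker processes re-import the
script; without it the pool would be created again in every worker.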

As I said - this is just from looking at your examples, not from knowing anything 
about multiprocessing. Sorry that I cannot test more but I have to run to a 
meeting :-(

Nik


-Original Message-
From: paul.czodrow...@merckgroup.com [mailto:paul.czodrow...@merckgroup.com] 
Sent: Tuesday, October 11, 2011 1:55 PM
To: RDKit Discuss
Subject: [Rdkit-discuss] multiprocessing  rdkit


Dear RDkitters,

I'm trying to use Python's multiprocessing module in conjunction with
RDKit.

It should be applied in 2 cases:
(1) fingerprint calculation

(2) Picking Diverse Molecules



(1)

from multiprocessing import Pool
p4 = Pool(processes=4)
def fps_calc(m):
    fps = [GetMorganFingerprint(x,3) for x in m]
    return fps
fps =  p4.map(fps_calc,ms)

==
TypeError: 'Mol' object is not iterable

(2)

from multiprocessing import Pool
p4 = Pool(processes=4)
def distij(i,j,fps=fps):
    return 1-DataStructs.DiceSimilarity(fps[i],fps[j])

def DivSelection(distij,nfps,quantity_train):
    picker = MaxMinPicker()
    picked_indices = picker.LazyPick(distij,nfps,quantity_train)
    return picked_indices

pickIndices = p4.map(DivSelection, ???)



In the first case, I do not get the point what is going wrong. With the
following conformer generation snippet, multiprocessing works perfect:

from multiprocessing import Pool
def generateconformations(m):
  m = Chem.AddHs(m)
  ids=AllChem.EmbedMultipleConfs(m,numConfs=10)
  for id in ids:
    AllChem.UFFOptimizeMolecule(m,confId=id)
  return m
p4 = Pool(processes=4)
ms = [x for x in Chem.SDMolSupplier('blockbusters.sdf')]
cms4=p4.map(generateconformations,ms)


In the second case, I'm not sure how to properly pass the variables.



Cheers & Thanks,
Paul



Re: [Rdkit-discuss] how to come to a good model

2011-10-10 Thread Stiefl, Nikolaus
Hi Paul,

I'd agree with Greg's comment - if this is for a hERG model then you will not 
have a lot of luck making a reasonable model purely with physchem properties. 
I guess having information on ionizability could be of help.

Another thing to test - in case you need to make a 3-class model - would be to 
modify your class borders so you only predict really likely and really 
unlikely molecules, and increase the size of your "I have no idea" class.

Hope this helps
Nik


-Original Message-
From: Greg Landrum [mailto:greg.land...@gmail.com] 
Sent: Saturday, October 08, 2011 4:21 PM
To: paul.czodrow...@merckgroup.com
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] how to come to a good model

On Fri, Oct 7, 2011 at 12:31 PM,  paul.czodrow...@merckgroup.com wrote:

 Dear RDKitters,

 I'm in the process of training a 3-class decision tree model. I have roughly
 about 1500 compounds with an almost equal distribution of the 3 classes.

<snip>


 In all cases, the statistics is really bad: about 50 percent are
 misclassified, e.g.:
 
         *** Vote Results ***
 misclassified: 580/1180 (%49.15)        580/1180 (%49.15)

 average correct confidence:    0.7837
 average incorrect confidence:  0.7528
 

 Interestingly, there is only a very small difference between the average
 confidence levels for the correct and the incorrect classifications.
 As far as I understand it, this tells me that the model is really bad -
 information I already had from the vote results themselves.


 Which parameters are worthwhile to test?

We talked about this at the Knime OSD meeting already, but I think
it's worth repeating for the community: I believe that prediction of
hERG binding is too challenging for simple descriptors like the
physicochemical descriptors the RDKit provides or the standard Morgan
fingerprint. This is particularly true if you're trying to build a
three-class model (which is much more difficult than a two-class
model).

One suggestion would be to try doing a two-class model (either combine
two of your classes together or use only classes 0 and 2 in the
training) and see if that helps. Another would be to try using different
descriptors. You might be able to get something useful with the
FeatMorgan fingerprints (similar to the FCFP fingerprints).

-greg
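
The two ways of getting to a two-class problem mentioned above could look roughly like this. It is a sketch on made-up labels; the helper names merge_classes and keep_extremes and the (compound_id, label) layout are assumptions for the example.

```python
# Sketch only: `labels` is a made-up list of class labels (0, 1, 2).

def merge_classes(labels, merged=(1, 2)):
    """Option 1: combine two classes into one (here 1 and 2 both become 1)."""
    return [1 if label in merged else 0 for label in labels]


def keep_extremes(rows):
    """Option 2: train only on classes 0 and 2, dropping class 1.

    `rows` is a list of (compound_id, label) pairs; class 2 is relabelled 1.
    """
    return [(cid, 0 if label == 0 else 1) for cid, label in rows if label != 1]


labels = [0, 1, 2, 2, 1, 0]
rows = list(zip("abcdef", labels))

print(merge_classes(labels))
print(keep_extremes(rows))
```

Option 2 shrinks the training set but removes the borderline compounds, which is often where a three-class model loses most of its accuracy.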



Re: [Rdkit-discuss] calculation of number of chiral centres in a mol

2011-05-19 Thread Stiefl, Nikolaus
If you have 3D molecules, you could use 
AllChem.AssignAtomChiralTagsFromStructure() ... but that only works if you 
have 3D molecules.

Nik

-Original Message-
From: JP [mailto:jeanpaul.ebe...@inhibox.com] 
Sent: Thursday, May 19, 2011 1:27 PM
To: Greg Landrum
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] calculation of number of chiral centres in a mol

On 19 May 2011 12:17, Greg Landrum greg.land...@gmail.com wrote:
 On Thu, May 19, 2011 at 12:27 PM, JP jeanpaul.ebe...@inhibox.com wrote:
 Any ideas how to calculate the number of chiral centres in a molecule?

 Is there a function for that? (ideally without overly complex SMARTS pattern)

 indeed:
 >>> from rdkit import Chem
 >>> from rdkit.Chem import FindMolChiralCenters
 >>> mol = Chem.MolFromSmiles('[C@H](Cl)(F)Br')
 >>> FindMolChiralCenters(mol)
 [(0, 'R')]
 >>> mol = Chem.MolFromSmiles('[C@@H](Cl)(F)Br')
 >>> FindMolChiralCenters(mol)
 [(0, 'S')]

 >>> FindMolChiralCenters(Chem.MolFromSmiles('CCC'))
 []

 Note that the current version doesn't include unspecified chiral
 centers; that's in the C++ but hasn't been made available in the
 python yet... it's coming.


When ? When ?

That is what I need :D
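
For readers coming to this thread later: more recent RDKit releases expose the unspecified centres in Python through the includeUnassigned argument of FindMolChiralCenters. A minimal sketch, assuming a reasonably current RDKit install (the helper name count_chiral_centres is made up for the example):

```python
from rdkit import Chem


def count_chiral_centres(smiles):
    """Count chiral centres, including ones with unspecified stereochemistry."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Each entry is (atom index, label); unassigned centres are labelled '?'.
    centres = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
    return len(centres)


specified = count_chiral_centres('[C@H](Cl)(F)Br')   # stereo given in the SMILES
unspecified = count_chiral_centres('C(Cl)(F)Br')     # same atom, no stereo given
none = count_chiral_centres('CCC')

print(specified, unspecified, none)
```

Without includeUnassigned=True the second molecule would report no centres, which is the limitation described in the reply above.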



Re: [Rdkit-discuss] Write SDF as a string instead of two file

2011-05-17 Thread Stiefl, Nikolaus
Hi JP,

One comment, since I ran into this before: you will lose the properties of 
a molecule if you just use the MolBlock.

Cheers
Nik


-Original Message-
From: Greg Landrum [mailto:greg.land...@gmail.com] 
Sent: Tuesday, May 17, 2011 6:10 AM
To: JP
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Write SDF as a string instead of two file

Dear JP,

On Mon, May 16, 2011 at 7:56 PM, JP jeanpaul.ebe...@inhibox.com wrote:
 Quick one.

 I have the following which writes to an .sdf file:

 from rdkit.Chem import AllChem

 # write every conformer of every molecule as its own SDF record
 w = AllChem.SDWriter(output_file)
 for mol in molecules:
     confIds = [c.GetId() for c in mol.GetConformers()]
     for cid in confIds:
         w.write(mol, confId=cid)
 w.close()

 Now, instead of writing to a file, I would like to write to a string
 (so I can zip the output using the zlib library).

At the moment this is not possible. SDWriters can only write to named files.
It would be a useful feature and I think it should be possible, but
I'm going to have to do some research.

 Do I just need to get the MolBlock and interleave the various entries with 
 $$$$ ?

That will definitely work.

-greg
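
The MolBlock approach confirmed above can be sketched with the standard library alone. The MolBlock text here is a placeholder standing in for whatever Chem.MolToMolBlock(mol, confId=cid) returns, and the helper names build_sdf_record and compress_sdf are made up for the example. Since a bare MolBlock carries no properties, the sketch appends them by hand as SD data items.

```python
import zlib

SDF_DELIMITER = "$$$$"


def build_sdf_record(mol_block, properties=None):
    """Turn one MolBlock string (plus optional properties) into an SDF record."""
    lines = [mol_block.rstrip("\n")]
    for name, value in (properties or {}).items():
        # SD data item: header line, value line, then a blank line.
        lines.extend([f">  <{name}>", str(value), ""])
    lines.append(SDF_DELIMITER)
    return "\n".join(lines) + "\n"


def compress_sdf(records):
    """Join the records into one SDF string and zlib-compress it."""
    return zlib.compress("".join(records).encode("utf-8"))


# Placeholder text, NOT a real MolBlock; in practice this would come from
# Chem.MolToMolBlock(mol, confId=cid).
fake_block = "placeholder molblock\nM  END\n"

records = [build_sdf_record(fake_block, {"conf_id": cid}) for cid in (0, 1)]
payload = compress_sdf(records)
sdf_text = zlib.decompress(payload).decode("utf-8")

print(sdf_text.count(SDF_DELIMITER))
```

Later RDKit versions also let SDWriter write to a Python file-like object such as io.StringIO, which keeps the properties without the manual step.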



Re: [Rdkit-discuss] Best practice: which (database) fingerprints to use ?

2011-03-08 Thread Stiefl, Nikolaus
Hi JeanPaul,

The difference between featmorganbv and morganbv is that the first uses 
pharmacophore features to describe atoms, whereas the second uses atom types 
(it essentially corresponds to the ECFP descriptors). I would suggest using 
featmorganbv_fp only if you want to do fuzzier similarity searching 
(scaffold hopping and the like).

The RDKit fingerprint is (as stated below) a Daylight-like fingerprint that 
uses hashed molecular subgraphs. It is OK depending on what you want to do 
with it, though you may have a better in-house descriptor that is optimized 
for substructure searching.

Hope that helps
Nik
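
The difference between the two Morgan variants can be seen from Python as well. This is a sketch of the Python-side analogue, not of the cartridge functions themselves: it assumes an RDKit version that ships rdFingerprintGenerator, and the radius, bit-vector size and the example pair of molecules are illustrative choices.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

RADIUS = 2  # illustrative choice

# Atom-type invariants (ECFP-like).
ecfp_like = rdFingerprintGenerator.GetMorganGenerator(radius=RADIUS, fpSize=2048)

# Chemical-feature invariants (FCFP-like).
fcfp_like = rdFingerprintGenerator.GetMorganGenerator(
    radius=RADIUS,
    fpSize=2048,
    atomInvariantsGenerator=rdFingerprintGenerator.GetMorganFeatureAtomInvGen(),
)


def tanimoto(generator, smiles_a, smiles_b):
    """Tanimoto similarity of two SMILES under the given fingerprint generator."""
    fp_a = generator.GetFingerprint(Chem.MolFromSmiles(smiles_a))
    fp_b = generator.GetFingerprint(Chem.MolFromSmiles(smiles_b))
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)


# Chlorobenzene vs bromobenzene: different atom types, same kind of feature.
pair = ("Clc1ccccc1", "Brc1ccccc1")
ecfp_sim = tanimoto(ecfp_like, *pair)
fcfp_sim = tanimoto(fcfp_like, *pair)

print(f"atom-type invariants: {ecfp_sim:.2f}")
print(f"feature invariants:   {fcfp_sim:.2f}")
```

Because the feature invariants treat both halogens alike, the feature-based fingerprint rates this pair as more similar, which is the "fuzzier" behaviour described above.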


From: JP [mailto:jeanpaul.ebe...@inhibox.com]
Sent: Tuesday, March 08, 2011 11:05 AM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] Best practice: which (database) fingerprints to use ?


Hi there,

I am storing a ton of molecules (~8M - it would be a ton if you print them all 
out, and hence use all of the trees in Regent's Park) in a database and using 
fingerprints for substructure and similarity searches.  The fingerprints I am 
currently using are the ones I took blindly from the wiki documentation 
(when in doubt, copy) - specifically torsionbv_fp, morganbv_fp and 
atompairbv_fp (from http://code.google.com/p/rdkit/wiki/DatabaseCreation2).

Now I look at the database cartridge documentation - 
http://code.google.com/p/rdkit/wiki/ReferenceDocumentation and I see there are 
others - some of which I have actually heard about:

featmorganbv_fp(mol,int) : returns a bfp which is the bit vector Morgan 
fingerprint for a molecule using chemical-feature invariants. The second 
argument provides the radius. This is an FCFP-like fingerprint.
rdkit_fp(mol) : returns a bfp which is the RDKit fingerprint for a molecule. 
This is a daylight-fingerprint using hashed molecular subgraphs.

What is the best practice here?  Is it to use rdkit_fp ? (I assume this was 
added later - and possibly the original documentation is out of date)
What is the difference between featmorganbv and the one I am using (i.e. 
morganbv_fp) ?
What do you suggest in your experience?
Any ideas will be highly appreciated - as right now I am quite without any 
myself.

Many Thanks
JP