Re: [Rdkit-discuss] How to calculate Tanimoto similarity score between reactions

2020-06-10 Thread Francois Berenger

On 10/06/2020 13:11, 丁邵珍 wrote:

Hi, I want to calculate the Tanimoto similarity score of two reactions
('CCCO>>CCC=O', 'CC(O)C>>CC(=O)C'). All the methods I have found for
calculating Tanimoto similarity are for compounds. Could you please tell
me how to calculate the Tanimoto similarity score of reactions? I am
looking forward to your reply.


I don't know how to do it in RDKit, but if you need some inspiration,
here is how ChemAxon does it:

https://docs.chemaxon.com/display/docs/Reaction_fingerprint_RF.html


Yours,
shaozhen
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
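
Editor's sketch for the question above: RDKit's rdChemReactions module
provides reaction fingerprints, so one possible approach is to fingerprint
each reaction and compare the fingerprints with the usual Tanimoto function.
The helper name reaction_tanimoto is made up for this illustration, and the
choice of the structural fingerprint with default parameters is an
assumption, not something recommended in this thread.

```python
from rdkit import DataStructs
from rdkit.Chem import rdChemReactions


def reaction_tanimoto(rxn_smiles_a, rxn_smiles_b):
    """Tanimoto similarity of two reaction SMILES (hypothetical helper)."""
    fps = []
    for smi in (rxn_smiles_a, rxn_smiles_b):
        # useSmiles=True parses the string as reaction SMILES, not SMARTS
        rxn = rdChemReactions.ReactionFromSmarts(smi, useSmiles=True)
        # Bit-vector fingerprint built from reactants and products
        fps.append(rdChemReactions.CreateStructuralFingerprintForReaction(rxn))
    return DataStructs.TanimotoSimilarity(fps[0], fps[1])


sim = reaction_tanimoto('CCCO>>CCC=O', 'CC(O)C>>CC(=O)C')
print(f"{sim:.3f}")
```

rdChemReactions.CreateDifferenceFingerprintForReaction is an alternative
that encodes what changes between reactants and products; which of the two
is more suitable depends on the use case.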





[Rdkit-discuss] trying to figure out what an rdkit warning means

2020-06-10 Thread Bennion, Brian via Rdkit-discuss
Hello,
Below I show a SMILES string from MOE, the SMILES string calculated by
RDKit, and the InChI string calculated by RDKit (2020_1).

The warning on conversion to InChI confuses me: after entering both
SMILES strings into a viewer, I don't see any undefined stereocenter.

O=C(/C=C/c1ccccc1)c1ccc(OC/C=C(/CC/C=C(\C)/C)\C)cc1
CC(C)=CCC/C(C)=C/COc1ccc(C(=O)/C=C/c2ccccc2)cc1
[18:10:42] WARNING: Omitted undefined stereo
InChI=1S/C25H28O2/c1-20(2)8-7-9-21(3)18-19-27-24-15-13-23(14-16-24)25(26)17-12-22-10-5-4-6-11-22/h4-6,8,10-18H,7,9,19H2,1-3H3


    while len(line) != 0:
        # Each input line looks like "smiles","name"
        fields = line.replace('","', ' ').split()
        mol1 = fields[0].replace('"', '')
        mol_name = fields[1]

        try:
            mol = Chem.MolFromSmiles(mol1, sanitize=False)  # , removeHs=False)
        except Exception:
            mol = None
        if mol is None:
            print("mol1 failed:", mol1)
            # write() takes a single string argument
            output.write("mol1 failed: " + mol1 + "\n")
        else:
            rkditsmiout.write('"' + Chem.MolToSmiles(mol, isomericSmiles=True) + '"\n')
            print(Chem.MolToSmiles(mol, isomericSmiles=True))
            rkditsmiout.write('"' + Chem.inchi.MolToInchi(mol) + '"\n')
            print(Chem.inchi.MolToInchi(mol))
        count += 1
        print(count)
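
Editor's sketch, not an answer from the thread: the InChI shown above has no
/b (double-bond stereo) layer even though the SMILES carries / and \ marks,
so it may be worth checking what stereo RDKit has assigned to each double
bond before the InChI call. The snippet parses the RDKit SMILES with the
default sanitization (an assumption: the loop above uses sanitize=False,
and whether that causes the warning is not confirmed here), with the phenyl
ring written out as c2ccccc2 to match the C25H28O2 formula in the InChI.

```python
from rdkit import Chem

smi = 'CC(C)=CCC/C(C)=C/COc1ccc(C(=O)/C=C/c2ccccc2)cc1'

# Default parsing sanitizes the molecule and assigns stereochemistry
mol = Chem.MolFromSmiles(smi)

# Report the stereo label RDKit holds for every double bond
for bond in mol.GetBonds():
    if bond.GetBondType() == Chem.BondType.DOUBLE:
        print(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetStereo())

inchi = Chem.MolToInchi(mol)
print(inchi)
```

Comparing this output with the output of the sanitize=False loop would show
whether the parsing option makes a difference for this molecule.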



Re: [Rdkit-discuss] Scalability of Postgres cartridge

2020-06-10 Thread Ivan Tubert-Brohman
Thank you everyone for the suggestions. For now I don't have immediate
plans to adopt the cartridge but it's good to know these things when the
time comes.

Best,
Ivan

On Mon, Jun 8, 2020 at 6:49 PM Finnerty, Jim via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> If you have a billion molecule data source and would like to try an
> at-scale test, I'd be willing to help out with provisioning the hardware,
> looking at the efficiency of the plans, etc., using rdkit with Aurora
> PostgreSQL.
>
> If I understand how the rdkit GIST index filtering mechanism works for a
> given similarity metric, a parallel GIST index scan ought to be able to
> scale almost linearly with the number of cores, provided that the
> RDBMS is built on a scalable storage subsystem.
>
> If so, the largest instance size that's currently supported has 96 cores,
> so we can achieve a fairly high degree of parallelism.
>
> On 6/5/20, 1:07 PM, "dmaziuk via Rdkit-discuss" <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
> On 6/5/2020 4:45 AM, Greg Landrum wrote:
>
> > Having said that, the team behind ZINC used to use the RDKit cartridge
> > with PostgreSQL as the backend for ZINC. They had the database sharded
> > across multiple instances and managed to get the fingerprint indices to
> > work there. I don't remember the substructure search performance being
> > terrible, but it wasn't great either. They have since switched to a
> > specialized system (Arthor from NextMove software), which offers
> > significantly better performance.
>
> Generally speaking a database of a billion rows needs hardware capable
> of running it. Buy a server with 1TB RAM and 64 cores and a couple of
> U.2 NVME drives and see how Postgres runs on that.
>
> Then you need to look at the database, e.g. a query in an indexed
> billion-row table could be OK but inserting a billion-and-first row
> will not be.
>
> If you want to scale to these kinds of volumes, you need to do some work.
>
> (And much of the point of no-sql hadoop "cloud" workflows is that if you
> can parallelize what you're doing to multiple machines, at some data
> size they will start outperforming a centralized fast search engine.)
>
> Dima


Re: [Rdkit-discuss] SMART reaction for closing rings

2020-06-10 Thread Shani Levi
Thank you so much! It helps me a lot.

On Tue, Jun 9, 2020 at 6:24 PM Greg Landrum wrote:

> Hi Shani,
>
> If you have mapped atoms in the reactants that are not in the products,
> those end up being removed.
>
> I'm not sure exactly what reaction you're trying to do, but I think you
> want something like this:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> rxn = AllChem.ReactionFromSmarts("([C:1]=[C:2].[*:3][*+:4])>>[*:2]-[*:1][*+0:4][*:3]")
> m1 = Chem.MolFromSmiles('C=CC([CH2+])CCC=C(C)C')
> ps = rxn.RunReactants((m1,))
> for p in ps:
>     print(Chem.MolToSmiles(p[0]))
>
> Note that I also explicitly neutralized the carbocation in the products.
> Otherwise the +1 from the reactants would be carried over.
>
> -greg
>
>
> On Tue, Jun 9, 2020 at 4:42 PM Shani Levi wrote:
>
>> Hello,
>> I'm interested in using AllChem.ReactionFromSmarts to predict the product
>> of a specific reaction.
>> For example, I want to describe the reaction between double bonds and
>> carbocations.
>>
>> I tried:
>> rxn = AllChem.ReactionFromSmarts("([C:1]=[C:2].[*:3][*+:4])>>[*:1][*:4]")
>> m1 = Chem.MolFromSmiles('C=CC([CH2+])CCC=C(C)C')
>> ps = rxn.RunReactants((m1,))
>>
>> and it gave me four molecules:
>>
>> [CH2+]C [CH2+]C [CH2+]CCC
>> [CH2+]C(C)C
>>
>> The problem is that it does not describe the ring-closure molecules, and
>> it somehow cuts off the rest of the molecule. Does anyone have suggestions
>> for how to change the SMARTS so that it defines the right reaction?
>>
>> Thank you very much,
>> Shani
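
Editor's sketch illustrating Greg's point that mapped reactant atoms missing
from the product template are removed: it runs both reaction SMARTS from
this thread on the same molecule and compares the heavy-atom count of each
product with that of the starting material. The dictionary keys 'original'
and 'revised' and the variable names are made up for this illustration.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

m1 = Chem.MolFromSmiles('C=CC([CH2+])CCC=C(C)C')

templates = {
    # Atoms :2 and :3 are mapped in the reactant but absent from the product
    'original': "([C:1]=[C:2].[*:3][*+:4])>>[*:1][*:4]",
    # All four mapped atoms are kept, and the cation is neutralized with +0
    'revised': "([C:1]=[C:2].[*:3][*+:4])>>[*:2]-[*:1][*+0:4][*:3]",
}

atom_counts = {}
for name, smarts in templates.items():
    rxn = AllChem.ReactionFromSmarts(smarts)
    products = [p[0] for p in rxn.RunReactants((m1,))]
    # Heavy-atom count of each product (hydrogens are implicit here)
    atom_counts[name] = [p.GetNumAtoms() for p in products]
    print(name, atom_counts[name], [Chem.MolToSmiles(p) for p in products])

print("starting material atoms:", m1.GetNumAtoms())
```

The products are not sanitized here; calling Chem.SanitizeMol on each one
before further use is usually a good idea.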