Re: [Rdkit-discuss] How to calculate Tanimoto similarity score between reactions
On 10/06/2020 13:11, 丁邵珍 wrote:
> Hi,
>
> I want to calculate the Tanimoto similarity score of two reactions
> ('CCCO>>CCC=O', 'CC(O)C>>CC(=O)C'). All the methods I have found for
> calculating Tanimoto similarity are for compounds. Could you please tell
> me how to calculate the Tanimoto similarity score of reactions?
>
> I am looking forward to your reply.
>
> Yours,
> shaozhen

I don't know how to do it in RDKit, but if you need some inspiration, here is how ChemAxon does it:
https://docs.chemaxon.com/display/docs/Reaction_fingerprint_RF.html

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
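For readers following the thread, one common way to compare reactions is to build a "difference" vector (product feature counts minus reactant feature counts) for each reaction and apply the generalized Tanimoto coefficient to those vectors. The sketch below is library-free and only illustrates the arithmetic: the feature names and counts are hand-written, hypothetical stand-ins for what a real fingerprinting routine would produce, and the helper names `difference_vector` and `tanimoto` are illustrative, not part of RDKit or ChemAxon.

```python
from math import isclose


def difference_vector(reactant_counts, product_counts):
    """Return product-minus-reactant feature counts, dropping zero entries."""
    keys = set(reactant_counts) | set(product_counts)
    diff = {k: product_counts.get(k, 0) - reactant_counts.get(k, 0) for k in keys}
    return {k: v for k, v in diff.items() if v != 0}


def tanimoto(a, b):
    """Generalized Tanimoto: a.b / (|a|^2 + |b|^2 - a.b) for sparse vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sum(v * v for v in a.values())
    norm_b = sum(v * v for v in b.values())
    denom = norm_a + norm_b - dot
    # Two empty vectors are treated as identical by convention.
    return 1.0 if denom == 0 else dot / denom


# HYPOTHETICAL hand-assigned features for 'CCCO>>CCC=O' (not computed by any toolkit).
rxn1 = difference_vector(
    {"C-C": 2, "C-OH": 1, "CH2(O)": 1},
    {"C-C": 2, "C=O": 1, "CH(=O)": 1},
)
# HYPOTHETICAL hand-assigned features for 'CC(O)C>>CC(=O)C'.
rxn2 = difference_vector(
    {"C-C": 2, "C-OH": 1, "CH(O)": 1},
    {"C-C": 2, "C=O": 1, "C(=O)": 1},
)

similarity = tanimoto(rxn1, rxn2)
print(round(similarity, 3))
```

Unchanged features (here the two C-C bonds) cancel out in the difference vector, so the score reflects only what the reaction changes. How similar two reactions appear therefore depends heavily on how fine-grained the chosen features are.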
[Rdkit-discuss] trying to figure out what an rdkit warning means
Hello,

Below I show a SMILES string from MOE, the SMILES string calculated by RDKit, and the InChI string calculated by RDKit (2020_1). The warning on conversion to the InChI string is confusing me: after entering both SMILES strings into a viewer, I don't see any undefined stereo center.

O=C(/C=C/c1ccccc1)c1ccc(OC/C=C(/CC/C=C(\C)/C)\C)cc1
CC(C)=CCC/C(C)=C/COc1ccc(C(=O)/C=C/c2ccccc2)cc1
[18:10:42] WARNING: Omitted undefined stereo
InChI=1S/C25H28O2/c1-20(2)8-7-9-21(3)18-19-27-24-15-13-23(14-16-24)25(26)17-12-22-10-5-4-6-11-22/h4-6,8,10-18H,7,9,19H2,1-3H3

    while len(line) != 0:
        # Each input line looks like "SMILES","name"
        fields = line.replace('","', ' ').split()
        mol1 = fields[0].replace('"', '')
        mol_name = fields[1]
        try:
            mol = Chem.MolFromSmiles(mol1, sanitize=False)  #, removeHs=False)
        except Exception:
            mol = None
        if mol is None:
            print("mol1 failed:", mol1)
            # write() takes a single string argument
            output.write("mol1 failed: " + mol1 + "\n")
        else:
            rkditsmiout.write('"' + Chem.MolToSmiles(mol, isomericSmiles=True) + '"\n')
            print(Chem.MolToSmiles(mol, isomericSmiles=True))
            rkditsmiout.write('"' + Chem.inchi.MolToInchi(mol) + '"\n')
            print(Chem.inchi.MolToInchi(mol))
        count += 1
    print(count)
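The loop above splits each input line by string replacement. A minimal sketch of a more robust alternative uses the standard `csv` module. It assumes the input really is a two-column, double-quoted CSV file (SMILES, then name), which is only inferred from the `replace` calls in the post. The function name `read_smiles_rows` and the sample data are hypothetical.

```python
import csv
import io


def read_smiles_rows(handle):
    """Yield (smiles, name) tuples from a quoted two-column CSV stream."""
    for row in csv.reader(handle):
        if len(row) < 2:
            # Skip blank or malformed lines instead of raising IndexError.
            continue
        yield row[0].strip(), row[1].strip()


# Hypothetical sample input standing in for the real file.
sample = io.StringIO('"CCO","ethanol"\n\n"c1ccccc1","benzene"\n')
rows = list(read_smiles_rows(sample))
for smiles, name in rows:
    print(name, smiles)
```

Iterating over the reader also removes the need for a manual `while len(line) != 0` loop, which only terminates if the next line is read inside the loop body.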
Re: [Rdkit-discuss] Scalability of Postgres cartridge
Thank you everyone for the suggestions. For now I don't have immediate plans to adopt the cartridge, but it's good to know these things when the time comes.

Best,
Ivan

On Mon, Jun 8, 2020 at 6:49 PM Finnerty, Jim via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:

> If you have a billion-molecule data source and would like to try an
> at-scale test, I'd be willing to help out with provisioning the hardware,
> looking at the efficiency of the plans, etc., using rdkit with Aurora
> PostgreSQL.
>
> If I understand how the rdkit GIST index filtering mechanism works for a
> given similarity metric, a parallel GIST index scan ought to be able to
> scale almost linearly with the number of cores, provided that the RDBMS
> is built on a scalable storage subsystem.
>
> If so, the largest instance size that's currently supported has 96 cores,
> so we can do a fairly high degree of parallelism.
>
> On 6/5/20, 1:07 PM, "dmaziuk via Rdkit-discuss" <rdkit-discuss@lists.sourceforge.net> wrote:
>
> > On 6/5/2020 4:45 AM, Greg Landrum wrote:
> >
> > > Having said that, the team behind ZINC used to use the RDKit cartridge
> > > with PostgreSQL as the backend for ZINC. They had the database sharded
> > > across multiple instances and managed to get the fingerprint indices
> > > to work there. I don't remember the substructure search performance
> > > being terrible, but it wasn't great either. They have since switched
> > > to a specialized system (Arthor from NextMove Software), which offers
> > > significantly better performance.
> >
> > Generally speaking, a database of a billion rows needs hardware capable
> > of running it. Buy a server with 1TB RAM and 64 cores and a couple of
> > U.2 NVMe drives and see how Postgres runs on that.
> >
> > Then you need to look at the database, e.g. a query in an indexed
> > billion-row table could be OK, but inserting the billion-and-first row
> > will not be.
> >
> > If you want to scale to these kinds of volumes, you need to do some
> > work.
> >
> > (And much of the point of no-sql hadoop "cloud" workflows is that if you
> > can parallelize what you're doing to multiple machines, at some data
> > size they will start outperforming a centralized fast search engine.)
> >
> > Dima
Re: [Rdkit-discuss] SMART reaction for closing rings
Thank you so much! It helps me a lot.

On Tue, Jun 9, 2020 at 6:24 PM Greg Landrum wrote:

> Hi Shani,
>
> If you have mapped atoms in the reactants that are not in the products,
> those end up being removed.
>
> I'm not sure exactly what reaction you're trying to do, but I think you
> want something like this:
>
>     rxn = AllChem.ReactionFromSmarts("([C:1]=[C:2].[*:3][*+:4])>>[*:2]-[*:1][*+0:4][*:3]")
>     m1 = Chem.MolFromSmiles('C=CC([CH2+])CCC=C(C)C')
>     ps = rxn.RunReactants((m1,))
>     for p in ps:
>         print(Chem.MolToSmiles(p[0]))
>
> Note that I also explicitly neutralized the carbocation in the products.
> Otherwise the +1 from the reactants would be carried over.
>
> -greg
>
> On Tue, Jun 9, 2020 at 4:42 PM Shani Levi wrote:
>
>> Hello,
>>
>> I'm interested in using AllChem.ReactionFromSmarts to predict the
>> product of a specific reaction. For example, I want to describe the
>> reaction between double bonds and carbocations.
>>
>> I tried:
>>
>>     rxn = AllChem.ReactionFromSmarts("([C:1]=[C:2].[*:3][*+:4])>>[*:1][*:4]")
>>     m1 = Chem.MolFromSmiles('C=CC([CH2+])CCC=C(C)C')
>>     ps = rxn.RunReactants((m1,))
>>
>> and it gave me four molecules:
>>
>>     [CH2+]C
>>     [CH2+]C
>>     [CH2+]CCC
>>     [CH2+]C(C)C
>>
>> The problem here is that it does not describe the ring-closure products,
>> and it somehow cuts off the rest of the molecule. If someone has any
>> suggestions for how to change the SMARTS description so that it defines
>> the right reaction, I would appreciate it.
>>
>> Thank you very much,
>> Shani