You should use the *CircularFingerprinter* for similarity.

On Sun, 28 Mar 2021 at 08:39, Sub Jae Shin <cnb.mons...@gmail.com> wrote:
> To John Mayfield,
>
> Hi, I found the DrugBank ID among the properties returned by the
> AtomContainer's getProperties method, so I can now tell which atom
> container corresponds to which drug.
>
> I think I have achieved my goal of computing drug-drug similarity.
>
> package com.company;
> import org.openscience.cdk.ChemFile;
> import org.openscience.cdk.exception.CDKException;
> import org.openscience.cdk.fingerprint.Fingerprinter;
> import org.openscience.cdk.fingerprint.IBitFingerprint;
> import org.openscience.cdk.fingerprint.IFingerprinter;
> import org.openscience.cdk.interfaces.IAtomContainer;
> import org.openscience.cdk.interfaces.IChemFile;
> import org.openscience.cdk.io.MDLV2000Reader;
> import org.openscience.cdk.similarity.Tanimoto;
> import org.openscience.cdk.tools.manipulator.ChemFileManipulator;
>
> import java.io.*;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
>
> public class Main {
>
>     public static void main(String[] args) {
>         try {
>
>             InputStream structures = new FileInputStream("../data/drugbank/structures.sdf");
>             MDLV2000Reader reader = new MDLV2000Reader(structures);
>             IChemFile file = reader.read(new ChemFile());
>             //Where can I find drugbank id?
>
>             Fingerprinter finger = new Fingerprinter();
>             List<IAtomContainer> atomData = ChemFileManipulator.getAllAtomContainers(file);
>             int count = atomData.size();
>             // one row per reference drug: its DATABASE_ID followed by all similarities
>             List<List<Object>> df = new ArrayList<>();
>
>             for (int i = 0; i < count; ++i) {
>                 List<Object> list = new ArrayList<>();
>                 IAtomContainer acReference = atomData.get(i);
>                 Map<Object, Object> refProperties = acReference.getProperties();
>                 list.add(refProperties.get("DATABASE_ID"));
>                 for (int j = 0; j < count; ++j) {
>                     IAtomContainer acStructure = atomData.get(j);
>                     Map<Object, Object> structProperties = acStructure.getProperties();
>                     System.out.println("REF DATABASE_ID : " +
>                             refProperties.get("DATABASE_ID") +
>                             " - COMP DATABASE_ID : " +
>                             structProperties.get("DATABASE_ID") +
>                             " similarity is now calculating....");
>                     double similarity = cdkCalculateTanimotoCoef(finger, acReference, acStructure);
>                     list.add(similarity);
>                 }
>                 df.add(list);
>             }
>             FileWriter resultCsv = new FileWriter("../data/drugbank/drug_drug_sim.csv");
>
>             // write each row as comma-separated values
>             for (List<Object> a : df) {
>                 String row = "";
>                 for (int i = 0; i < a.size(); ++i) {
>                     if (i == a.size() - 1) {
>                         row = row + a.get(i).toString() + "\n";
>                     }
>                     else {
>                         row = row + a.get(i).toString() + ",";
>                     }
>                 }
>                 // System.out.println(row);
>                 resultCsv.write(row);
>             }
>
>             resultCsv.close();
>
>             //System.out.println(acReference.toString());
>
>
>         } catch (FileNotFoundException | CDKException e) {
>             System.out.println(e.getMessage());
>         } catch (IOException e) {
>             e.printStackTrace();
>         }
>     }
>
>     public static double cdkCalculateTanimotoCoef(IFingerprinter fingerprinter,
>             IAtomContainer acReference, IAtomContainer acStructure) {
>
>         double ret = 0.0;
>
>         try {
>
>             IBitFingerprint fpReference = fingerprinter.getBitFingerprint(acReference);
>
>             //Tanimoto-score
>             IBitFingerprint fpStructure = fingerprinter.getBitFingerprint(acStructure);
>             ret = Tanimoto.calculate(fpReference, fpStructure);
>
>         } catch (Exception ex) {
>             //...
>         }
>
>         return ret;
>     }
> }
>
>
> I hope the result of this code matches my goal.
>
> Thank you, as always, to all the CDK developers.
>
> Sincerely
> Seopjae Shin
>
>
> On Fri, Mar 26, 2021 at 6:36 PM John Mayfield <john.wilkinson...@gmail.com>
> wrote:
>
>> Do you have a mol2 file or a SMILES file? It's not clear. Mol2 support
>> isn't great in the CDK, mainly because it's more of a compchem/modelling
>> format than a cheminformatics one; cheminformatics tools primarily use
>> SMILES or MOLfile.
>>
>> Presuming you know how to read a file line by line, here is an example
>> from SMILES:
>>
>>> IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
>>> // load from SMILES and compute the ECFP (circular) fingerprint
>>> IFingerprinter fpr = new CircularFingerprinter();
>>> SmilesParser smipar = new SmilesParser(bldr);
>>> List<String> smiles = Arrays.asList("Clc1ccccc1",
>>>                                     "Fc1ccccc1",
>>>                                     "Ic1ccccc1",
>>>                                     "Clc1ncccc1");
>>> List<BitSet> fps = new ArrayList<>();
>>> for (String smi : smiles) {
>>>     IAtomContainer mol = smipar.parseSmiles(smi);
>>>     fps.add(fpr.getBitFingerprint(mol).asBitSet());
>>> }
>>> // print N^2 comparison table
>>> for (int j = 0; j < fps.size(); j++)
>>>     System.out.print("," + smiles.get(j));
>>> System.out.print('\n');
>>> for (int i = 0; i < fps.size(); i++) {
>>>     System.out.print(smiles.get(i));
>>>     for (int j = 0; j < fps.size(); j++) {
>>>         System.out.printf(",%.3f", Tanimoto.calculate(fps.get(i), fps.get(j)));
>>>     }
>>>     System.out.print('\n');
>>> }
>>
>>
>> ,Clc1ccccc1,Fc1ccccc1,Ic1ccccc1,Clc1ncccc1
>> Clc1ccccc1,1.000,0.368,0.368,0.292
>> Fc1ccccc1,0.368,1.000,0.368,0.192
>> Ic1ccccc1,0.368,0.368,1.000,0.192
>> Clc1ncccc1,0.292,0.192,0.192,1.000
>>
>> There are far more efficient ways of doing this, and for a large
>> comparison table use ChemFP: https://chemfp.com/.
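A minimal sketch, not from the original thread, of one of the simpler "more efficient ways" alluded to above: compute each fingerprint once, then evaluate each pair once, because Tanimoto similarity is symmetric. It uses only java.util.BitSet so that it runs without CDK. The names PairwiseTanimotoSketch, tanimoto and similarityMatrix, and the bit patterns in main, are illustrative assumptions, not CDK API. With CDK one would presumably feed in the BitSets obtained via asBitSet(), as in the example above, and keep Tanimoto.calculate in place of the hand-written helper.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class PairwiseTanimotoSketch {

    // Tanimoto coefficient of two bit sets: |A AND B| / |A OR B|.
    // Returning 0.0 when both sets are empty is a convention chosen
    // for this sketch only.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone(); // copy so the input is not modified
        common.and(b);
        int shared = common.cardinality();
        int union = a.cardinality() + b.cardinality() - shared;
        return union == 0 ? 0.0 : (double) shared / union;
    }

    // Fill a symmetric N x N matrix, evaluating each unordered pair once.
    static double[][] similarityMatrix(List<BitSet> fps) {
        int n = fps.size();
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = tanimoto(fps.get(i), fps.get(j));
                sim[i][j] = s;
                sim[j][i] = s; // mirror into the lower triangle
            }
        }
        return sim;
    }

    // Helper that builds a BitSet from a list of set bit positions.
    static BitSet bits(int... positions) {
        BitSet bs = new BitSet();
        for (int p : positions) {
            bs.set(p);
        }
        return bs;
    }

    public static void main(String[] args) {
        // Hypothetical fingerprints; real ones would come from a fingerprinter.
        List<BitSet> fps = new ArrayList<>();
        fps.add(bits(1, 2, 3));
        fps.add(bits(2, 3, 4));
        fps.add(bits(7, 8));

        double[][] sim = similarityMatrix(fps);
        for (double[] row : sim) {
            StringBuilder line = new StringBuilder();
            for (double v : row) {
                line.append(String.format("%.3f ", v));
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

This roughly halves the number of comparisons and, more importantly for a file the size of DrugBank, avoids recomputing both fingerprints for every pair. It is still quadratic in the number of structures.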
>>
>> On Wed, 24 Mar 2021 at 06:42, Stesycki, Manuel <
>> stesy...@mpi-muelheim.mpg.de> wrote:
>>
>>> Good morning,
>>>
>>> Use this class for Tanimoto calculations:
>>> org.openscience.cdk.similarity.Tanimoto (see doc:
>>> http://cdk.github.io/cdk/latest/docs/api/index.html)
>>>
>>> You could do something like this to calculate your Tanimoto score:
>>>
>>> public static double cdkCalculateTanimotoCoef(IFingerprinter fingerprinter,
>>>         IAtomContainer acReference, IAtomContainer acStructure) {
>>>
>>>     double ret = 0.0;
>>>
>>>     try {
>>>
>>>         IBitFingerprint fpReference = fingerprinter.getBitFingerprint(acReference);
>>>
>>>         //Tanimoto-score
>>>         IBitFingerprint fpStructure = fingerprinter.getBitFingerprint(acStructure);
>>>         ret = Tanimoto.calculate(fpReference, fpStructure);
>>>
>>>     } catch (Exception ex) {
>>>         //...
>>>     }
>>>
>>>     return ret;
>>> }
>>>
>>>
>>>
>>> Best regards,
>>> Manuel Stesycki
>>>
>>> IT
>>> 0208 / 306-2146
>>> Physikbau, Büro 117
>>> stesy...@mpi-muelheim.mpg.de
>>>
>>> Max-Planck-Institut für Kohlenforschung
>>> Kaiser-Wilhelm-Platz 1
>>> D-45470 Mülheim an der Ruhr
>>> http://www.kofo.mpg.de/de
>>>
>>> On 24.03.2021 at 04:55, Sub Jae Shin <cnb.mons...@gmail.com> wrote:
>>>
>>> To CDK developers.
>>>
>>> Hello, I'm trying to compute drug-drug similarity using the Tanimoto score.
>>>
>>> I'm a beginner with CDK and Java, and I'm stuck on turning a SMILES file
>>> into the arguments that the Tanimoto calculate method expects.
>>>
>>> package com.company;
>>> import org.openscience.cdk.ChemFile;
>>> import org.openscience.cdk.exception.CDKException;
>>> import org.openscience.cdk.interfaces.IChemFile;
>>> import org.openscience.cdk.io.SMILESReader;
>>> import java.io.*;
>>>
>>> public class Main {
>>>
>>>     public static void main(String[] args) {
>>>         try {
>>>
>>>             InputStream mol2DataStream = new FileInputStream("../data/drugbank/structure.smiles");
>>>             SMILESReader reader = new SMILESReader(mol2DataStream);
>>>             IChemFile file = reader.read(new ChemFile());
>>>
>>>         } catch (FileNotFoundException | CDKException e) {
>>>             System.out.println(e.getMessage());
>>>         }
>>>     }
>>> }
>>>
>>> Sincerely
>>> Seopjae Shin.
>>>
>>>
>>> _______________________________________________
>>> Cdk-user mailing list
>>> Cdk-user@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/cdk-user
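A minimal sketch, not from the original thread, of how the CSV-writing step in Sub Jae Shin's program could be factored out so that it is easy to check in isolation. It assumes the similarities are already available as a square matrix and writes a header row like the table in John Mayfield's reply, which the original program did not emit. The names SimilarityCsvSketch and writeCsv, and the IDs used in main, are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.Locale;

final class SimilarityCsvSketch {

    // Writes a header row of IDs, then one row per ID with its similarities.
    // Assumes sim is an N x N matrix whose order matches the ids list.
    static void writeCsv(Writer out, List<String> ids, double[][] sim) throws IOException {
        out.write("," + String.join(",", ids) + "\n");
        for (int i = 0; i < ids.size(); i++) {
            StringBuilder row = new StringBuilder(ids.get(i));
            for (int j = 0; j < ids.size(); j++) {
                // Locale.ROOT keeps '.' as the decimal separator on any machine
                row.append(',').append(String.format(Locale.ROOT, "%.3f", sim[i][j]));
            }
            out.write(row.append('\n').toString());
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder IDs and values, for illustration only.
        List<String> ids = List.of("ID_A", "ID_B");
        double[][] sim = {
            {1.0, 0.5},
            {0.5, 1.0}
        };
        StringWriter buffer = new StringWriter();
        writeCsv(buffer, ids, sim);
        System.out.print(buffer);
    }
}
```

To write to a file instead, one could pass a FileWriter opened in a try-with-resources statement, which closes the file even if an exception is thrown part-way through.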