Hi Matthew,

On Wed, Jan 14, 2015 at 7:12 PM, Matthew Lardy <[email protected]> wrote:

>
> I am playing with the fingerprinting in RDKit and I am looking for a primer
> (if anyone has one) to deconvolute which feature is mapped to a particular
> bit.  I am not going to use the Python wrappers (which I see Greg provided
> a primer for), as I can't for the life of me get them to work (and I have
> completely stopped trying to get them through the tests).
>
>
Hrm... that's too bad. Which system were you building on?



> Does anyone have a path via a Java or C++ route?
>

The Python wrappers for nearly everything in the fingerprint space mostly
just call core C++ functions, so translating the Python code to C++ tends
to be a short exercise (C++ is, of course, more verbose than Python). I've
attached a demo file that shows how to generate SMILES that "explain" the
bits in Morgan fingerprints. I will try to make some time over the next
couple of days to do this for a couple more fingerprint types (and probably
do a blog post or documentation entry from it all), but I hope this is
already useful.

This may or may not be possible with the Java wrapper. I suspect that it
is, but sometimes the optional arguments to wrapped functions/methods are
not completely usable and need some hand-tweaking in the wrapper code in
order to make them work. In any case, creating the example in Java would
take me a lot longer.

-greg
//
//  Copyright (C) 2015 Greg Landrum
//   @@ All Rights Reserved @@
//  This file is part of the RDKit.
//  The contents are covered by the terms of the BSD license
//  which is included in the file license.txt, found at the root
//  of the RDKit source tree.
//
/*  Can be built with:

g++ -o generatefp.exe generatefp.cpp -I$RDBASE/Code \
     -L$RDBASE/lib -lSubgraphs -lSmilesParse -lFingerprints \
     -lSubstructMatch -lGraphMol -lDataStructs -lRDGeometryLib -lRDGeneral
*/

#include <RDGeneral/Invariant.h>
#include <DataStructs/BitVects.h>
#include <GraphMol/RDKitBase.h>
#include <GraphMol/SmilesParse/SmilesParse.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>
#include <GraphMol/Fingerprints/MorganFingerprints.h>
#include <GraphMol/Subgraphs/Subgraphs.h>

#include <boost/foreach.hpp>

#include <string>
#include <vector>
#include <algorithm>
#include <iostream>


using namespace RDKit;

typedef boost::shared_ptr<ExplicitBitVect> EBV_SPTR;

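// Prints, for each bit set in the Morgan fingerprint of the molecule, the
// atom environments that set it and a SMILES for each environment.
void ExplainMorganFp(const std::string &smiles,
                     unsigned int radius=2,
                     unsigned int fplen=2048){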
  std::cout<<"Doing: "<<smiles<<std::endl;
  ROMol *mol=SmilesToMol(smiles);
  TEST_ASSERT(mol);

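  // generate the fingerprint and collect the bit information.
  // The arguments after fplen are, in order: invariants, fromAtoms,
  // useChirality, useBondTypes, onlyNonzeroInvariants, atomsSettingBits.
  MorganFingerprints::BitInfoMap bitInfo;
  ExplicitBitVect *fp=MorganFingerprints::getFingerprintAsBitVect(*mol,radius,fplen,
                                                                  NULL,NULL,
                                                                  false,true,false,
                                                                  &bitInfo);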
  if(fp){
    typedef std::pair<boost::uint32_t,boost::uint32_t> AtomInfoType;
    
    // loop over the bits:
    BOOST_FOREACH(MorganFingerprints::BitInfoMap::value_type &bit,bitInfo){
      std::cout<<"  bit: "<<bit.first<<std::endl;
      BOOST_FOREACH(AtomInfoType &ai,bit.second){
        // each element of the vector is an (atom_index, radius) pair
        std::cout << "    atom: "<<ai.first<<" radius: "<<ai.second<<std::endl;
        
        // collect the atoms and bonds within the given radius of that atom:
        std::vector<int> *env=NULL;
        std::vector<int> atoms;
        atoms.push_back(ai.first);
        if(ai.second>0){
          env = new std::vector<int>(findAtomEnvironmentOfRadiusN(*mol,ai.second,ai.first));
          // loop over the bonds in the environment and add their atoms to the list:
          BOOST_FOREACH(int bi,*env){
            const Bond *bond=mol->getBondWithIdx(bi);
            if(std::find(atoms.begin(),atoms.end(),bond->getBeginAtomIdx())==atoms.end()){
              atoms.push_back(bond->getBeginAtomIdx());
            }
            if(std::find(atoms.begin(),atoms.end(),bond->getEndAtomIdx())==atoms.end()){
              atoms.push_back(bond->getEndAtomIdx());
            }
          }
        }
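        // generate the SMILES for the fragment defined by those atoms and bonds,
        // make sure it starts at the central atom (the last argument is
        // rootedAtAtom; the two NULLs are atomSymbols and bondSymbols, the
        // two falses are doIsomericSmiles and doKekule):
        std::string smi=MolFragmentToSmiles(*mol,atoms,env,NULL,NULL,false,false,ai.first);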

        std::cout << "         smiles: "<<smi<<std::endl;
        delete env;
      }
    }
  }
  delete fp;
  delete mol;
}

int main(){
  std::string smi="CCOC";
  ExplainMorganFp(smi);
  smi="c1ccccc1OC";
  ExplainMorganFp(smi);
}
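
The atom-collection step in the demo above (turning the bond indices of an
environment into a list of unique atom indices, central atom first) can be
looked at in isolation. The following is only a sketch with no RDKit
dependency: BondEnds and collectEnvironmentAtoms are hypothetical names
made up for illustration, and a bond is assumed to be nothing more than a
pair of atom indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bond: (begin atom index, end atom index).
typedef std::pair<int, int> BondEnds;

// Collect the unique atom indices touched by the bonds listed in env,
// keeping the central atom first. This mirrors what the demo does before
// it calls MolFragmentToSmiles().
std::vector<int> collectEnvironmentAtoms(const std::vector<BondEnds> &bonds,
                                         const std::vector<int> &env,
                                         int centralAtom) {
  std::vector<int> atoms;
  atoms.push_back(centralAtom);
  for (std::size_t i = 0; i < env.size(); ++i) {
    // env holds bond indices; each must refer to an existing bond
    assert(env[i] >= 0 && static_cast<std::size_t>(env[i]) < bonds.size());
    const BondEnds &bond = bonds[env[i]];
    if (std::find(atoms.begin(), atoms.end(), bond.first) == atoms.end()) {
      atoms.push_back(bond.first);
    }
    if (std::find(atoms.begin(), atoms.end(), bond.second) == atoms.end()) {
      atoms.push_back(bond.second);
    }
  }
  return atoms;
}
```

For CCOC, with bonds (0,1), (1,2), (2,3), an environment made of bonds 1 and
2 around atom 2 gives the atoms 2, 1, 3. The linear std::find is fine here
because environments are small; a std::set would lose the "central atom
first" ordering.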
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
