Hi Matthew,
On Wed, Jan 14, 2015 at 7:12 PM, Matthew Lardy <[email protected]> wrote:
>
> I am playing with the fingerprinting in RDKit and I am looking for a primer
> (if anyone has one) to deconvolute what feature is mapped to a particular
> bit. I am not going to use the Python wrappers (which I see Greg provided
> a primer for), as I can't for the life of me get them to work (and I have
> completely stopped trying to get them through the tests).
>
>
Hrm... that's too bad. Which system were you building on?
> Does anyone have a path via a Java or C++ route?
>
The Python wrappers for almost everything in the fingerprint space mostly
just call C++ core functions, so there tend to be relatively short
translations from the Python code to C++ (C++ is, of course, more verbose
than Python). I've attached a demo file that shows how to generate SMILES
that "explain" the bits in Morgan fingerprints. I will try to make some
time over the next couple of days to do this for a couple more fingerprint
types (and probably do a blog post or documentation entry from it all), but
I hope this is already useful.
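For orientation, the bit information that the attached demo walks through is a map from bit id to a list of (atom index, radius) pairs. Here is a minimal sketch of that shape using only the standard library, so it can be tried without an RDKit build. The typedefs are stand-ins that assume the same layout as MorganFingerprints::BitInfoMap, and describeBits is a hypothetical helper, not part of RDKit:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the RDKit types (assumption: same layout as
// MorganFingerprints::BitInfoMap, i.e. bit id -> list of (atom index, radius)).
typedef std::pair<std::uint32_t, std::uint32_t> AtomRadius;
typedef std::map<std::uint32_t, std::vector<AtomRadius> > BitInfo;

// Hypothetical helper: render the map as text, one bit per line.
std::string describeBits(const BitInfo &info) {
  std::ostringstream out;
  for (BitInfo::const_iterator bit = info.begin(); bit != info.end(); ++bit) {
    out << "bit " << bit->first << ":";
    // every entry names one atom environment that set this bit
    for (std::size_t i = 0; i < bit->second.size(); ++i) {
      out << " (atom " << bit->second[i].first << ", radius "
          << bit->second[i].second << ")";
    }
    out << "\n";
  }
  return out.str();
}
```

A single bit can list several pairs, either because the same environment occurs more than once in the molecule or because different environments collide on the same bit after folding, which is why the demo loops over every pair for each bit.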
The same approach may or may not be possible with the Java wrapper. I suspect
that it is, but sometimes the optional arguments to wrapped functions/methods
are not completely usable and need some hand-tweaking in the wrapper code in
order to make them work. In any case, creating the example in Java would take
me a lot longer.
-greg
//
// Copyright (C) 2015 Greg Landrum
// @@ All Rights Reserved @@
// This file is part of the RDKit.
// The contents are covered by the terms of the BSD license
// which is included in the file license.txt, found at the root
// of the RDKit source tree.
//
/* Can be built with:
g++ -o generatefp.exe generatefp.cpp -I$RDBASE/Code \
-L$RDBASE/lib -lSubgraphs -lSmilesParse -lFingerprints \
-lSubstructMatch -lGraphMol -lDataStructs -lRDGeometryLib -lRDGeneral
*/
#include <RDGeneral/Invariant.h>
#include <DataStructs/BitVects.h>
#include <GraphMol/RDKitBase.h>
#include <GraphMol/SmilesParse/SmilesParse.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>
#include <GraphMol/Fingerprints/MorganFingerprints.h>
#include <GraphMol/Subgraphs/Subgraphs.h>
#include <boost/foreach.hpp>
#include <string>
#include <vector>
#include <algorithm>
#include <iostream>
using namespace RDKit;
typedef boost::shared_ptr<ExplicitBitVect> EBV_SPTR;
void ExplainMorganFp(std::string &smiles,
                     unsigned int radius=2,
                     unsigned int fplen=2048){
  std::cout<<"Doing: "<<smiles<<std::endl;
  ROMol *mol=SmilesToMol(smiles);
  TEST_ASSERT(mol);
  // generate the fingerprint and collect the bit information:
  MorganFingerprints::BitInfoMap bitInfo;
  ExplicitBitVect *fp=MorganFingerprints::getFingerprintAsBitVect(*mol,radius,fplen,
                                                                   NULL,NULL,
                                                                   false,true,false,
                                                                   &bitInfo);
  if(fp){
    typedef std::pair<boost::uint32_t,boost::uint32_t> AtomInfoType;
    // loop over the bits:
    BOOST_FOREACH(MorganFingerprints::BitInfoMap::value_type &bit,bitInfo){
      std::cout<<" bit: "<<bit.first<<std::endl;
      BOOST_FOREACH(AtomInfoType &ai,bit.second){
        // each element of the vector is an (atom_index, radius) pair
        std::cout << "   atom: "<<ai.first<<" radius: "<<ai.second<<std::endl;
        // collect the atoms and bonds within the given radius of that atom:
        std::vector<int> *env=NULL;
        std::vector<int> atoms;
        atoms.push_back(ai.first);
        if(ai.second>0){
          env = new std::vector<int>(findAtomEnvironmentOfRadiusN(*mol,ai.second,ai.first));
          // loop over the bonds in the environment and add their atoms to the list:
          BOOST_FOREACH(int bi,*env){
            const Bond *bond=mol->getBondWithIdx(bi);
            if(std::find(atoms.begin(),atoms.end(),bond->getBeginAtomIdx())==atoms.end()){
              atoms.push_back(bond->getBeginAtomIdx());
            }
            if(std::find(atoms.begin(),atoms.end(),bond->getEndAtomIdx())==atoms.end()){
              atoms.push_back(bond->getEndAtomIdx());
            }
          }
        }
        // generate the SMILES for the fragment defined by those atoms and bonds,
        // make sure it starts at the central atom:
        std::string smi=MolFragmentToSmiles(*mol,atoms,env,NULL,NULL,false,false,ai.first);
        std::cout << "   smiles: "<<smi<<std::endl;
        delete env;
      }
    }
  }
  delete fp;
  delete mol;
}

int
main(int argc, char *argv[])
{
  std::string smi="CCOC";
  ExplainMorganFp(smi);
  smi="c1ccccc1OC";
  ExplainMorganFp(smi);
}
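
The atom-collection step in the inner loop above can be isolated and tried without linking against RDKit. This is a minimal sketch, assuming each bond in the environment is given as a plain (begin atom index, end atom index) pair; BondEnds and collectAtoms are hypothetical names introduced for illustration, not RDKit API:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bond: (begin atom index, end atom index).
typedef std::pair<int, int> BondEnds;

// Start with the central atom, then append each bond's end atoms the first
// time they are seen, so the central atom stays at the front of the list.
std::vector<int> collectAtoms(int centerAtom,
                              const std::vector<BondEnds> &envBonds) {
  std::vector<int> atoms(1, centerAtom);
  for (std::size_t i = 0; i < envBonds.size(); ++i) {
    const int ends[2] = {envBonds[i].first, envBonds[i].second};
    for (int j = 0; j < 2; ++j) {
      if (std::find(atoms.begin(), atoms.end(), ends[j]) == atoms.end()) {
        atoms.push_back(ends[j]);
      }
    }
  }
  return atoms;
}
```

The linear std::find keeps the atoms in order of first appearance, which is cheap enough for the handful of atoms found within a small radius. A std::set would also remove duplicates but would reorder the atoms by index.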
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss