Hi Gonzalo,

On Sat, Jun 8, 2013 at 10:31 AM, Gonzalo Colmenarejo-Sanchez wrote:

> Could anyone provide some advice about how to run (fast but approximate)
> substructure searches with fingerprints using C++? I have a large set of
> SMILES for molecules and a relatively large set of SMILES/SMARTS for
> substructures.

Sorry, I meant to do this last weekend but it ended up slipping my mind.

The attached file demonstrates how to use fingerprints for substructure
screening.
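The screening logic in the attached file comes down to two stages: a cheap bit test (AllProbeBitsMatch) that rules out pairs that cannot possibly match, followed by the full SubstructMatch only for the pairs that survive. Here's a minimal, library-free sketch of that pattern, assuming a toy 64-bit std::bitset in place of RDKit's ExplicitBitVect; screenedSearch and the exactMatch callback are hypothetical stand-ins for illustration, not RDKit API.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy fingerprint width for illustration only; PatternFingerprintMol in the
// attached program uses 2048 bits by default.
constexpr std::size_t kFpSize = 64;
using Fingerprint = std::bitset<kFpSize>;

// Mirrors the semantics of RDKit's AllProbeBitsMatch(probe, ref): true when
// every bit set in the probe (query) is also set in the reference (molecule).
bool allProbeBitsMatch(const Fingerprint &probe, const Fingerprint &ref) {
  return (probe & ref) == probe;
}

// Screen-then-verify loop (hypothetical helper). exactMatch(i, j) stands in
// for the expensive SubstructMatch(mol i, query j) call and only runs for
// pairs that pass the fingerprint screen; nExactCalls counts those pairs.
std::size_t screenedSearch(
    const std::vector<Fingerprint> &molFps,
    const std::vector<Fingerprint> &queryFps,
    const std::function<bool(std::size_t, std::size_t)> &exactMatch,
    std::size_t &nExactCalls) {
  assert(exactMatch && "an exact matcher is required");
  std::size_t nMatches = 0;
  nExactCalls = 0;
  for (std::size_t i = 0; i < molFps.size(); ++i) {
    for (std::size_t j = 0; j < queryFps.size(); ++j) {
      // Screen: a query bit missing from the molecule proves no match.
      if (!allProbeBitsMatch(queryFps[j], molFps[i])) continue;
      ++nExactCalls;
      // Verify: passing the screen is necessary but not sufficient.
      if (exactMatch(i, j)) ++nMatches;
    }
  }
  return nMatches;
}
```

For example, with molecule fingerprints 1011, 0110, 1111 and query fingerprints 0011, 0100, only 4 of the 6 pairs reach exactMatch; if the callback rejects one of those (a false positive of the screen), the search reports 3 matches.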

The usual required caveat: the substructure (pattern) fingerprints are quite
efficient screens for queries that don't contain query features, but their
efficacy drops as query features (for example SMARTS atom lists or
wildcards) are introduced.
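A rough way to see why: the screen only rejects a molecule that is missing one of the query's bits, so a query whose fingerprint has fewer bits set can never reject more molecules. Assuming, as a simplification, that query features mostly show up as fewer bits in the query fingerprint, the fraction of the database screened out drops accordingly. The sketch below just measures that fraction, using a toy 64-bit std::bitset and a hypothetical screenOutFraction helper.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBits = 64;  // toy fingerprint width (assumption)
using Fp = std::bitset<kBits>;

// Hypothetical helper: fraction of database fingerprints eliminated by the
// screen for one query, i.e. molecules missing at least one query bit.
double screenOutFraction(const Fp &query, const std::vector<Fp> &db) {
  assert(!db.empty());
  std::size_t rejected = 0;
  for (const Fp &m : db) {
    if ((query & m) != query) ++rejected;  // some query bit is absent
  }
  return static_cast<double>(rejected) / static_cast<double>(db.size());
}
```

With a database of 1011, 0110, 1111 and 0001, a query with bits 0111 screens out 3 of the 4 molecules (0.75), while a more generic query 0001, keeping only one of those bits, screens out 1 of 4 (0.25).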

If you're going to be processing the same molecules/queries repeatedly
(i.e. multiple runs on the same sets), it probably makes sense to use the
RDKit's serialization code to save the pre-processed molecules and
fingerprints. If applicable, let me know and I can generate some sample
code for that too.
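On the RDKit side, the serialization tools would presumably be MolPickler (GraphMol/MolPickler.h) for the molecules and the pickle string from ExplicitBitVect::toString() for the fingerprints. The sketch below uses neither; it's a standard-library-only illustration of the save-once, load-many pattern, assuming 64-bit toy fingerprints stored as raw words in a made-up file layout (saveFingerprints and loadFingerprints are hypothetical helpers). The layout uses the host's byte order, so the cache file isn't portable between machines with different endianness.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

constexpr std::size_t kWidth = 64;  // toy width: one 64-bit word per fingerprint
using Fp64 = std::bitset<kWidth>;

// Hypothetical layout: a 64-bit count followed by one 64-bit word per
// fingerprint, written in the host's byte order.
void saveFingerprints(const std::string &path, const std::vector<Fp64> &fps) {
  assert(!path.empty() && "cache path must not be empty");
  std::ofstream out(path, std::ios::binary);
  if (!out) throw std::runtime_error("cannot open " + path + " for writing");
  std::uint64_t n = fps.size();
  out.write(reinterpret_cast<const char *>(&n), sizeof(n));
  for (const Fp64 &fp : fps) {
    std::uint64_t word = fp.to_ullong();
    out.write(reinterpret_cast<const char *>(&word), sizeof(word));
  }
  if (!out) throw std::runtime_error("write failed: " + path);
}

// Reads back what saveFingerprints wrote; throws on a short or missing file.
std::vector<Fp64> loadFingerprints(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path + " for reading");
  std::uint64_t n = 0;
  in.read(reinterpret_cast<char *>(&n), sizeof(n));
  if (!in) throw std::runtime_error("missing header: " + path);
  std::vector<Fp64> fps;
  fps.reserve(n);
  for (std::uint64_t i = 0; i < n; ++i) {
    std::uint64_t word = 0;
    in.read(reinterpret_cast<char *>(&word), sizeof(word));
    if (!in) throw std::runtime_error("truncated file: " + path);
    fps.emplace_back(word);
  }
  return fps;
}
```

The idea is to pay the fingerprinting cost once, then have every later run call loadFingerprints instead of recomputing.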

Hope this helps,
-greg
// $Id$
//
//  Copyright (C) 2008-2011 Greg Landrum
//   @@ All Rights Reserved @@
//  This file is part of the RDKit.
//  The contents are covered by the terms of the BSD license
//  which is included in the file license.txt, found at the root
//  of the RDKit source tree.
//
/*  Can be built with:
   g++ -o fingerprint_screen.exe fingerprint_screen.cpp -I$RDBASE/Code -I$RDBASE/Extern \
       -L$RDBASE/lib -lFileParsers -lSmilesParse -lFingerprints \
       -lSubstructMatch -lGraphMol -lDataStructs -lRDGeometryLib -lRDGeneral
*/

#include <RDGeneral/Invariant.h>
#include <DataStructs/BitVects.h>
#include <DataStructs/BitOps.h>
#include <GraphMol/RDKitBase.h>
#include <GraphMol/SmilesParse/SmilesParse.h>
#include <GraphMol/SmilesParse/SmilesWrite.h>
#include <GraphMol/Substruct/SubstructMatch.h>
#include <GraphMol/Depictor/RDDepictor.h>
#include <GraphMol/FileParsers/MolSupplier.h>
#include <GraphMol/Fingerprints/Fingerprints.h>


#include <RDGeneral/RDLog.h>
#include <boost/foreach.hpp>
#include <boost/shared_ptr.hpp>
#include <cstdlib>
#include <string>
#include <vector>
#include <algorithm>

using namespace RDKit;

typedef boost::shared_ptr<ExplicitBitVect> EBV_SPTR;

void FPScreen()
{
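  // getenv() returns a null pointer when RDBASE is unset; constructing a
  // std::string from it would be undefined behavior, so check first.
  const char *rdbaseEnv = std::getenv("RDBASE");
  if(!rdbaseEnv){
    BOOST_LOG(rdErrorLog)<<"RDBASE is not set"<<std::endl;
    return;
  }
  std::string rdbase = rdbaseEnv;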
  std::string sdname = rdbase + "/Regress/Data/mols.1000.sdf";
  std::string qname = rdbase + "/Regress/Data/queries.txt";
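  SDMolSupplier msuppl(sdname);
  // arguments: space delimiter, SMILES in column 0, no name column (-1),
  // no title line. Note that the queries are parsed as SMILES; SMARTS
  // queries would need to be read line by line and parsed with SmartsToMol()
  // from GraphMol/SmilesParse/SmilesParse.h instead.
  SmilesMolSupplier qsuppl(qname," ",0,-1,false);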

  // --------------------------------------------
  //   Read molecules
  // --------------------------------------------
  std::vector<ROMOL_SPTR> mols;
  BOOST_LOG(rdInfoLog)<<"loading mols: "<<std::endl;
  while(!msuppl.atEnd()){
    ROMol *m=msuppl.next();
    if(!m) continue;
    ROMOL_SPTR mp(m);
    mols.push_back(mp);
  }
  std::vector<ROMOL_SPTR> queries;
  BOOST_LOG(rdInfoLog)<<"loading queries: "<<std::endl;
  while(!qsuppl.atEnd()){
    ROMol *m=qsuppl.next();
    if(!m) continue;
    ROMOL_SPTR mp(m);
    queries.push_back(mp);
  }
  
  // --------------------------------------------
  //   Construct fingerprints
  // --------------------------------------------
  std::vector<EBV_SPTR > mol_fps;
  BOOST_LOG(rdInfoLog)<<"fingerprinting mols: "<<std::endl;
  BOOST_FOREACH(ROMOL_SPTR mp,mols){
    ExplicitBitVect *fp=PatternFingerprintMol(*mp);
    mol_fps.push_back(EBV_SPTR(fp));
  }

  std::vector<EBV_SPTR > query_fps;
  BOOST_LOG(rdInfoLog)<<"fingerprinting queries: "<<std::endl;
  BOOST_FOREACH(ROMOL_SPTR mp,queries){
    ExplicitBitVect *fp=PatternFingerprintMol(*mp);
    query_fps.push_back(EBV_SPTR(fp));
  }


  // --------------------------------------------
  //   substructure searches
  // --------------------------------------------
  unsigned int nMatches=0;
  for(unsigned int i=0;i<mols.size();++i){
    ROMOL_SPTR mp=mols[i];
    EBV_SPTR mfp=mol_fps[i];
    for(unsigned int j=0;j<queries.size();++j){
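      // fingerprint screen: the query can only be a substructure of the
      // molecule if every bit set in the query fingerprint is also set in
      // the molecule fingerprint, so skip the expensive match otherwise.
      EBV_SPTR qfp=query_fps[j];
      if(!AllProbeBitsMatch(*qfp,*mfp)) continue;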

      // molecule substructure search:
      MatchVectType mv;
      ROMOL_SPTR qp=queries[j];
      if(SubstructMatch(*mp,*qp,mv)) ++nMatches;
    }
  }
  BOOST_LOG(rdInfoLog)<<" num matches: "<<nMatches<<std::endl;
  
  
}

int
main(int argc, char *argv[])
{
  RDLog::InitLogs();
  FPScreen();
}
_______________________________________________
Rdkit-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss