Hi Evgueni, On Mon, Apr 13, 2009 at 10:30, Greg Landrum <[email protected]> wrote: > Dear Egueni, > > On Mon, Apr 13, 2009 at 11:20 AM, Evgueni Kolossov > <[email protected]> wrote: >> >> Looking like I have passed successfully the first stage - writing >> fingerprints into database as BLOB. >> I have enclosed file where you will find for one structure the Smiles string >> and the fingerprints as it extracted from the database (created with >> RDKFingerprintMol(*mol)). >> It will be very nice if you can check that I am storing/extracting the right >> fingerprints for this structure. >> Second stage - extract and re-create BitVector from the extracted string - >> there is a problem: it failing in ExplicitBitVect vRtn(strNew) probably >> because of allowOldFormat is false or I need to do something with the string >> - can you suggest anything? > > I don't think it has anything to do with the allowOldFormat since you > probably aren't saving fingerprints in the old format. I'd guess you > aren't extracting the full blob into the string. Again, this is > something that's dependent on the details of the database system you > are using and is probably covered in the documentation for your > database. > >> Based on all this conversions and the cost of extracting BLOBs I am thinking >> may be it better to store just SMILES and create fingerprints on fly? Have >> you tried/compare this two ways?
I did this at one point - it is very slow compared to other methods and probably very memory-consuming as well. Currently I have something similar to what you are trying to do; I use MyChem to extend MySQL with (OpenBabel) cheminformatics UDFs. Basically, you store your fingerprints as tinyblob and compare them with a tanimoto() function. The speed is acceptable (a tanimoto search on 440k columns takes ~ 2.5s on a 8-core 8gb ram machine), but by far not as fast as a native SQL implementation (à la "Chemical substructure searching in SQL"). This is particularly useful because it allows you to do substructure/tanimoto searching in any SQL query (protein-ligand interactions, for example or activities). I think Greg has something like that for Postgresql. > I would be very, very surprised if there was any substantial overhead > associated with using BLOBs in your database. In sqlite, postgresql, > and firebird it's pretty much none (maybe a few copies). In any case, > it's nothing compared to the time required to build a molecule from > SMILES and then generating a fingerprint for it. If this is not true > for MySQL, then something is badly wrong. > > -greg > > >> Regards, >> Evgueni >> >> -----Original Message----- >> From: Greg Landrum [mailto:[email protected]] >> Sent: 08 April 2009 18:53 >> To: Evgueni Kolossov >> Subject: Re: [Rdkit-discuss] Fingerprints writing >> >> There's RDKit code either in python: >> $RDBASE/Projects/DbCLI >> or in C++ for sqlite: >> $RDBASE/Code/Demos/sqlite/rdk_funcs.cpp >> >> Maybe there's enough there to get you started with mysql >> >> On Wed, Apr 8, 2009 at 5:55 PM, Evgueni Kolossov <[email protected]> >> wrote: >>> Thanks Greg, >>> Can you describe how are you doing this? >>> >>> regards, >>> Evgueni >>> >>> 2009/4/8 Greg Landrum <[email protected]> >>>> >>>> Evgueni, >>>> >>>> I'm afraid this is something specific to the database you're using and >>>> I don't think I can help. The key is not to forget that the strings >>>> from ToString() are *binary*, any operation that's expecting standard >>>> ASCII text is very, very unlikely to work. >>>> >>>> On Wed, Apr 8, 2009 at 12:59 PM, Evgueni Kolossov <[email protected]> >>>> wrote: >>>> > Hi Greg, >>>> > >>>> > You probably getting sick with my questions.... Sorry. >>>> > I still cannot manage to create SQL string for insert ToStrring() into >>>> > the >>>> > DB (MySQL). >>>> > When I add this to my string: >>>> > >>>> > ............ >>>> > strSQL += "'"; >>>> > strSQL += fp->ToString(); //or std;:string generated by this method >>>> > strSQL += "'"; >>>> > The last single quote will not be inserted and nothing can be inserted >>>> > into >>>> > this string after ToString(). >>>> > Any replacement of the single quotes will do the same. >>>> > SMILES string works without problem >>>> > >>>> > Can you suggest something? >>>> > Or I need to use file upload instead? >>>> > >>>> > Regards, >>>> > Evgueni >>>> > >>>> > 2009/4/7 Evgueni Kolossov <[email protected]> >>>> >> >>>> >> Thanks, >>>> >> >>>> >> I still cannot figure out why it failing when I am trying to insert >>>> >> ToString() value.... >>>> >> May be I need replace single quote to something else... >>>> >> >>>> >> Regards, >>>> >> Evgueni >>>> >> >>>> >> 2009/4/7 Greg Landrum <[email protected]> >>>> >>> >>>> >>> On Tue, Apr 7, 2009 at 7:32 PM, Evgueni Kolossov >> <[email protected]> >>>> >>> wrote: >>>> >>> > Thanks Greg - you are right as usual. >>>> >>> > Can you tell me - what are you storing in database: string from >>>> >>> > ToString() >>>> >>> > or string from BitVectorToText? >>>> >>> > >>>> >>> >>>> >>> I use the ToString form, because it's more compact and faster to >>>> >>> reconstruct (I believe). >>>> >>> The argument in favor of the ToString form is that it's theoretically >>>> >>> more interoperable; I figure that if I need that I can always add a >>>> >>> bitstring column later. >>>> >>> >>>> >>> -greg >>>> >> >>>> >> >>>> > >>>> > >>> >>> >> > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by: > High Quality Requirements in a Collaborative Environment. > Download a free trial of Rational Requirements Composer Now! > http://p.sf.net/sfu/www-ibm-com > _______________________________________________ > Rdkit-discuss mailing list > [email protected] > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >

