Hi Evgueni,

On Mon, Apr 13, 2009 at 10:30, Greg Landrum <[email protected]> wrote:
> Dear Evgueni,
>
> On Mon, Apr 13, 2009 at 11:20 AM, Evgueni Kolossov
> <[email protected]> wrote:
>>
>> It looks like I have successfully passed the first stage: writing
>> fingerprints into the database as a BLOB.
>> I have enclosed a file containing, for one structure, the SMILES string
>> and the fingerprint as extracted from the database (created with
>> RDKFingerprintMol(*mol)).
>> It would be very nice if you could check that I am storing/extracting the
>> right fingerprint for this structure.
>> Second stage, extracting the string and re-creating the BitVector from it:
>> there is a problem. It fails in ExplicitBitVect vRtn(strNew), probably
>> because allowOldFormat is false or because I need to do something with the
>> string. Can you suggest anything?
>
> I don't think it has anything to do with the allowOldFormat since you
> probably aren't saving fingerprints in the old format. I'd guess you
> aren't extracting the full blob into the string. Again, this is
> something that's dependent on the details of the database system you
> are using and is probably covered in the documentation for your
> database.
>
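
(A guess at the mechanics, assuming the BLOB comes back through a C-style
client API as a char pointer: building a std::string from the pointer alone
stops at the first zero byte, so the blob has to be copied with its explicit
length. With the MySQL C API that length would come from
mysql_fetch_lengths(). The helper names below are hypothetical and only
there for illustration.)

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: rebuild the fingerprint string from a raw column
// buffer plus the byte length reported by the database client.
std::string blobToString(const char *data, std::size_t length) {
  // The (pointer, length) constructor copies exactly `length` bytes,
  // embedded '\0' bytes included.
  return std::string(data, length);
}

// For contrast: the C-string constructor stops at the first '\0',
// which silently drops the rest of a binary fingerprint.
std::string blobToStringTruncating(const char *data) {
  return std::string(data);
}
```

The string returned by blobToString() is what I would then pass to
ExplicitBitVect vRtn(strNew).
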
>> Given all these conversions and the cost of extracting BLOBs, I am wondering
>> whether it would be better to store just the SMILES and create fingerprints
>> on the fly. Have you tried or compared these two approaches?

I generated fingerprints on the fly at one point: it is very slow compared
to other methods and probably very memory-hungry as well. Currently I have
something similar to what you are trying to do; I use MyChem to extend MySQL
with (OpenBabel) cheminformatics UDFs. Basically, you store your
fingerprints as tinyblob and compare them with a tanimoto() function.
The speed is acceptable (a Tanimoto search on 440k rows takes ~2.5 s
on an 8-core machine with 8 GB of RAM), but nowhere near as fast as a native
SQL implementation (à la "Chemical substructure searching in SQL").
The UDF approach is particularly useful because it lets you do
substructure/Tanimoto searching inside any SQL query (against protein-ligand
interactions or activities, for example). I think Greg has something
like that for PostgreSQL.

> I would be very, very surprised if there was any substantial overhead
> associated with using BLOBs in your database. In sqlite, postgresql,
> and firebird it's pretty much none (maybe a few copies). In any case,
> it's nothing compared to the time required to build a molecule from
> SMILES and then generate a fingerprint for it. If this is not true
> for MySQL, then something is badly wrong.
>
> -greg
>
>
>> Regards,
>> Evgueni
>>
>> -----Original Message-----
>> From: Greg Landrum [mailto:[email protected]]
>> Sent: 08 April 2009 18:53
>> To: Evgueni Kolossov
>> Subject: Re: [Rdkit-discuss] Fingerprints writing
>>
>> There's RDKit code either in python:
>>  $RDBASE/Projects/DbCLI
>> or in C++ for sqlite:
>>  $RDBASE/Code/Demos/sqlite/rdk_funcs.cpp
>>
>> Maybe there's enough there to get you started with mysql
>>
>> On Wed, Apr 8, 2009 at 5:55 PM, Evgueni Kolossov <[email protected]>
>> wrote:
>>> Thanks Greg,
>>> Can you describe how you are doing this?
>>>
>>> regards,
>>> Evgueni
>>>
>>> 2009/4/8 Greg Landrum <[email protected]>
>>>>
>>>> Evgueni,
>>>>
>>>> I'm afraid this is something specific to the database you're using and
>>>> I don't think I can help. The key is not to forget that the strings
>>>> from ToString() are *binary*; any operation that's expecting standard
>>>> ASCII text is very, very unlikely to work.
>>>>
>>>> On Wed, Apr 8, 2009 at 12:59 PM, Evgueni Kolossov <[email protected]>
>>>> wrote:
>>>> > Hi Greg,
>>>> >
>>>> > You are probably getting sick of my questions... Sorry.
>>>> > I still cannot manage to create an SQL string that inserts the
>>>> > ToString() output into the DB (MySQL).
>>>> > When I add this to my string:
>>>> >
>>>> > ............
>>>> > strSQL += "'";
>>>> > strSQL += fp->ToString(); // or a std::string generated by this method
>>>> > strSQL += "'";
>>>> > the last single quote is not inserted, and nothing else can be appended
>>>> > to the string after ToString().
>>>> > Replacing the single quotes with anything else gives the same result.
>>>> > A SMILES string works without problems.
>>>> >
>>>> > Can you suggest something?
>>>> > Or do I need to use a file upload instead?
>>>> >
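
(One way around this, sketched under the assumption that the statement is
sent to the server as a NUL-terminated C string, as with mysql_query():
hex-encode the fingerprint so that the SQL text is pure ASCII. MySQL accepts
X'...' hexadecimal literals for binary values. toHexLiteral is a
hypothetical helper, not part of RDKit or MySQL.)

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode arbitrary bytes as a hexadecimal literal
// X'...' so the SQL statement contains only printable ASCII characters.
std::string toHexLiteral(const std::string &bytes) {
  static const char digits[] = "0123456789ABCDEF";
  std::string out = "X'";
  out.reserve(bytes.size() * 2 + 3);
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    // Go through unsigned char so bytes >= 0x80 index the table correctly.
    const unsigned char c = static_cast<unsigned char>(bytes[i]);
    out += digits[c >> 4];    // high nibble
    out += digits[c & 0x0F];  // low nibble
  }
  out += "'";
  return out;
}
```

With that, the three lines in the snippet above would collapse to
strSQL += toHexLiteral(fp->ToString()); with no surrounding quotes.
mysql_real_escape_string() together with mysql_real_query(), or a prepared
statement with a bound BLOB parameter, should work as well.
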
>>>> > Regards,
>>>> > Evgueni
>>>> >
>>>> > 2009/4/7 Evgueni Kolossov <[email protected]>
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >> I still cannot figure out why it fails when I try to insert the
>>>> >> ToString() value...
>>>> >> Maybe I need to replace the single quote with something else...
>>>> >>
>>>> >> Regards,
>>>> >> Evgueni
>>>> >>
>>>> >> 2009/4/7 Greg Landrum <[email protected]>
>>>> >>>
>>>> >>> On Tue, Apr 7, 2009 at 7:32 PM, Evgueni Kolossov <[email protected]>
>>>> >>> wrote:
>>>> >>> > Thanks Greg - you are right as usual.
>>>> >>> > Can you tell me what you are storing in the database: the string
>>>> >>> > from ToString() or the string from BitVectorToText?
>>>> >>> >
>>>> >>>
>>>> >>> I use the ToString form, because it's more compact and faster to
>>>> >>> reconstruct (I believe).
>>>> >>> The argument in favor of the BitVectorToText form is that it's
>>>> >>> theoretically more interoperable; I figure that if I need that I can
>>>> >>> always add a bitstring column later.
>>>> >>>
>>>> >>> -greg
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>
>>>
>>
>