I do not know which language you are using but in python it is very
easy with mysqldb. The trick is not to try escaping any characters;
the mysqldb api will do that for you. Here is an example:
SQL = "INSERT INTO ChemImages VALUES(%s, %s)" # NO quotes
gifs = os.listdir(GIF_DIR)
for file in gifs:
hetID = file.split('.')[0]
path = os.path.join(GIF_DIR, file)
file = open(path).read() # reads binary
CONN.execute(SQL, (hetID, file))
Important is to use exactly this syntax. As a result, you should get
something like that:
http://img17.imageshack.us/img17/1009/capturejak.png. The binary data
is preserved. ChemFP.fp in my case is a tinyblob with 128 byte
fingerprints. By the way, Greg should recognise the molecule ;)
On Mon, Apr 13, 2009 at 11:25, Evgueni Kolossov <[email protected]> wrote:
> Hi Adrian,
>
> Can you please show me how you are converting/storing/extracting binary
> string to BLOB for MySQL?
> As you can see I am straggling with the escape characters.
>
> Regards,
> Evgueni
>
> 2009/4/13 Adrian Schreyer <[email protected]>
>>
>> Hi Evgueni,
>>
>> On Mon, Apr 13, 2009 at 10:30, Greg Landrum <[email protected]>
>> wrote:
>> > Dear Egueni,
>> >
>> > On Mon, Apr 13, 2009 at 11:20 AM, Evgueni Kolossov
>> > <[email protected]> wrote:
>> >>
>> >> Looking like I have passed successfully the first stage - writing
>> >> fingerprints into database as BLOB.
>> >> I have enclosed file where you will find for one structure the Smiles
>> >> string
>> >> and the fingerprints as it extracted from the database (created with
>> >> RDKFingerprintMol(*mol)).
>> >> It will be very nice if you can check that I am storing/extracting the
>> >> right
>> >> fingerprints for this structure.
>> >> Second stage - extract and re-create BitVector from the extracted
>> >> string -
>> >> there is a problem: it failing in ExplicitBitVect vRtn(strNew) probably
>> >> because of allowOldFormat is false or I need to do something with the
>> >> string
>> >> - can you suggest anything?
>> >
>> > I don't think it has anything to do with the allowOldFormat since you
>> > probably aren't saving fingerprints in the old format. I'd guess you
>> > aren't extracting the full blob into the string. Again, this is
>> > something that's dependent on the details of the database system you
>> > are using and is probably covered in the documentation for your
>> > database.
>> >
>> >> Based on all this conversions and the cost of extracting BLOBs I am
>> >> thinking
>> >> may be it better to store just SMILES and create fingerprints on fly?
>> >> Have
>> >> you tried/compare this two ways?
>>
>> I did this at one point - it is very slow compared to other methods
>> and probably very memory-consuming as well. Currently I have something
>> similar to what you are trying to do; I use MyChem to extend MySQL
>> with (OpenBabel) cheminformatics UDFs. Basically, you store your
>> fingerprints as tinyblob and compare them with a tanimoto() function.
>> The speed is acceptable (a tanimoto search on 440k columns takes ~
>> 2.5s on a 8-core 8gb ram machine), but by far not as fast as a native
>> SQL implementation (à la "Chemical substructure searching in SQL").
>> This is particularly useful because it allows you to do
>> substructure/tanimoto searching in any SQL query (protein-ligand
>> interactions, for example or activities). I think Greg has something
>> like that for Postgresql.
>>
>> > I would be very, very surprised if there was any substantial overhead
>> > associated with using BLOBs in your database. In sqlite, postgresql,
>> > and firebird it's pretty much none (maybe a few copies). In any case,
>> > it's nothing compared to the time required to build a molecule from
>> > SMILES and then generating a fingerprint for it. If this is not true
>> > for MySQL, then something is badly wrong.
>> >
>> > -greg
>> >
>> >
>> >> Regards,
>> >> Evgueni
>> >>
>> >> -----Original Message-----
>> >> From: Greg Landrum [mailto:[email protected]]
>> >> Sent: 08 April 2009 18:53
>> >> To: Evgueni Kolossov
>> >> Subject: Re: [Rdkit-discuss] Fingerprints writing
>> >>
>> >> There's RDKit code either in python:
>> >> $RDBASE/Projects/DbCLI
>> >> or in C++ for sqlite:
>> >> $RDBASE/Code/Demos/sqlite/rdk_funcs.cpp
>> >>
>> >> Maybe there's enough there to get you started with mysql
>> >>
>> >> On Wed, Apr 8, 2009 at 5:55 PM, Evgueni Kolossov <[email protected]>
>> >> wrote:
>> >>> Thanks Greg,
>> >>> Can you describe how are you doing this?
>> >>>
>> >>> regards,
>> >>> Evgueni
>> >>>
>> >>> 2009/4/8 Greg Landrum <[email protected]>
>> >>>>
>> >>>> Evgueni,
>> >>>>
>> >>>> I'm afraid this is something specific to the database you're using
>> >>>> and
>> >>>> I don't think I can help. The key is not to forget that the strings
>> >>>> from ToString() are *binary*, any operation that's expecting standard
>> >>>> ASCII text is very, very unlikely to work.
>> >>>>
>> >>>> On Wed, Apr 8, 2009 at 12:59 PM, Evgueni Kolossov
>> >>>> <[email protected]>
>> >>>> wrote:
>> >>>> > Hi Greg,
>> >>>> >
>> >>>> > You probably getting sick with my questions.... Sorry.
>> >>>> > I still cannot manage to create SQL string for insert ToStrring()
>> >>>> > into
>> >>>> > the
>> >>>> > DB (MySQL).
>> >>>> > When I add this to my string:
>> >>>> >
>> >>>> > ............
>> >>>> > strSQL += "'";
>> >>>> > strSQL += fp->ToString(); //or std;:string generated by this method
>> >>>> > strSQL += "'";
>> >>>> > The last single quote will not be inserted and nothing can be
>> >>>> > inserted
>> >>>> > into
>> >>>> > this string after ToString().
>> >>>> > Any replacement of the single quotes will do the same.
>> >>>> > SMILES string works without problem
>> >>>> >
>> >>>> > Can you suggest something?
>> >>>> > Or I need to use file upload instead?
>> >>>> >
>> >>>> > Regards,
>> >>>> > Evgueni
>> >>>> >
>> >>>> > 2009/4/7 Evgueni Kolossov <[email protected]>
>> >>>> >>
>> >>>> >> Thanks,
>> >>>> >>
>> >>>> >> I still cannot figure out why it failing when I am trying to
>> >>>> >> insert
>> >>>> >> ToString() value....
>> >>>> >> May be I need replace single quote to something else...
>> >>>> >>
>> >>>> >> Regards,
>> >>>> >> Evgueni
>> >>>> >>
>> >>>> >> 2009/4/7 Greg Landrum <[email protected]>
>> >>>> >>>
>> >>>> >>> On Tue, Apr 7, 2009 at 7:32 PM, Evgueni Kolossov
>> >> <[email protected]>
>> >>>> >>> wrote:
>> >>>> >>> > Thanks Greg - you are right as usual.
>> >>>> >>> > Can you tell me - what are you storing in database: string from
>> >>>> >>> > ToString()
>> >>>> >>> > or string from BitVectorToText?
>> >>>> >>> >
>> >>>> >>>
>> >>>> >>> I use the ToString form, because it's more compact and faster to
>> >>>> >>> reconstruct (I believe).
>> >>>> >>> The argument in favor of the ToString form is that it's
>> >>>> >>> theoretically
>> >>>> >>> more interoperable; I figure that if I need that I can always add
>> >>>> >>> a
>> >>>> >>> bitstring column later.
>> >>>> >>>
>> >>>> >>> -greg
>> >>>> >>
>> >>>> >>
>> >>>> >
>> >>>> >
>> >>>
>> >>>
>> >>
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > This SF.net email is sponsored by:
>> > High Quality Requirements in a Collaborative Environment.
>> > Download a free trial of Rational Requirements Composer Now!
>> > http://p.sf.net/sfu/www-ibm-com
>> > _______________________________________________
>> > Rdkit-discuss mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>> >
>>
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by:
>> High Quality Requirements in a Collaborative Environment.
>> Download a free trial of Rational Requirements Composer Now!
>> http://p.sf.net/sfu/www-ibm-com
>> _______________________________________________
>> Rdkit-discuss mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>
> --
> Dr. Evgueni Kolossov (PhD)
> [email protected]
> Tel. +44(0)1628 627168
> Mob. +44(0)7812070446
>