On 27/05/2011 22:58, A. Heifets wrote:
> I'm trying to follow http://openbabel.org/wiki/Tutorial:Fingerprints
> and obabel -H fs on my data but I have some strange digressions from
> the tutorial.
>
> First, I'm not convinced that the index was built correctly (although
> there were no error messages to imply that it failed).

I suspect the problem is the size of your datafile, which I guess is
about 18GB. fs index files store displacements (byte offsets) into this
file, but these are only 32 bits wide, which caps the usable file size
at 4GB. For sdf files that probably means a maximum of about 2 million
molecules, although considerably more for SMILES files.
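
As a back-of-envelope check of those numbers, here is a small Python
sketch. The 18GB figure is only my guess at the file size (taken here as
18 * 10**9 bytes); the molecule count comes from the index log, and the
variable names are purely illustrative.

```python
# Rough estimate of how many sdf records fit under a 32-bit offset limit.
OFFSET_BITS = 32
max_addressable_bytes = 2 ** OFFSET_BITS  # 4294967296 bytes, i.e. 4GB

datafile_bytes = 18 * 10**9   # assumption: a guess at the datafile size
molecule_count = 8271991      # reported in the index log

# Average size of one sdf record in the datafile.
avg_bytes_per_molecule = datafile_bytes / molecule_count

# Number of average-sized records whose start offset fits in 32 bits.
max_molecules = int(max_addressable_bytes / avg_bytes_per_molecule)

print(f"average record size: {avg_bytes_per_molecule:.0f} bytes")
print(f"molecules reachable with 32-bit offsets: about {max_molecules:,}")
```

With these inputs the estimate lands a little under 2 million molecules,
and a format with smaller records (such as SMILES) would fit many more.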

This limitation is not documented and needs to be. There also needs to
be a warning, issued while the index is being prepared, when the
datafile is found to be too large. And, of course, the deficiency needs
to be eliminated by changing the structure of the index file. That will
not happen in the next release, but may happen later.
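
Until such a warning exists, the check can be done by hand before
building the index. The following is only a sketch in Python, not Open
Babel code: the function names are hypothetical, and the threshold
assumes the offsets are unsigned 32-bit values.

```python
import os

# One more than the largest value an unsigned 32-bit offset can hold.
# Assumption: the index stores unsigned offsets; a signed field would
# halve this limit.
MAX_OFFSET = 2 ** 32


def exceeds_32bit_offsets(size_in_bytes):
    """Return True if some byte positions cannot be reached by a 32-bit offset."""
    return size_in_bytes > MAX_OFFSET


def check_datafile(path):
    """Print a warning if the datafile is too large to index safely."""
    size = os.path.getsize(path)
    too_large = exceeds_32bit_offsets(size)
    if too_large:
        print(f"Warning: {path} is {size} bytes; records beyond "
              f"{MAX_OFFSET} bytes cannot be addressed by a 32-bit offset")
    return too_large
```

Calling check_datafile("DB.sdf") before running the conversion would
flag a file of the size discussed here.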

I don't understand your difficulty with the -s parameter being
interpreted as SMILES instead of a file name. Possibly it is because of
a corrupted fs file. Can you try again with a nice small dataset?
Thanks for the detailed reporting.
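
One way to make a small test dataset is to copy the first few records
out of the large sdf file. A minimal Python sketch follows; it assumes
records are terminated by a line containing only "$$$$", and the
function name and file names are hypothetical.

```python
def take_first_records(src_path, dst_path, count):
    """Copy the first `count` sdf records from src_path to dst_path.

    Returns the number of complete records actually written.
    """
    if count <= 0:
        return 0
    written = 0
    # newline="" keeps the original line endings unchanged.
    with open(src_path, "r", newline="", errors="replace") as src, \
         open(dst_path, "w", newline="") as dst:
        for line in src:
            dst.write(line)
            # A line holding only "$$$$" closes one sdf record.
            if line.rstrip() == "$$$$":
                written += 1
                if written >= count:
                    break
    return written
```

For example, take_first_records("DB.sdf", "DB_small.sdf", 1000) would
give a file well under the 4GB limit to index and search.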

Chris

> If you look at
> the first log below [1], you can see that OpenBabel found 8.2 million
> molecules and converted 7 thousand.  Is there a way to tell why it
> didn't convert the rest?  Invalid structures?  Abort after the Expand
> Warning?  I'm also unsure why OB reports taking 39 seconds when the
> date stamps report 20 minutes.
>
> I tested whether the entire database was converted by pulling the last
> molecule and querying the index for it.  If the whole file was
> successfully converted, then the search would find it (or, at least,
> other molecules with Tanimoto coefficient = 1).  So, I copied and pasted
> the last molecule into a file [2].  My second problem (see log [3])
> was that OpenBabel interprets the '-s' parameter as a SMILES string,
> unlike the "obabel -H fs" help which says I can pass in a filename.
>
> Fortunately, I had a copy of the SMILES string, so I tried querying
> with that.  As you can see in log [4], no Tanimoto coefficient 1
> molecules were pulled out, so I take that to confirm my initial
> suspicions that the index didn't get all of the molecules.  This
> surprises me since, as you can see in log [2] below, the molecule
> seems fine to me; I'm not sure why it didn't get added to the index.
>
> My question:  what am I doing wrong?
>
> Thanks!
>
> Cheers,
> Abe
>
>
> [1] The log of making the index.  The SVN build is fresh:
> nohup bash -c "date&&
> /home/aheifets/opt/openbabel-svn/build/bin/obabel --errorlevel 5 -isdf
> DB.sdf -ofs -ODB_manual.fs&&  date">index1.log 2>index1.err
> $ cat index1.*
> ==============================
> *** Open Babel Warning  in Expand
>    Alias CH3. was not chemically interpreted
>
> This will prepare an index of RXN_db.sdf and may take some time...
> It contains 8271991 molecules Estimated completion time 7e+02 minutes
>
>   It took 39 seconds
> 7151 molecules converted
> Fri May 27 16:36:35 EDT 2011
> Fri May 27 16:57:04 EDT 2011
>
> [2] The contents of last_mol.sdf:
> $ cat last_mol.sdf
> 0028.mol#1
>   OpenBabel05201115292D
>
>   21 21  0  0  0  0  0  0  0  0999 V2000
>     -0.2826   -1.4437    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>      0.5424   -1.4437    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
>      0.5424   -0.6187    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>     -0.1721   -0.2062    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     -0.1721    0.6188    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     -0.8865    1.0313    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     -0.8865    1.8562    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>     -1.6010    2.2687    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     -2.3155    1.8562    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>      3.9796   -1.4437    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>      3.1546   -2.2687    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>      2.9149   -0.6187    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>      1.3674   -1.4437    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>      0.5424   -2.2687    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>     -0.6951   -2.1582    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     -1.5201   -2.1582    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     -1.9326   -1.4437    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     -2.7576   -1.4437    0.0000 S   0  0  0  0  0  0  0  0  0  0  0  0
>     -3.5826   -1.4437    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     -1.5201   -0.7293    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     -0.6951   -0.7293    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    1 21  2  0  0  0  0
>    1  2  1  0  0  0  0
>    2 14  2  0  0  0  0
>    2 13  2  0  0  0  0
>    2  3  1  0  0  0  0
>    3  4  1  0  0  0  0
>    4  5  1  0  0  0  0
>    5  6  1  0  0  0  0
>    6  7  1  0  0  0  0
>    7  8  1  0  0  0  0
>    8  9  1  0  0  0  0
>   10  9  2  0  0  0  0
>   11  9  1  0  0  0  0
>   12  8  2  0  0  0  0
>   15  1  1  0  0  0  0
>   16 15  2  0  0  0  0
>   17 16  1  0  0  0  0
>   17 18  1  0  0  0  0
>   18 19  1  0  0  0  0
>   20 17  2  0  0  0  0
>   21 20  1  0  0  0  0
> M  END
>>   <cansmi>
> CSc1ccc(cc1)S(=O)(=O)NCCCOC(=O)C(=C)C
>
>>   <formula>
> C14H19NO4S2
>
>>   <InChI>
> InChI=1S/C14H19NO4S2/c1-11(2)14(16)19-10-4-9-15-21(17,18)13-7-5-12(20-3)6-8-13/h5-8,15H,1,4,9-10H2,2-3H3
>
> $$$$
>
> [3] The log where OpenBabel interprets an SDF filename as a SMILES string:
> $ /home/aheifets/opt/openbabel-svn/build/bin/obabel ./DB_manual.fs
> -osdf -Ojunk.sdf -s last_mol.sdf -at5
> ==============================
> *** Open Babel Warning  in ReadMolecule
>    Either the file contains Atom Lists, which are not currently
> supported and are ignored
> or the atom or bond count is>999, which is not allowed in V2000 MDL files.
> ==============================
> *** Open Babel Error  in ReadMolecule
>    last_mol.sdf contained a character '_' which is invalid in SMILES
> ==============================
> *** Open Babel Error  in ObtainTarget
>    Cannot read the SMILES string
> 0 molecules converted
> $ cat last_mol.sdf  | grep '_'
> $
>
> [4] The log where OpenBabel doesn't seem to find the original query molecule:
> $ /home/aheifets/opt/openbabel-svn/build/bin/obabel ./DB_manual.fs
> -osdf -Ojunk.sdf -s 'CSc1ccc(cc1)S(=O)(=O)NCCCOC(=O)C(=C)C' -at5 -ofpt
> 5 molecules converted
> $ cat junk.sdf
>> 0450.cdx
>> 00452001.cdx#3   Tanimoto from 0450.cdx = 0.221591
>> 0590.cdx#7   Tanimoto from 0450.cdx = 0.278302
>> 0260.cdx   Tanimoto from 0450.cdx = 0.256039
>> 025001.cdx#2   Tanimoto from 0450.cdx = 0.238889
>
>


_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
