Eddie,
Thanks for the quick response.
I checked the file as you suggested and I get this:
0000000 2424 2424 000a
0000005
So it appears to end with a line feed (0x0a), correct?
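For reference, the default `hexdump` output prints the bytes as 16-bit little-endian words and zero-pads an odd final byte. That is why the five bytes `$$$$\n` show up as `2424 2424 000a`. The Python 3 sketch below reproduces that layout. `hexdump_words` is a hypothetical helper, not part of any library:

```python
import struct

def hexdump_words(data: bytes) -> str:
    """Mimic the body of default hexdump output: bytes grouped into
    16-bit little-endian words, each printed as 4-digit hex."""
    # An odd trailing byte is padded with 0x00 to form a full word.
    if len(data) % 2:
        data += b"\x00"
    words = struct.unpack("<%dH" % (len(data) // 2), data)
    return " ".join("%04x" % w for w in words)

# Last line of the SD file: "$$$$" followed by a line feed.
line = b"$$$$\n"
print(hexdump_words(line))  # 2424 2424 000a
print(line[-1] == 0x0a)     # True: the final byte is a line feed
```

So `000a` is the final byte 0x0a followed by a padding zero. The file does end with a line feed.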
Getting the file to you might be tricky, as it is over 4 GB compressed.
My intention was to partition the file into several smaller files, but
then this weird error occurred.
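The partitioning step can be done without the RDKit reader. Here is a minimal pure-Python 3 sketch, assuming each record ends with a line starting with `$$$$` as in the SD format. `iter_sdf_chunks`, `split_sdf` and the `part_0001.sdf` naming are hypothetical:

```python
import os

def iter_sdf_chunks(src, records_per_chunk):
    """Yield lists of byte lines, each holding at most `records_per_chunk`
    SD records. `src` is any iterable of byte lines (e.g. a file opened "rb")."""
    chunk, count = [], 0
    for line in src:
        # Normalize a missing final line feed so every piece ends with one.
        if not line.endswith(b"\n"):
            line += b"\n"
        chunk.append(line)
        # Each SD record is terminated by a line starting with "$$$$".
        if line.startswith(b"$$$$"):
            count += 1
            if count == records_per_chunk:
                yield chunk
                chunk, count = [], 0
    if chunk:  # leftover records after the last full chunk
        yield chunk

def split_sdf(path, records_per_file, out_dir=".", prefix="part"):
    """Write numbered files of at most `records_per_file` records each
    and return their paths."""
    names = []
    with open(path, "rb") as src:
        for k, chunk in enumerate(iter_sdf_chunks(src, records_per_file), 1):
            name = os.path.join(out_dir, "%s_%04d.sdf" % (prefix, k))
            with open(name, "wb") as out:
                out.writelines(chunk)
            names.append(name)
    return names
```

Only one chunk is held in memory at a time, so `records_per_file` trades memory use against the number of output files.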
-Kirk
On Mon, Nov 21, 2011 at 11:42 AM, Eddie Cao <eddie....@me.com> wrote:
> Hi Robert,
>
> It might help to create a small SD file consisting only of the last few
> structures in the SD file, to make sure the error is not caused by the
> file not ending properly. Specifically, the latest RDKit release has a bug
> that causes it to get stuck if the file does not end with a line-feed
> character (0x0a). An easy way to check is to run `tail -1 INPUT.sdf | hexdump`.
> If the last character is not 0a, then you are a victim of this bug. The
> following example uses a bad SDF that ends with the character 0x24 ($):
>
> $ tail -1 test.sdf | hexdump
> 0000000 24 24 24 24
> 0000004
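The same check, plus the small "last few structures" file suggested above, can be sketched in Python 3 without RDKit. This assumes records are terminated by `$$$$` lines, and the helper names are hypothetical. Appending a line feed is offered only as a way to avoid the end-of-file condition described above, not as a fix for the bug itself:

```python
import collections
import os

def last_byte(path):
    """Return the final byte of a file as an int, or None if it is empty."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return None
        f.seek(-1, os.SEEK_END)
        return f.read(1)[0]

def ensure_trailing_newline(path):
    """Append a line feed (0x0a) if the file does not already end with one."""
    if last_byte(path) not in (None, 0x0a):
        with open(path, "ab") as f:
            f.write(b"\n")

def tail_records(lines, n):
    """Return the last `n` SD records (each a list of byte lines) from an
    iterable of byte lines, keeping at most `n` records in memory."""
    records = collections.deque(maxlen=n)
    current = []
    for line in lines:
        current.append(line)
        if line.startswith(b"$$$$"):
            records.append(current)
            current = []
    return list(records)
```

`tail_records` makes one streaming pass over the file. Its result can be written out with `writelines` to build the small test file.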
>
>
> If you provide a link to the SD file, I can also help you check.
>
> Eddie
>
>
> On Nov 21, 2011, at 10:20 AM, Robert DeLisle wrote:
>
> RDKit-sters,
>
> I'm working with a huge SD file that, by every measure I've tried, contains
> ~5,050,000 structures. (This is an eMolecules dataset.) In processing the
> file, I've run into an odd error. Even with the following very simple
> code, the file seems to be bottomless. I let it run overnight and saw
> numbers as high as 42,000,000.
>
> Any ideas?
>
> -Kirk
>
>
>
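> from rdkit import Chem
>
> # 'INPUT.sdf' is a placeholder; the actual file name was not given
> sdin = Chem.SDMolSupplier('INPUT.sdf')
>
> for i, m in enumerate(sdin):
>     # Report progress every 100,000 structures
>     if i % 100000 == 0:
>         print('Structure #' + str(i))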
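One way to cross-check the ~5,050,000 figure independently of the supplier is to count the `$$$$` terminator lines directly. This is a sketch that assumes well-formed records, and `count_sdf_records` is a hypothetical helper:

```python
def count_sdf_records(lines):
    """Count SD records in an iterable of byte lines by counting the
    lines that start with the "$$$$" record terminator."""
    n = 0
    for line in lines:
        if line.startswith(b"$$$$"):
            n += 1
    return n
```

Call it as `with open('INPUT.sdf', 'rb') as f: print(count_sdf_records(f))`. The result can also serve as a cap on the `enumerate` loop, so a reader that runs past the real end of the file stops instead of running overnight.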
>
>
>
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
>
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>