Andrew,
Good catch! I had wondered if there might be a size problem but couldn't
make the connection that you made. I'll find another method to partition
the file.
-Kirk
On Mon, Nov 21, 2011 at 12:01 PM, Andrew Dalke <da...@dalkescientific.com> wrote:
> On Nov 21, 2011, at 7:47 PM, Robert DeLisle wrote:
> > Getting the file to you might be a trick as it is over 4 GB compressed.
>
> I think that's a clue.
>
> RDKit uses tell/seek operations on the underlying file stream, like this:
>
>
> ROMol *SDMolSupplier::next() {
> PRECONDITION(dp_inStream,"no stream");
> // set the stream to the current position
> dp_inStream->seekg(d_molpos[d_last]);
>
>
> d_molpos contains "std::streampos" elements,
>
> MolSupplier.h: std::vector<std::streampos> d_molpos; // vector of positions in the file for molecules
>
>
> and I can't tell if that's a 32-bit or 64-bit value, but there's
> code which assumes it's an unsigned 32-bit integer:
>
> std::string SDMolSupplier::getItemText(unsigned int idx){
> PRECONDITION(dp_inStream,"no stream");
> unsigned int holder=d_last;
> moveTo(idx);
> unsigned int begP=d_molpos[idx];
> unsigned int endP;
> try {
>
>
> My guess is that there's an overflow in this code, causing the file
> offset to wrap around from 2**32 back to 0.
>
>
> Andrew
> da...@dalkescientific.com
>
>
>
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
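
To illustrate the wrap-around Andrew suspects, here is a minimal sketch of what happens when a 64-bit file offset is stored in a 32-bit unsigned variable, as `unsigned int begP=d_molpos[idx];` would do on platforms where `unsigned int` is 32 bits. The names `truncate_offset` and `show_wraparound` are hypothetical and are not part of RDKit; the sketch only demonstrates the arithmetic, not the actual failure in the supplier.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical helper, not RDKit code: mimics assigning a 64-bit file
// offset to a 32-bit unsigned variable. The conversion is modulo 2**32,
// so only the low 32 bits of the offset survive.
std::uint32_t truncate_offset(std::int64_t offset) {
  return static_cast<std::uint32_t>(offset);
}

// Prints what a few offsets around the 4 GiB boundary turn into.
void show_wraparound() {
  const std::int64_t four_gib = std::int64_t{1} << 32;
  const std::int64_t offsets[] = {four_gib - 1, four_gib, four_gib + 100};
  for (std::int64_t off : offsets) {
    std::cout << off << " -> " << truncate_offset(off) << '\n';
  }
  // An offset of exactly 2**32 wraps back to the start of the file.
  assert(truncate_offset(four_gib) == 0);
}
```

Calling `show_wraparound()` from a `main` shows that a record starting 100 bytes past the 4 GiB mark would be looked up at offset 100 instead. If this is what is happening, keeping such positions in `std::streampos` (or another 64-bit type) rather than `unsigned int` would avoid the truncation, assuming the platform's stream offsets are themselves 64-bit.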