On Nov 21, 2011, at 7:47 PM, Robert DeLisle wrote:
> Getting the file to you might be a trick as it is over 4 GB compressed.

I think that's a clue.

RDKit uses tell/seek operations on the underlying file stream, like this:


  ROMol *SDMolSupplier::next() {
    PRECONDITION(dp_inStream,"no stream");
    // set the stream to the current position
    dp_inStream->seekg(d_molpos[d_last]);


d_molpos contains "std::streampos" elements,

MolSupplier.h:    std::vector<std::streampos> d_molpos; // vector of positions in the file for molecules


and I can't tell if that's a 32-bit or 64-bit value, but there's
code which assumes it fits in an "unsigned int" (32 bits on most platforms):

  std::string SDMolSupplier::getItemText(unsigned int idx){
    PRECONDITION(dp_inStream,"no stream");
    unsigned int holder=d_last;
    moveTo(idx);
    unsigned int begP=d_molpos[idx];
    unsigned int endP;
    try {


My guess is that there's an overflow in this code: any file position
at or past 2**32 bytes wraps around, starting again from 0.
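
To illustrate the suspected wraparound, here is a minimal sketch. It is not
RDKit code; stream_offset_bits, truncate_offset and wraparound_demo are
hypothetical names I made up, and it assumes "unsigned int" is 32 bits wide,
which is common but not guaranteed.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <ios>

// Width in bits of the offset type used by stream positions on this platform.
int stream_offset_bits() {
  return static_cast<int>(sizeof(std::streamoff) * CHAR_BIT);
}

// Hypothetical helper: mimics `unsigned int begP = d_molpos[idx];`
// under the assumption that unsigned int is 32 bits wide.
std::uint32_t truncate_offset(std::int64_t pos) {
  return static_cast<std::uint32_t>(pos);  // keeps only the low 32 bits
}

// Hypothetical usage check: offsets below 2**32 survive, larger ones wrap.
void wraparound_demo() {
  const std::int64_t four_gib = std::int64_t{1} << 32;  // 2**32
  assert(truncate_offset(1000) == 1000u);
  assert(truncate_offset(four_gib) == 0u);
  assert(truncate_offset(four_gib + 100) == 100u);
}
```

If stream_offset_bits() reports 64 on your build, then the stored positions
themselves are probably fine and the damage would happen only where they are
copied into a narrower variable.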


                                Andrew
                                da...@dalkescientific.com


