On Nov 21, 2011, at 7:47 PM, Robert DeLisle wrote:
> Getting the file to you might be a trick as it is over 4 GB compressed.

I think that's a clue. RDKit uses tell/seek operations on the underlying file stream, like this:

    ROMol *SDMolSupplier::next() {
      PRECONDITION(dp_inStream,"no stream");
      // set the stream to the current position
      dp_inStream->seekg(d_molpos[d_last]);

d_molpos contains "std::streampos" elements, MolSupplier.h:

    std::vector<std::streampos> d_molpos; // vector of positions in the file for molecules

and I can't tell if that's a 32-bit or 64-bit value, but there's code which assumes it's an unsigned 32-bit integer:

    std::string SDMolSupplier::getItemText(unsigned int idx){
      PRECONDITION(dp_inStream,"no stream");
      unsigned int holder=d_last;
      moveTo(idx);
      unsigned int begP=d_molpos[idx];
      unsigned int endP;
      try {

My guess is that there's an overflow in this code, causing it to loop from 2**32 back to 0.

Andrew
da...@dalkescientific.com
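
To make the suspected wraparound concrete, here is a minimal sketch of what happens when a 64-bit file offset is stored in a 32-bit unsigned integer. This is not RDKit code: the helper names (storeInUnsigned32, fitsInUnsigned32, demoWraparound) are hypothetical, and it assumes the stream position is 64 bits wide while "unsigned int" is 32 bits, which I have not confirmed for the build in question.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of RDKit): models assigning a 64-bit file
// offset to a 32-bit unsigned integer. Only the low 32 bits survive, so the
// value is reduced modulo 2**32.
constexpr std::uint32_t storeInUnsigned32(std::uint64_t offset) {
  return static_cast<std::uint32_t>(offset);
}

// Hypothetical helper: true if the offset is unchanged after being stored
// in 32 bits, i.e. it is below 2**32.
constexpr bool fitsInUnsigned32(std::uint64_t offset) {
  return storeInUnsigned32(offset) == offset;
}

// Exercises the helpers around the 4 GiB boundary.
inline void demoWraparound() {
  const std::uint64_t fourGiB = std::uint64_t{1} << 32;  // 2**32 bytes

  // The last offset below 4 GiB is stored intact.
  assert(storeInUnsigned32(fourGiB - 1) == 4294967295u);
  assert(fitsInUnsigned32(fourGiB - 1));

  // At and beyond 4 GiB the stored value wraps back toward 0.
  assert(storeInUnsigned32(fourGiB) == 0u);
  assert(storeInUnsigned32(fourGiB + 1000) == 1000u);
  assert(!fitsInUnsigned32(fourGiB + 1000));
}
```

If this is what is going on, any record that starts more than 4 GiB into the file would be looked up at its offset modulo 2**32, which lands somewhere near the start of the file instead.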