Andrew,
In thinking about this, an unsigned 32-bit integer should give me over 4
billion values, and a signed 32-bit integer gives about 2 billion. I know that
the file has slightly over 5 million structures and ~300 million lines.
Neither of these is over the limit, so I wouldn't expect an overflow.
-Kirk
On Mon, Nov 21, 2011 at 12:22 PM, Robert DeLisle <rkdeli...@gmail.com> wrote:
> Andrew,
>
> Good catch! I had wondered if there might be a size problem but couldn't
> make the connection that you made. I'll find another method to partition
> the file.
>
> -Kirk
>
> On Mon, Nov 21, 2011 at 12:01 PM, Andrew Dalke
> <da...@dalkescientific.com> wrote:
>
>> On Nov 21, 2011, at 7:47 PM, Robert DeLisle wrote:
>> > Getting the file to you might be a trick as it is over 4 GB compressed.
>>
>> I think that's a clue.
>>
>> RDKit uses tell/seek operations on the underlying file stream, like this:
>>
>>
>> ROMol *SDMolSupplier::next() {
>>   PRECONDITION(dp_inStream,"no stream");
>>   // set the stream to the current position
>>   dp_inStream->seekg(d_molpos[d_last]);
>>
>>
>> d_molpos contains "std::streampos" elements,
>>
>> MolSupplier.h:    std::vector<std::streampos> d_molpos; // vector of positions in the file for molecules
>>
>>
>> and I can't tell if that's a 32-bit or 64-bit value, but there's
>> code which assumes it's an unsigned 32-bit integer:
>>
>> std::string SDMolSupplier::getItemText(unsigned int idx){
>>   PRECONDITION(dp_inStream,"no stream");
>>   unsigned int holder=d_last;
>>   moveTo(idx);
>>   unsigned int begP=d_molpos[idx];
>>   unsigned int endP;
>>   try {
>>
>>
>> My guess is that there's an overflow in this code, causing it to
>> loop from 2**32 back to 0.
>>
>>
>> Andrew
>> da...@dalkescientific.com
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure
>> contains a definitive record of customers, application performance,
>> security threats, fraudulent activity, and more. Splunk takes this
>> data and makes sense of it. IT sense. And common sense.
>> http://p.sf.net/sfu/splunk-novd2d
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
>