Hi,

For the database case, what we could do is return an iterator that
internally chunks, that is:
1) run the query "select * from datastore where id > ? order by id limit 10000"
2) if the result set is empty, then iterator.hasNext is false
3) else read the results into memory and close the connection
4) return the data from the in-memory buffer
5) if the in-memory buffer is empty, start again with 1), replacing ? with the latest id

This might also make sense for the MongoDB case. It's not very different
from what we do now, but the chunking logic would be on the DocumentStore
side, not on the NodeStore side. So whether to chunk or not would be an
implementation detail of the DocumentStore.

Regards,
Thomas

On 20/03/14 08:58, "Julian Reschke" <[email protected]> wrote:

>On 2014-03-20 08:44, Marcel Reutegger wrote:
>> Hi,
>>
>>> I noticed that the BlobStore API recently acquired a similar interface
>>> through GarbageCollectableBlobStore.
>>>
>>> The current impl in RDBBlobStore just returns an iterator that wraps a
>>> result set, which works right now as the RDBBlobStore keeps holding the
>>> Connection.
>>>
>>> I was planning to change the impl to actually use a connection pool, in
>>> which case the iterator will need to hold the connection (*), in which
>>> case we'll have to make sure that it's clear when the iterator isn't
>>> needed anymore. That is: can I rely on it always being read to the end???
>>
>> no, I don't think you can do that. there is no guarantee this will always
>> happen.
>>
>> I think the best you can do with implementations that require an
>> explicit release of resources is to perform some kind of batch loading
>> every N items with increasing offset. this is actually what the MongoDB
>> Java driver does under the hood.
>
>Oops. That does mean though that the result might be inaccurate, right?
>
>Or can we change the contract so that we can rely on the iterator being
>released?
>
>Best regards, Julian
>
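P.S.: a minimal sketch of steps 1)-5) above, to make the idea concrete. The names here (ChunkedIdIterator, fetchChunk) are illustrative, not Oak API; fetchChunk stands in for running the keyset query "select id from datastore where id > ? order by id limit <chunkSize>", reading the result set fully and closing the connection before returning:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.LongFunction;

// Illustrative chunking iterator over ascending ids. No connection is
// held between calls: each chunk is fully buffered by fetchChunk.
class ChunkedIdIterator implements Iterator<Long> {

    private final LongFunction<List<Long>> fetchChunk; // one query per chunk
    private List<Long> buffer = Collections.emptyList();
    private int pos = 0;
    private long lastId = -1;        // the "?" for the next query
    private boolean exhausted = false;

    ChunkedIdIterator(LongFunction<List<Long>> fetchChunk) {
        this.fetchChunk = fetchChunk;
    }

    @Override
    public boolean hasNext() {
        if (pos < buffer.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        buffer = fetchChunk.apply(lastId);      // step 1: run the query
        pos = 0;
        if (buffer.isEmpty()) {                 // step 2: empty -> done
            exhausted = true;
            return false;
        }
        lastId = buffer.get(buffer.size() - 1); // step 5: remember latest id
        return true;
    }

    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.get(pos++);               // step 4: serve from buffer
    }
}
```

Note that step 3 (close the connection) happens inside fetchChunk, which is why the iterator itself never needs an explicit release, unlike the result-set-wrapping iterator discussed below.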
