Hehe, sorry - this particular class won't help you with 500 million rows :-). It starts out by reading all the primary keys in the table and creating an array of them.

But it should be simple to modify it to use a JDBC fetch for that first step - fetch the primary keys in batches.
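Something along these lines is what I have in mind - just a rough sketch, with the table name, primary key column and connection details all made up for the example, and assuming a numeric primary key and a database that understands LIMIT:

// Rough sketch only - table/column names and connection details are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class PrimaryKeyBatchFetcher {

        private static final int BATCH_SIZE = 10000;

        public static void main( String[] args ) throws SQLException {
                // Placeholder JDBC URL and credentials - substitute your own.
                Connection connection = DriverManager.getConnection( "jdbc:postgresql://localhost/mydb", "user", "password" );

                // Keyset pagination: always ask for the next BATCH_SIZE keys after the last one seen,
                // so only one batch of primary keys is ever held in memory.
                PreparedStatement statement = connection.prepareStatement(
                                "SELECT id FROM some_entity WHERE id > ? ORDER BY id LIMIT " + BATCH_SIZE );

                long lastKey = 0;

                while( true ) {
                        statement.setLong( 1, lastKey );
                        ResultSet rs = statement.executeQuery();

                        List<Long> batch = new ArrayList<Long>();

                        while( rs.next() ) {
                                batch.add( rs.getLong( 1 ) );
                        }

                        rs.close();

                        if( batch.isEmpty() ) {
                                break;
                        }

                        lastKey = batch.get( batch.size() - 1 ).longValue();

                        // Hand this batch of primary keys to the rest of the operation
                        // (batch-fault the EOs and run the Operation on them).
                        processBatch( batch );
                }

                statement.close();
                connection.close();
        }

        private static void processBatch( List<Long> primaryKeys ) {
                // Placeholder for the existing per-batch work.
        }
}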

Cheers,
- Hugi

// Hugi Thordarson
// http://hugi.karlmenn.is/


On 9.1.2009, at 22:24, Randy Wigginton wrote:

I've been using JDBC directly. I have a job that reads about 500M rows nightly, and that was the only way I could find to handle it.
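For what it's worth, the core of that kind of job is just a forward-only, read-only statement with a streaming fetch size. Here's a stripped-down sketch - the SQL, column names and connection details are placeholders, and some drivers need extra settings before they will actually stream:

// Stripped-down sketch of a streaming JDBC scan - SQL and connection details are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class NightlyScan {

        public static void main( String[] args ) throws SQLException {
                Connection connection = DriverManager.getConnection( "jdbc:postgresql://localhost/mydb", "user", "password" );
                connection.setAutoCommit( false ); // some drivers only stream outside auto-commit

                Statement statement = connection.createStatement( ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY );
                statement.setFetchSize( 1000 ); // ask the driver to stream rows in chunks instead of buffering everything

                ResultSet rs = statement.executeQuery( "SELECT id, name, amount FROM some_entity" );

                while( rs.next() ) {
                        // Only the current chunk of rows is ever in memory.
                        long id = rs.getLong( "id" );
                        // ... write to the report, aggregate, etc.
                }

                rs.close();
                statement.close();
                connection.close();
        }
}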

I wish there were an "EOF-lite" for such operations. Sounds like you have a very helpful class.

On Fri, Jan 9, 2009 at 1:58 PM, Hugi Thordarson <[email protected]> wrote:
Good evening folks!

The databases I'm responsible for contain a lot of data and I find myself frequently needing to resort to boring stuff like raw row fetching to create large reports or otherwise handle a lot of data. But sometimes, even that isn't enough - an array of ten million items is difficult for any application to handle, even though the ten million objects are just NSDictionaries/raw rows. Besides - working with raw rows is no fun. I'm spoiled by years of EOF-y goodness.

So, yesterday I wrote the attached class to handle massive amounts of data. It is by no means perfect - if you have a table of ten million rows, the primary keys for these rows are all fetched from the DB, creating quite an array (if anyone has a solution for that, I'd *love* to hear it).
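To give an idea of the shape of the batching step that follows, here's a simplified sketch - not the actual class. It assumes a numeric primary key attribute named "id" (just a placeholder), and with Wonder you would probably use ERXQ.in()/ERXInQualifier to get a real IN clause instead of the OR chain:

// Sketch of the batch-faulting step only - not the actual KMMassiveOperation code.
// Assumes a numeric primary key attribute named "id" on the entity (placeholder name).
import com.webobjects.eocontrol.EOEditingContext;
import com.webobjects.eocontrol.EOEnterpriseObject;
import com.webobjects.eocontrol.EOFetchSpecification;
import com.webobjects.eocontrol.EOKeyValueQualifier;
import com.webobjects.eocontrol.EOOrQualifier;
import com.webobjects.eocontrol.EOQualifier;
import com.webobjects.foundation.NSArray;
import com.webobjects.foundation.NSMutableArray;
import com.webobjects.foundation.NSRange;
import er.extensions.eof.ERXEC; // package may differ in older Wonder versions

public class BatchFaulter {

        private static final int BATCH_SIZE = 200;

        public static void processInBatches( String entityName, NSArray allPrimaryKeys ) {

                for( int i = 0; i < allPrimaryKeys.count(); i += BATCH_SIZE ) {
                        int length = Math.min( BATCH_SIZE, allPrimaryKeys.count() - i );
                        NSArray keys = allPrimaryKeys.subarrayWithRange( new NSRange( i, length ) );

                        // A fresh editing context per batch keeps faulted objects from piling up.
                        EOEditingContext ec = ERXEC.newEditingContext();
                        ec.lock();

                        try {
                                NSMutableArray qualifiers = new NSMutableArray();

                                for( int j = 0; j < keys.count(); j++ ) {
                                        qualifiers.addObject( new EOKeyValueQualifier( "id", EOQualifier.QualifierOperatorEqual, keys.objectAtIndex( j ) ) );
                                }

                                EOFetchSpecification fs = new EOFetchSpecification( entityName, new EOOrQualifier( qualifiers ), null );
                                NSArray objects = ec.objectsWithFetchSpecification( fs );

                                for( int j = 0; j < objects.count(); j++ ) {
                                        EOEnterpriseObject eo = (EOEnterpriseObject)objects.objectAtIndex( j );
                                        // Hand each object to the operation (write a CSV line, aggregate, etc.).
                                }
                        }
                        finally {
                                ec.unlock();
                                ec.dispose();
                        }
                }
        }
}

The OR chain is fine for a couple of hundred keys per batch; for bigger batches an IN qualifier is kinder to the generated SQL.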

It exports an entire 10-column DB table of roughly 2,000,000 rows (creating a 500MB text file) in roughly four minutes on my MacBook Pro with a heap size of 400M. Here's an example of how you use it (the implementation of KMExportOperation is left as an exercise ;-):

        public WOActionResults batchFetchAction() {
                EOEditingContext ec = ERXEC.newEditingContext();
                KMMassiveOperation.Operation operation = new KMExportOperation( "/tmp/exported.csv", "\t", "\n", "UTF-8" );
                KMMassiveOperation.start( ec, SomeEntity.class, null, null, operation );
                return new WOResponse();
        }

Anyway, I would love to hear how other folks are handling huge datasets. Feedback on the technique I'm using would be great, and ideas for improvement are very welcome. Just about the only idea I'm not open to is "just use JDBC" ;-). I've been there and I don't want to go back. That's why I'm using EOF :-).

Cheers,
- Hugi

// Hugi Thordarson
// http://hugi.karlmenn.is/


