Hehe, sorry - this particular class won't help you with 500 million
rows :-). It starts out by reading all the primary keys in the table
and creating an array of them.
But it should be simple to modify it to use a JDBC fetch for that
first step - fetch the primary keys in batches.
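Something along these lines, perhaps - an untested keyset-pagination
sketch, assuming a single numeric primary key column named "id" (the
table name and method name here are placeholders):

import java.sql.*;
import java.util.*;

// Untested sketch: fetch the next batch of primary keys over plain JDBC.
// Feed the last key of each batch back in until an empty list comes back.
public static List<Long> nextPrimaryKeyBatch( Connection connection, long lastSeenKey, int batchSize ) throws SQLException {
    List<Long> keys = new ArrayList<Long>( batchSize );
    PreparedStatement statement = connection.prepareStatement( "SELECT id FROM some_table WHERE id > ? ORDER BY id" );

    try {
        statement.setMaxRows( batchSize );
        statement.setLong( 1, lastSeenKey );
        ResultSet resultSet = statement.executeQuery();

        while( resultSet.next() ) {
            keys.add( resultSet.getLong( 1 ) );
        }

        resultSet.close();
    }
    finally {
        statement.close();
    }

    return keys;
}

Since the keys come back in primary-key order, each batch picks up
exactly where the previous one left off, and no single array ever
holds more than batchSize keys.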
Cheers,
- Hugi
// Hugi Thordarson
// http://hugi.karlmenn.is/
On 9.1.2009, at 22:24, Randy Wigginton wrote:
I've been using JDBC directly. I have a job that reads about 500M
rows nightly, and that was the only way I could find to handle it.
I wish there were an "EOF-lite" for such operations. Sounds like
you have a very helpful class.
On Fri, Jan 9, 2009 at 1:58 PM, Hugi Thordarson <[email protected]>
wrote:
Good evening folks!
The databases I'm responsible for contain a lot of data, and I
frequently find myself resorting to boring stuff like raw row
fetching to create large reports or otherwise process it in bulk.
But sometimes, even that isn't enough - an array of ten million
items is difficult for any application to handle, even though the
ten million objects are just NSDictionaries/raw rows. Besides -
working with raw rows is no fun. I'm spoiled by years of EOF-y
goodness.
So, yesterday I wrote the attached class to handle massive amounts
of data. It is by no means perfect - if you have a table of ten
million rows, the primary keys for these rows are all fetched from
the DB, creating quite an array (if anyone has a solution for that,
I'd *love* to hear it).
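(For context, the batching step looks roughly like this - an
illustrative sketch rather than the actual class, assuming Wonder's
ERXInQualifier, a primary key attribute named "id", and an NSArray
allPrimaryKeys plus an EOEditingContext ec already in hand:

// Illustrative sketch of the batch loop - entity and attribute names are placeholders.
int batchSize = 1000;

for( int i = 0; i < allPrimaryKeys.count(); i += batchSize ) {
    int length = Math.min( batchSize, allPrimaryKeys.count() - i );
    NSArray keyBatch = allPrimaryKeys.subarrayWithRange( new NSRange( i, length ) );
    EOQualifier qualifier = new ERXInQualifier( "id", keyBatch );
    EOFetchSpecification fs = new EOFetchSpecification( "SomeEntity", qualifier, null );
    NSArray batch = ec.objectsWithFetchSpecification( fs );
    // ... hand each object in "batch" to the Operation here ...
    ec = ERXEC.newEditingContext(); // use a fresh EC per batch so fetched objects can be garbage collected

}

Only one batch of objects is live at a time, so the giant-array
problem is confined to the primary keys themselves.)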
It exports an entire table of roughly 2,000,000 rows from a
10-column DB table (creating a 500MB text file) in roughly four
minutes on my MacBook Pro using a heap size of 400M. Here's an
example of how to use it (the implementation of KMExportOperation
is left as an exercise ;-):
public WOActionResults batchFetchAction() {
    EOEditingContext ec = ERXEC.newEditingContext();
    KMMassiveOperation.Operation operation = new KMExportOperation( "/tmp/exported.csv", "\t", "\n", "UTF-8" );
    KMMassiveOperation.start( ec, SomeEntity.class, null, null, operation );
    return new WOResponse();
}
Anyway, I would love to hear how other folks are handling huge
datasets. Feedback on the technique I'm using would be great, as
would ideas for improvement. Just about the only idea I'm not open
to is "just use JDBC" ;-). I've been there and I don't want to be
there. That's why I'm using EOF :-).
Cheers,
- Hugi
// Hugi Thordarson
// http://hugi.karlmenn.is/