Hello Hugi,

"KMMassiveOperation" <--- that's a really great class name.
What I do is to pull GIDs or PKs for all of the objects involved using raw rows. As you say, sometimes the data sets are large enough that I have to further subdivide them on a domain-specific basis, but I won't complicate matters further... In any case, I end up with lots of GIDs or PKs.

Then I batch them up into lots of (for example) 100 or so and farm the workload out over JMS (more recently using JSON-RPC through a "JMS adaptor") so that the processing can run concurrently over a number of instances on a number of hosts. Adding instances increases the concurrency and hence the pressure on the database system.

In the case of writing out CSV or Excel-readable XML files, I push the results from the workers into a "BLOB stream" -- effectively just a series of BLOBs that make up one long piece of contiguous data.

The control and monitoring systems for all this are quite complex, but it does work well and I can do it all in EOF without resorting to SQL. Rough sketches of the main pieces follow.
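To make the first step concrete, the raw-row fetch and the batching come out at something like the sketch below. The "Invoice" entity and its "id" key are made up purely for illustration; substitute whatever you're actually working with.

import com.webobjects.eocontrol.EOEditingContext;
import com.webobjects.eocontrol.EOFetchSpecification;
import com.webobjects.foundation.NSArray;
import com.webobjects.foundation.NSDictionary;
import com.webobjects.foundation.NSMutableArray;
import com.webobjects.foundation.NSRange;

import java.util.ArrayList;
import java.util.List;

public class PrimaryKeyBatcher {

    private static final int BATCH_SIZE = 100;

    // Fetch just the primary keys as raw rows (NSDictionary instances)
    // rather than fully-fledged EOs, so even a huge result set stays
    // cheap to hold in memory.
    public static NSArray fetchAllPrimaryKeys(EOEditingContext ec) {
        EOFetchSpecification fs = new EOFetchSpecification("Invoice", null, null);
        fs.setFetchesRawRows(true);
        fs.setRawRowKeyPaths(new NSArray("id"));

        NSArray rows = ec.objectsWithFetchSpecification(fs);
        NSMutableArray pks = new NSMutableArray();

        for (int i = 0; i < rows.count(); i++) {
            NSDictionary row = (NSDictionary) rows.objectAtIndex(i);
            pks.addObject(row.objectForKey("id"));
        }

        return pks;
    }

    // Split the primary keys into lots of BATCH_SIZE for the workers.
    public static List batch(NSArray pks) {
        List batches = new ArrayList();

        for (int i = 0; i < pks.count(); i += BATCH_SIZE) {
            int length = Math.min(BATCH_SIZE, pks.count() - i);
            batches.add(pks.subarrayWithRange(new NSRange(i, length)));
        }

        return batches;
    }
}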
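Farming the batches out then looks roughly like this with plain JMS. The JNDI names here are invented, and in real life the payload is a JSON-RPC call rather than the bare comma-separated list of PKs I've used to keep the sketch short.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class BatchDispatcher {

    // Send each batch of primary keys to a queue so that any number of
    // worker instances on any number of hosts can consume them
    // concurrently.  The JNDI names are illustrative only.
    public static void dispatch(java.util.List batches) throws Exception {
        InitialContext jndi = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) jndi.lookup("jms/massiveOperationQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);

            for (Object batchObj : batches) {
                com.webobjects.foundation.NSArray batch =
                        (com.webobjects.foundation.NSArray) batchObj;
                // one message per batch of ~100 PKs; a comma-separated
                // list stands in for the JSON-RPC payload here
                TextMessage message =
                        session.createTextMessage(batch.componentsJoinedByString(","));
                producer.send(message);
            }
        } finally {
            connection.close();
        }
    }
}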
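And the "BLOB stream" is conceptually just an entity with a stream identifier, a sequence number and a data column; the reader concatenates the chunks in sequence order to get one contiguous document. Again the entity and attribute names ("BlobChunk" and so on) are made up for the sketch.

import com.webobjects.eoaccess.EOUtilities;
import com.webobjects.eocontrol.EOEditingContext;
import com.webobjects.eocontrol.EOEnterpriseObject;
import com.webobjects.foundation.NSData;

public class BlobStreamWriter {

    private final String streamIdentifier;
    private int nextSequence = 0;

    public BlobStreamWriter(String streamIdentifier) {
        this.streamIdentifier = streamIdentifier;
    }

    // Append one chunk of output (e.g. a run of CSV lines produced by a
    // worker) as a row in a hypothetical "BlobChunk" entity.  Reading the
    // chunks back in sequence order yields one long contiguous file.
    public void appendChunk(EOEditingContext ec, byte[] chunk) {
        EOEnterpriseObject eo = EOUtilities.createAndInsertInstance(ec, "BlobChunk");
        eo.takeValueForKey(streamIdentifier, "streamIdentifier");
        eo.takeValueForKey(Integer.valueOf(nextSequence++), "sequence");
        eo.takeValueForKey(new NSData(chunk), "data");
        ec.saveChanges();
    }
}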
cheers.
Anyway, I would love to hear how other folks are handling huge datasets. I would love feedback on the technique I'm using, and ideas for improvement would be great. Just about the only idea I'm not open to is "just use JDBC" ;-). I've been there and I don't want to be there. That's why I'm using EOF :-).
___
Andrew Lindesay
www.lindesay.co.nz
