Hi Simon,

I think I misunderstood something you said earlier, because I thought you already had a "processed" flag you could query against. Given that you don't, and since I'm not sure why your performIteratedQuery() is failing, perhaps you could combine data rows with paginated queries:
http://cayenne.apache.org/docs/3.0/data-rows.html
http://cayenne.apache.org/docs/3.0/paginated-queries.html

I suspect, however, that this will not scale as far as you need (I think the paginated query will still fetch ~500k data rows). You may end up having to do an SQLTemplate query that fetches only the primary keys (which is what a paginated query does), and then loop, fetching batches of your records based on those primary keys (using new DataContexts, of course). This is a bit more work, but shouldn't have issues.

mrg

On Mon, Dec 17, 2012 at 10:40 AM, Simon Schneider <[email protected]> wrote:

> Hi Michael,
>
> I understand your approach of using a flag to identify already processed
> objects. But introducing a flag, or in my case another state, just for
> processing my records was something I wanted to avoid. I thought that
> Cayenne might have another way of fetching objects in a memory-preserving
> manner. Maybe some Iterator which on creation fetches the primary keys
> only, and then, while iterating, fetches batches of data rows in the
> background.
>
> Simon
>
>
> On 17.12.2012 at 15:50, Michael Gentry wrote:
>
> > Hi Simon,
> >
> > I don't know why your performIteratedQuery() would fail with a heap
> > error. Based upon your answer to #2, it sounds like you can set a fetch
> > limit on your query (call query.setFetchLimit(limit) and do a normal
> > performQuery() and you'll get back real Cayenne objects) and only pull
> > back 100 or 1000 records, process them (setting them to a different
> > state), then commit. Do this in a new DataContext each time so the GC
> > can reclaim the memory.
> >
> > mrg
> >
> >
> >
> > On Mon, Dec 17, 2012 at 8:38 AM, Simon Schneider <[email protected]> wrote:
> >
> >> Hi Michael,
> >>
> >> the problem is that I do not even get an iterator, because executing a
> >> query like the following results in a Java heap space error:
> >>
> >> ResultIterator it = dataContext.performIteratedQuery(query);
> >>
> >> The answers to your questions are:
> >>
> >>> 1) How many records are you talking about?
> >> It's about half a million records.
> >>
> >>> 2) Are you updating your object with a flag/etc you can query on
> >>> again later (to exclude objects you've already processed)?
> >> I already exclude objects by setting them to a different state. But it
> >> may happen that I have to process half a million records despite this.
> >>
> >>> 3) What version of Cayenne are you using and what database?
> >> Cayenne 3.0.2, Postgres 9.1
> >>
> >>> 4) When you convert your Map (from the iterated query) into a
> >>> DataObject, are you creating a new DataContext or using the old one
> >>> over and over again?
> >> At the moment I am using just one DataContext, unregistering the
> >> processed objects. But as mentioned above, execution does not even get
> >> to this point.
> >>
> >> Simon
> >>
> >>> Hi Simon, some questions:
> >>>
> >>> 1) How many records are you talking about?
> >>> 2) Are you updating your object with a flag/etc you can query on
> >>> again later (to exclude objects you've already processed)?
> >>> 3) What version of Cayenne are you using and what database?
> >>> 4) When you convert your Map (from the iterated query) into a
> >>> DataObject, are you creating a new DataContext or using the old one
> >>> over and over again?
> >>>
> >>> For #4, if you are using the same DataContext repeatedly, try changing
> >>> your logic to something more like:
> >>>
> >>> while (iterator.hasNextRow()) {
> >>>     DataContext context = DataContext.createDataContext();
> >>>     Map row = (Map) iterator.nextRow();
> >>>     CayenneObject object = (CayenneObject) context.objectFromDataRow("CayenneObject", row);
> >>>     ...
> >>>     object.doStuff();
> >>>     ...
> >>>     context.commitChanges();
> >>> }
> >>>
> >>> This way you won't build up a ton of objects in a single DataContext
> >>> and possibly run out of memory.
> >>>
> >>> mrg
> >>
> >
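
For reference, the "fetch only the primary keys, then loop over batches" idea from the top of this message could be sketched roughly as follows. This is only the batching skeleton in plain Java, not Cayenne code: KeyBatcher and forEachBatch are hypothetical names made up for illustration, and the handler is where you would (as an assumption about your setup) create a fresh DataContext, fetch the objects whose primary keys are in the batch, process them, and commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Cayenne: walks a list of primary keys in
// fixed-size batches and hands each batch to a caller-supplied handler.
class KeyBatcher {

    // Calls handler once per batch of at most batchSize keys, in order,
    // and returns the number of batches processed.
    static <K> int forEachBatch(List<K> keys, int batchSize, Consumer<List<K>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // subList is a view, so no extra copy of the keys is made.
            // Assumption: in the Cayenne case the handler would create a new
            // DataContext here, fetch the records for these keys, process
            // them, and commit, so each batch can be garbage-collected.
            handler.accept(keys.subList(start, end));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for the primary keys returned by the keys-only query.
        List<Integer> keys = new ArrayList<>();
        for (int i = 1; i <= 2500; i++) {
            keys.add(i);
        }
        int batches = forEachBatch(keys, 1000,
                batch -> System.out.println("processing " + batch.size() + " keys"));
        System.out.println(batches + " batches");
    }
}
```

Only the list of keys stays in memory for the whole run; everything fetched inside the handler belongs to that batch's context and can be released once the handler returns. The batch size of 1000 is just an example value.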
