Actually most (or all?) of the DBs that Cayenne supports will do a batch bind for multiple inserts, which is quite a bit faster than issuing separate SQL statements.
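For illustration, here's a minimal plain-JDBC sketch of what that batch bind amounts to at the driver level: one prepared statement, many bound parameter sets, shipped together when the driver supports it. The table and column names are made up for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchInsertSketch {
        // Inserts all hashes through one PreparedStatement; addBatch() just
        // binds the parameters, and executeBatch() sends them in one shot
        // (driver permitting) instead of one statement per row.
        static void insertAll(String jdbcUrl, List<String> hashes) throws SQLException {
            try (Connection con = DriverManager.getConnection(jdbcUrl);
                 PreparedStatement ps = con.prepareStatement(
                         "INSERT INTO record (hash) VALUES (?)")) {
                for (String hash : hashes) {
                    ps.setString(1, hash);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }

Whether this actually collapses into a single round trip depends on the database and driver (e.g. PostgreSQL's reWriteBatchedInserts flag), but it is consistently faster than issuing the inserts one at a time.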
On Thu, Jun 24, 2021 at 10:14 AM Jurgen Doll <jur...@ivory.web.za> wrote:

> I'm not sure about this, but I don't think Cayenne does SQL batch insert
> statements - it does each insert (even in a batch) as a single SQL insert
> statement. If this is the case then there's no real advantage, I think,
> to doing batch commits with Cayenne in this particular case (or am I
> mistaken?).
>
> Cayenne context creations are purposefully cheap; so for each insert get a
> new context, create a new object in that context, and commit. Surround
> with the usual try block, inside your loop, to catch the unique clashes
> and continue with the next record...
>
> Just to be clear, the select query of the data that's being processed is
> still in "batch" mode, like the 1000 as suggested.
>
> On Thu, 24 Jun 2021 16:35:46 +0200, Michael Gentry <blackn...@gmail.com> wrote:
>
> > 10 is a rather small batch size. I'd suggest at least 100, perhaps even
> > 1000, unless you are expecting tons of problem records.
> >
> > For your hashing concerns, could you maybe store that in an H2 DB you
> > use for this migration?
> >
> > On Wed, Jun 23, 2021 at 12:54 PM Tony Giaccone <t...@giaccone.org> wrote:
> >
> > > Yeah, that's not a solution that's going to work. I need to move
> > > about a million records from one database to a second, and the delay
> > > associated with querying the database for every record would take a
> > > problem that's already going to take more than a day to complete and
> > > turn it into several days. That's why I'm batching them into groups
> > > of 10 at a time. The problem is there are duplicate records; they
> > > hash to the same value, and when that happens the insert on the other
> > > side is going to fail. The hash is a way to identify uniqueness. I'm
> > > sure you're thinking, well, keep a record of the hash values and only
> > > insert ones that you haven't seen. The problem with this solution is
> > > that I have to be able to restart the application and pick up where
> > > the process ended. If I had an in-memory cache of the hash values, on
> > > restart I'd have to read all the previously transferred records and
> > > inflate the cache. Another process that really is going to be too
> > > odious and time consuming.
> > >
> > > On Wed, Jun 23, 2021 at 12:45 PM John Huss <johnth...@gmail.com> wrote:
> > >
> > > > I think it would be better to figure out the "problem" objects
> > > > before committing by querying the DB and the object context.
> > > >
> > > > On Wed, Jun 23, 2021 at 9:47 AM Tony Giaccone <t...@giaccone.org> wrote:
> > > >
> > > > > I have a list of 10 new objects that I've inserted into the
> > > > > ObjectContext and am about to do a commitChanges on the object
> > > > > context.
> > > > >
> > > > > One, or more, of those entries violates a constraint and causes
> > > > > the commitChanges to throw an exception.
> > > > >
> > > > > Now most of them are probably OK, so I want to make sure they get
> > > > > inserted. How do I handle this?
> > > > >
> > > > > My first thought was to invalidate the 10 items, then individually
> > > > > add each one back into the context and do a commitChanges after
> > > > > each add. Is that a reasonable path? Obviously the one that failed
> > > > > before will fail again, and then I can just log that, invalidate
> > > > > it again, and keep going.
> > > > >
> > > > > Is there a better, faster way to do this?
> > > > >
> > > > > Tony Giaccone
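For what it's worth, Jurgen's per-record context-and-commit approach quoted above would look roughly like this. It's only a sketch: SourceRecord, TargetRecord, fetchSourceRecords(), and the hash attribute are hypothetical stand-ins for the real model classes.

    import org.apache.cayenne.CayenneRuntimeException;
    import org.apache.cayenne.ObjectContext;
    import org.apache.cayenne.configuration.server.ServerRuntime;

    public class PerRecordCommit {
        public static void main(String[] args) {
            ServerRuntime runtime = ServerRuntime.builder()
                    .addConfig("cayenne-project.xml")
                    .build();

            for (SourceRecord source : fetchSourceRecords()) {  // hypothetical source
                // Contexts are cheap to create, so use a fresh one per record.
                ObjectContext context = runtime.newContext();
                TargetRecord target = context.newObject(TargetRecord.class);
                target.setHash(source.getHash());               // hypothetical fields
                try {
                    context.commitChanges();
                } catch (CayenneRuntimeException e) {
                    // Most likely the unique-constraint clash on the hash:
                    // log it and carry on with the next record.
                    System.err.println("Skipped duplicate: " + source.getHash());
                }
            }
        }
    }

A failed commit only poisons its own throwaway context, so the loop keeps going without any invalidation bookkeeping.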
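And Michael's suggestion of keeping the seen hashes in an H2 database, so that a restart can pick up where the migration left off, can be as simple as a one-column table with the hash as its primary key. Again a hedged sketch with made-up names:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class SeenHashes {
        private final Connection h2;

        SeenHashes(String path) throws SQLException {
            // A file-backed H2 database survives restarts of the migration.
            h2 = DriverManager.getConnection("jdbc:h2:" + path);
            try (Statement st = h2.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS seen_hash " +
                           "(hash VARCHAR(64) PRIMARY KEY)");
            }
        }

        // Returns true the first time a hash is recorded; a primary-key
        // violation on the insert means the record was already transferred.
        boolean markSeen(String hash) {
            try (PreparedStatement ps = h2.prepareStatement(
                    "INSERT INTO seen_hash (hash) VALUES (?)")) {
                ps.setString(1, hash);
                ps.executeUpdate();
                return true;
            } catch (SQLException e) {
                // In real code, check the error code to distinguish a
                // duplicate key from other failures.
                return false;
            }
        }
    }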