Actually, most (or all?) DBs that Cayenne supports will do a batch bind for
multiple inserts, which is quite a bit faster than issuing separate SQL
statements.
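
For illustration, here's roughly what that looks like at the JDBC level (a
minimal sketch of batch binding, not Cayenne's actual internals; the table
and column names are made up):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // One prepared statement, many bound rows, and a single
    // executeBatch() round trip instead of N separate inserts.
    static void batchInsert(Connection con, List<String> names) throws SQLException {
        String sql = "INSERT INTO my_table (name) VALUES (?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();      // queue this row's bindings
            }
            ps.executeBatch();      // driver sends the whole batch at once
        }
    }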

On Thu, Jun 24, 2021 at 10:14 AM Jurgen Doll <jur...@ivory.web.za> wrote:

> I'm not sure about this, but I don't think Cayenne does SQL batch insert
> statements - it issues each insert (even within a batch commit) as a
> separate SQL insert statement. If that's the case then there's no real
> advantage, I think, to doing batch commits with Cayenne in this particular
> case (or am I mistaken?).
>
> Cayenne context creation is purposefully cheap, so for each insert get a
> new context, create the new object in that context, and commit. Surround
> the commit with the usual try block, inside your loop, to catch the unique
> constraint clashes and continue with the next record.
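>
> Roughly like this (a minimal sketch; MyEntity, SourceRow and the field
> copying are placeholders for your own classes, and "runtime" is assumed
> to be your app's ServerRuntime):
>
>     import java.util.List;
>     import org.apache.cayenne.CayenneRuntimeException;
>     import org.apache.cayenne.ObjectContext;
>     import org.apache.cayenne.configuration.server.ServerRuntime;
>
>     // One cheap context per record: a failed commit loses only that record.
>     static void migrate(ServerRuntime runtime, List<SourceRow> rows) {
>         for (SourceRow row : rows) {
>             ObjectContext context = runtime.newContext();
>             MyEntity entity = context.newObject(MyEntity.class);
>             entity.setName(row.getName());  // copy fields across
>             try {
>                 context.commitChanges();
>             } catch (CayenneRuntimeException e) {
>                 // unique clash: log it and move on; the next record
>                 // gets its own fresh context anyway
>             }
>         }
>     }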
>
> Just to be clear, the select query for the data that's being processed is
> still fetched in "batch" mode, e.g. 1000 rows at a time as suggested.
>
>
> On Thu, 24 Jun 2021 16:35:46 +0200, Michael Gentry <blackn...@gmail.com>
> wrote:
>
> > 10 is a rather small batch size. I'd suggest at least 100, perhaps even
> > 1000, unless you are expecting tons of problem records.
> >
> > For your hashing concerns, could you maybe store the hashes in an H2 DB
> > used just for this migration?
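> >
> > Something like this, for instance (a sketch; the file name, table and
> > class are invented, and it assumes H2's JDBC driver is on the classpath):
> >
> >     import java.sql.Connection;
> >     import java.sql.DriverManager;
> >     import java.sql.PreparedStatement;
> >     import java.sql.SQLException;
> >     import java.sql.Statement;
> >
> >     // Tracks already-transferred hashes in a local H2 file so the
> >     // migration can resume after a restart.
> >     class SeenHashes {
> >         private final Connection h2;
> >
> >         SeenHashes() throws SQLException {
> >             h2 = DriverManager.getConnection("jdbc:h2:./migration-state");
> >             try (Statement st = h2.createStatement()) {
> >                 st.execute("CREATE TABLE IF NOT EXISTS seen_hash"
> >                         + " (hash VARCHAR PRIMARY KEY)");
> >             }
> >         }
> >
> >         // Returns true if the hash is new; false if already recorded.
> >         boolean record(String hash) {
> >             try (PreparedStatement ins = h2.prepareStatement(
> >                     "INSERT INTO seen_hash (hash) VALUES (?)")) {
> >                 ins.setString(1, hash);
> >                 ins.executeUpdate();
> >                 return true;
> >             } catch (SQLException dup) {  // PK violation = duplicate
> >                 return false;
> >             }
> >         }
> >     }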
> >
> > On Wed, Jun 23, 2021 at 12:54 PM Tony Giaccone <t...@giaccone.org>
> wrote:
> >
> >> Yeah, that's not a solution that's going to work. I need to move about
> >> a million records from one database to a second, and the delay of
> >> querying the database for every record would turn a job that's already
> >> going to take more than a day into several days. That's why I'm
> >> batching them into groups of 10 at a time. The problem is there are
> >> duplicate records - they hash to the same value - and when that happens
> >> the insert on the other side is going to fail. The hash is a way to
> >> identify uniqueness. I'm sure you're thinking, well, keep a record of
> >> the hash values and only insert ones that you haven't seen. The problem
> >> with that solution is that I have to be able to restart the application
> >> and pick up where the process ended. If I had an in-memory cache of the
> >> hash values, on restart I'd have to read all the previously transferred
> >> records and rebuild the cache - another process that really is going to
> >> be too odious and time consuming.
> >>
> >>
> >>
> >> On Wed, Jun 23, 2021 at 12:45 PM John Huss <johnth...@gmail.com> wrote:
> >>
> >> > I think it would be better to figure out the "problem" objects before
> >> > committing by querying the DB and the object context.
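> >> >
> >> > For example (a sketch; MyEntity and its generated HASH property are
> >> > stand-ins for your own classes):
> >> >
> >> >     import org.apache.cayenne.query.ObjectSelect;
> >> >
> >> >     // Before committing, check whether a row with this hash already
> >> >     // exists in the target DB; uncommitted objects in the context
> >> >     // itself need a separate in-memory check.
> >> >     boolean exists = ObjectSelect.query(MyEntity.class)
> >> >             .where(MyEntity.HASH.eq(hash))
> >> >             .selectCount(context) > 0;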
> >> >
> >> > On Wed, Jun 23, 2021 at 9:47 AM Tony Giaccone <t...@giaccone.org>
> >> wrote:
> >> >
> >> > > I have a list of 10 new objects that I've inserted into the
> >> > > ObjectContext and am about to call commitChanges on the context.
> >> > >
> >> > > One or more of those entries violates a constraint and causes the
> >> > > commitChanges call to throw an exception.
> >> > >
> >> > > Now most of them are probably OK, so I want to make sure they get
> >> > > inserted. How do I handle this?
> >> > >
> >> > > My first thought was to invalidate the 10 items... then individually
> >> > > add each one back into the context and commit changes after each
> >> > > add. Is that a reasonable path? Obviously the one that failed before
> >> > > will fail again, and then I can just log that, invalidate it again
> >> > > and keep going.
> >> > >
> >> > > Is there a better, faster way to do this?
> >> > >
> >> > >
> >> > >
> >> > > Tony Giaccone
> >> > >
> >> >
>
