Mike Gentry -> I agree 10 is a small batch size, but I wanted to keep it
small at the start to get a sense of whether this was going to work. Given
that it's looking better and better, that number will probably grow, maybe
substantially.

Lon - the problem is that the batch entries all fail together. Either all
10 go in or none go in.  So then I have to decide what to do.  The data in
this structure is pretty simple, so the only thing that can really go
wrong is that an entry already exists in the new table. This is detected
by the hash value being identical. So the correct choice at that point is
to throw away the row you were trying to insert, because it's a
duplicate.
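
For what it's worth, the uniqueness check hinges on a deterministic hash of
the row contents. A minimal sketch of that idea (the field names and the
SHA-256 choice are my assumptions here, not necessarily what MyData's
getReceiptHash() actually does):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ReceiptHash {
    // Hypothetical: derive a uniqueness key from the row's fields.
    // The real MyData may hash different fields or use another algorithm.
    static String receiptHash(String field1, String field2) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // join with a separator so ("ab","c") doesn't collide with ("a","bc")
            md.update((field1 + "\u0000" + field2).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        // identical rows hash identically -- that's what flags a duplicate
        System.out.println(receiptHash("a", "b").equals(receiptHash("a", "b")));
        System.out.println(receiptHash("a", "b").equals(receiptHash("a", "c")));
    }
}
```

Because the hash is computed purely from the row data, a restarted run
recomputes the same value and the destination table's constraint does the
remembering, which is the point of the approach.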

The code to do this is pretty straightforward. I keep a list of the objects
that are going to be inserted as a result of the commit changes.

private static void cleanUpInserts(ObjectContext ctx, List<MyData> currentEntries) {
   // clean the context of all the records
   ctx.rollbackChanges();
   // that's ok because we still have them in the list
   for (MyData ent : currentEntries) {
      // for each entry, first insert it back into the context
      ctx.registerNewObject(ent);
      try {
         // now do the commit
         ctx.commitChanges();
      } catch (Exception e2) {
         // if it fails it's already in the table, so print the
         // receipt and forget it
         System.out.println("Value that failed is: " + ent.getReceiptHash());
         // drop the failed object from the context so it doesn't
         // make every subsequent commit in this loop fail too
         ctx.rollbackChanges();
      }
   }
}


The only problem with this solution is that it will slow down the
process by taking one commit of ten objects and potentially turning it
into 10 commits of a single object. Of course, the reason we're batching
them up and doing 10 inserts at a time is to speed up the process.

If there are a large number of duplicates (I'm reasonably certain there
are not), then this will cause a real slowdown of the inserts.
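
To make the trade-off concrete, here's a self-contained simulation of the
pattern (plain Java only; a HashSet stands in for the destination table and
its unique-hash constraint, so Cayenne, MyData, and the real commit are not
involved):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchFallback {
    // The destination "table": enforces uniqueness on the hash and, like
    // a real batched insert, accepts all rows or none.
    static void commitBatch(Set<String> table, List<String> batch) {
        for (String hash : batch) {
            if (table.contains(hash)) {
                throw new IllegalStateException("duplicate: " + hash);
            }
        }
        table.addAll(batch);
    }

    // On batch failure, retry each row alone and drop the duplicates --
    // the same shape as the cleanUpInserts() method above.
    static int insertWithFallback(Set<String> table, List<String> batch) {
        try {
            commitBatch(table, batch);
            return batch.size(); // common case: one commit for the whole batch
        } catch (IllegalStateException e) {
            int inserted = 0;
            for (String hash : batch) {
                try {
                    commitBatch(table, List.of(hash)); // one commit per row
                    inserted++;
                } catch (IllegalStateException dup) {
                    System.out.println("Value that failed is: " + hash);
                }
            }
            return inserted;
        }
    }

    public static void main(String[] args) {
        Set<String> table = new HashSet<>();
        table.add("h2"); // pre-existing row that will collide
        List<String> batch = new ArrayList<>(List.of("h1", "h2", "h3"));
        int ok = insertWithFallback(table, batch);
        System.out.println(ok + " of " + batch.size() + " inserted");
        // prints: Value that failed is: h2
        //         2 of 3 inserted
    }
}
```

A clean batch costs one commit; a batch with any duplicate costs up to
one commit per row, which is exactly the slowdown described above.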




On Wed, Jun 23, 2021 at 2:30 PM Lon Varscsak <lon.varsc...@gmail.com> wrote:

> You could commit your batch of 10 and then on error, check each object
> against the DB.  If the batch succeeds, there is nothing special to do.
> Assuming your "failure" batches are infrequent, then the time to check them
> won't be an issue.
>
> On Wed, Jun 23, 2021 at 9:54 AM Tony Giaccone <t...@giaccone.org> wrote:
>
> > Yeah, that's not a solution that's going to work. I need to move about a
> > million records from one database to a second, and the delay associated
> > with querying the database for every record would take a problem that's
> > already going to take more than a day to complete and turn it into
> > several days.  That's why I'm batching them into groups of 10 at a time.
> > The problem is there are duplicate records: they hash to the same value,
> > and when that happens the insert on the other side fails. The hash is a
> > way to identify uniqueness.   I'm sure you're thinking, well, keep a
> > record of the hash values and only insert ones that you haven't seen. The
> > problem with this solution is that I have to be able to restart the
> > application and pick up where the process ended. If I had an in memory
> > cache of the hash values, on restart I'd have to read all the previously
> > transferred records and inflate the cache, another process that really
> > is going to be too odious and time consuming.
> >
> >
> >
> > On Wed, Jun 23, 2021 at 12:45 PM John Huss <johnth...@gmail.com> wrote:
> >
> > > I think it would be better to figure out the "problem" objects before
> > > committing by querying the DB and the object context.
> > >
> > > On Wed, Jun 23, 2021 at 9:47 AM Tony Giaccone <t...@giaccone.org>
> wrote:
> > >
> > > > I have a list of 10 new objects that I've inserted into the
> > > > ObjectContext and am about to do a commit changes on the object
> > > > context.
> > > >
> > > > One, or more, of those entries violates a constraint and causes the
> > > > commit changes to throw an exception.
> > > >
> > > > Now most of them are probably ok, so I want to make sure they get
> > > > inserted. How do I handle this?
> > > >
> > > > My first thought was to invalidate the 10 items, then individually
> > > > add each one back into the context and do a commit changes after
> > > > each add. Is that a reasonable path? Obviously the one that failed
> > > > before will fail again, and then I can just log that, invalidate it
> > > > again, and keep going.
> > > >
> > > > Is there a better faster way to do this?
> > > >
> > > >
> > > >
> > > > Tony Giaccone
> > > >
> > >
> >
>
