On 04/25/2018 10:57 AM, Josh Stompro wrote:
> Hello, I just noticed that the auditor table trigger is disabled during
> the visibility update, so that answers that question, it is something
> that is done.  But it seems like it has a very small performance impact.
> On my test system reingest of 150 record batches takes 2.5% longer when
> the auditor table trigger is enabled.  So my primary reason for doing it
> would just be to avoid audit table bloat.

I have a feeling that the columns you mentioned in your previous message
were left out of the auditor table by omission, i.e. the developer
forgot to add them. Looking at their purposes, vis_attr_vector may not
be useful to log, but the merged fields probably should be, though I'm
not sure that anything would get recorded, since a record should not get
updated after it is merged.

If you think it is an omission, you should probably open a Launchpad bug.

> I’ve been trying out GNU parallel to run a bunch of ingest updates in
> parallel.  I am seeing collisions on inserting into
> metabib.browse_entry, when two different queries each contain the same
> entry and are both trying to insert a new row.  And I saw one deadlock
> detected also.  But the joblog feature seems like it will take care of
> that, once the bulk of the queries are done, it can just resume running
> all the failed queries and just run them serially to finish up.

The browse entries cannot be done in parallel because you may end up
trying to update the record for the same entry simultaneously. This is
more or less known in the developer community, and there isn't really a
good database side fix.

That said, I wrote a script, pingest.pl, to handle ingesting
bibliographic records in parallel. It is part of my evergreen_utilities
repository on GitHub:

https://github.com/Dyrcona/evergreen_utilities

Here's a link to the code view of the pingest utility:

https://github.com/Dyrcona/evergreen_utilities/blob/master/perl/pingest.pl
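
For illustration only, here is a rough Python sketch of the "run in
parallel, then retry the failures serially" idea you described with the
GNU parallel joblog. It is not how pingest.pl works, and the names
run_with_serial_retry and ingest_batch are hypothetical: ingest_batch
stands in for whatever function runs the ingest queries for one batch of
record IDs and raises an exception on a collision or deadlock.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_serial_retry(batches, ingest_batch, workers=4):
    """Run ingest_batch over all batches in parallel, then retry failures serially.

    ingest_batch is a hypothetical caller-supplied function that processes
    one batch and raises an exception if it fails (e.g. on a deadlock).
    Returns (failed_in_parallel, still_failed_after_retry).
    """
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit everything up front, remembering which batch each future is for.
        futures = [(batch, pool.submit(ingest_batch, batch)) for batch in batches]
        for batch, future in futures:
            # exception() waits for the job to finish and returns None on success.
            if future.exception() is not None:
                failed.append(batch)

    # Second pass: one batch at a time, so retries cannot collide with each other.
    still_failed = []
    for batch in failed:
        try:
            ingest_batch(batch)
        except Exception:
            still_failed.append(batch)
    return failed, still_failed
```

Anything left in the second list after the serial pass failed for some
reason other than contention and would need a closer look.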

HtH,
Jason
