Quoting Joe Thornton <[email protected]>:

Three things:

--  The process I started three days ago to import 160,000 records using
the method on the Evergreen site is still running.

I found it took days to load a large number of bibliographic records using the regular tools, particularly if you're trying to load them all at once.

--  Maybe an unfair comparison, but we use VuFind as an alternative
interface to Horizon, and a full import of all 550k records takes about 45
minutes.

I suspect VuFind does less with the records during import than Evergreen does.


--  It's surprising to me that there isn't a faster method. We're looking
seriously at Evergreen as a replacement for Horizon, but this would be a
problem. I'll try Dan's and then Jason's methods (again, thank you very
much) and hope that they're significantly faster. If I had the time and
ability (unfortunately I have neither) I'd take a shot at it myself.

As I recall, reindexing bib records for the OPAC on Horizon was quite a slow process. We'd often use special scripts to do a full index in a temporary directory so that the old indexes could still be searched. Also, there was the auto indexer that had to be run to pick up changes in bibliographic records for OPAC searching. Evergreen has no separate processes for that; it is all handled in the database.

With Evergreen, I found some interesting things that improve the performance of large bib loads. These may be related to our hardware/network configuration, but here they are:

1. Setting the internal flags ingest.metarecord_mapping.skip_on_insert, ingest.disable_authority_linking, and ingest.assume_inserts_only to enabled = TRUE helps. You'll find these in the config.internal_flag table.

2. Doing the load from a computer other than the database server seemed to be faster.

3. JDBC batch inserts seemed faster than doing the equivalent with Perl DBI.

4. Break the bib records up into batches of 10,000 records. (The Horizon bib export program has options that make this relatively easy.) Get the number of cores on your database server and subtract 1. Run that number of batches simultaneously into the database server. (The software I shared the link to will help with this.)
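
To make item 4 a bit more concrete, here is a rough Python sketch of the batching idea, not the actual loader software mentioned above. The names batches, worker_count, load_all and the load_batch callable are all hypothetical; load_batch stands in for whatever actually inserts one batch into the database.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(records, size=10_000):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def worker_count(db_cores):
    """Simultaneous loads: cores on the database server minus 1, never below 1."""
    return max(1, db_cores - 1)


def load_all(records, load_batch, db_cores, size=10_000):
    """Run load_batch (hypothetical) on every batch, worker_count(db_cores) at a time.

    Results come back in batch order.
    """
    with ThreadPoolExecutor(max_workers=worker_count(db_cores)) as pool:
        return list(pool.map(load_batch, batches(records, size)))
```

For example, load_all(range(25_000), len, db_cores=4) would process three batches with up to three running at once. Note that db_cores is the core count of the database server, which is not necessarily the machine running the script (see item 2).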

With those changes, loading our 900,000+ bib records went from taking days to finishing overnight.
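
For the internal flags in item 1, something along these lines could generate the SQL to run before the load. This is only a sketch: it assumes the flag name is stored in a column called name in config.internal_flag, which you should check against your own schema, and flag_update_sql is a made-up helper name.

```python
# Flag names as listed in item 1 above.
INGEST_FLAGS = (
    "ingest.metarecord_mapping.skip_on_insert",
    "ingest.disable_authority_linking",
    "ingest.assume_inserts_only",
)


def flag_update_sql(enabled, flags=INGEST_FLAGS):
    """Build one UPDATE that sets `enabled` for the given internal flags.

    Assumption: config.internal_flag identifies each flag by a `name` column.
    """
    names = ", ".join("'" + flag + "'" for flag in flags)
    state = "TRUE" if enabled else "FALSE"
    return (
        "UPDATE config.internal_flag "
        f"SET enabled = {state} WHERE name IN ({names});"
    )


if __name__ == "__main__":
    print(flag_update_sql(True))   # run this statement before the bulk load
    print(flag_update_sql(False))  # presumably run this once the load is done
```

Since these flags make the ingest skip work it would normally do, you would presumably want to switch them back after the load finishes.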

HtH,
Jason


Thanks again.
Joe

Joe Thornton
Manager, Automation Services
Upper Hudson Library System
28 Essex Street
Albany, NY 12206
518-437-9880 x230



On Tue, Jun 4, 2013 at 3:26 PM, Joe Thornton <[email protected]> wrote:

I'm new to Evergreen and to this list so I apologize in advance if this
issue has been discussed already (I did look).

I installed Evergreen successfully on a test server with 16GB RAM and
about 200GB of disk -- in two partitions.

We have:

Debian 7
Postgres 9.1 (not on a remote server)
Evergreen 2.4

To migrate bib records from our SirsiDynix Horizon database I used this
document:
http://docs.evergreen-ils.org/2.4/_migrating_your_bibliographic_records.html

The process was interrupted a few times by serious errors, but eventually
I ended up with 550k bib records in the staging_records_import table.

The real problems started when I ran SELECT staging_importer();

The first time it stopped after many hours because it ran out of disk
space. Postgres was using the smaller partition for data so I changed it to
use the larger partition (~135GB) and restarted the job. This time it ran
over the weekend and then ran out of disk space again.

Although this seems very strange to me, I started it again and this time
the staging_records_import table has about 160k records in it.

I started SELECT staging_importer(); yesterday (about 24 hours ago) and
it's still running and has used more than 50GB of disk so far.

Am I missing a step (or steps), or is this normal?

Thanks,

Joe Thornton
Manager, Automation Services
Upper Hudson Library System
28 Essex Street
Albany, NY 12206
518-437-9880 x230





--
Jason Stephenson
Assistant Director for Technology Services
Merrimack Valley Library Consortium
