On Wed, Sep 25, 2013 at 5:51 PM, Scott Myers <[email protected]> wrote:
> Mike,
>
> The multithreaded reingest project was shared during the hackathon at the
> last Evergreen conference.
>
Thanks, Scott. I would have liked to respond to this sooner, but I was sick
for a few days, and then it was "dig out of email overload" for the last
couple.

> Here is a link to what we ended up running for moving KCLS from 2.1 to 2.2.
>
> https://github.com/CatalystIT/multithread_2_2_update
>
> The files to pay attention to are data_update_driver.pl and
> update_driver.pl; both have pod files attached with quite a few comments on
> how they work.
>
> To clear up what that means: basically, we created driver files that
> divide large amounts of data into smaller chunks and run those on multiple
> connections for CPU-bound updates. A good example is the 2.1->2.2 upgrade,
> which changed how the data was stored in the metabib field entry tables.
> This was a very CPU-bound update and ended up being run with 32 simultaneous
> connections, reducing the estimated time to complete from 5 days to
> 4 hours.
>

So, if I'm following the code correctly, the idea is to generate a huge SQL
script that contains an update statement for each non-deleted bib record, and
use this tool to split that script into several smaller scripts that can run
in parallel. That's a good goal, and this helpfully codifies the advice
that's generally given for migrations and upgrade reingests, though I
personally usually just use and recommend the unix split command and a set of
psql sessions inside screen.

If I'm not following the code as you intend, please let me know.

However, there's a caveat to using this technique, generally, for 2.3 and
beyond. Because of the browse indexing done now (and more specifically, the
unique requirement on browse entries), parallelizing becomes a bit harder.
It's still possible, mind you, but you need to take certain (new) steps
before reingesting, and then perform a final post-reingest run to handle the
browse data. Just a heads-up that you may run into forced serialization if
you use your script, or the split+psql method, for future post-upgrade
reingests.
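
For anyone following along, the split-then-run-in-parallel idea described
above can be sketched roughly as follows. This is only an illustration under
stated assumptions, not the CatalystIT driver and not an official Evergreen
tool: it assumes the generated script holds one statement per line, the
function names, the database name "evergreen", and the placeholder UPDATE
statement are all hypothetical, and it does nothing about the extra browse
steps needed for 2.3 and beyond.

```python
import tempfile
from pathlib import Path


def split_script(lines, n_chunks):
    """Deal statements round-robin into n_chunks lists.

    Assumes one complete SQL statement per input line.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    for i, line in enumerate(lines):
        chunks[i % n_chunks].append(line)
    return chunks


def write_chunks(chunks, out_dir):
    """Write each chunk to its own .sql file and return the file paths."""
    paths = []
    for i, chunk in enumerate(chunks):
        path = Path(out_dir) / f"reingest_{i:02d}.sql"
        path.write_text("".join(stmt + "\n" for stmt in chunk))
        paths.append(path)
    return paths


def psql_commands(paths, dbname):
    """Build (but do not run) one psql command line per chunk file."""
    return [["psql", "-d", dbname, "-f", str(p)] for p in paths]


if __name__ == "__main__":
    # Placeholder statements; a real reingest script would come from the
    # database and the table name here is hypothetical.
    statements = [
        f"UPDATE some_bib_table SET id = id WHERE id = {n};"
        for n in range(1, 11)
    ]
    with tempfile.TemporaryDirectory() as tmp:
        files = write_chunks(split_script(statements, 3), tmp)
        for cmd in psql_commands(files, "evergreen"):
            print(" ".join(cmd))
```

Each printed command could then be started in its own screen window, or
launched with subprocess.Popen, so the chunks run on separate connections.
Round-robin dealing keeps the chunks close to the same size; the number of
chunks is a tuning choice that depends on available CPU cores.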
--
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [email protected]
 | web: http://www.esilibrary.com

> Let me know if you have questions on how this can be set up or run.
>
> Thanks
>
> Scott Myers
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Mike
> Rylander
> Sent: Wednesday, September 25, 2013 1:41 PM
> To: Evergreen Discussion Group
> Subject: Re: [OPEN-ILS-GENERAL] Evergreen & Software Performance Analysis
>
> Scott,
>
> I echo Rogan's down-thread thanks for following up here.
>
> I'm curious where the multi-threaded reingest project is shared. I can't
> find anything like that searching any of the Evergreen mailing lists or
> Launchpad for terms like "ingest" and "multi".
> Perhaps I'm just missing it. Some interest was expressed in the community
> IRC channel, but also some confusion as to what exactly that means.
>
> TIA,
>
> --
> Mike Rylander
> | Director of Research and Development
> | Equinox Software, Inc. / Your Library's Guide to Open Source
> | phone: 1-877-OPEN-ILS (673-6457)
> | email: [email protected]
> | web: http://www.esilibrary.com
>
>
> On Wed, Sep 25, 2013 at 3:50 PM, Scott Myers
> <[email protected]> wrote:
>> Hi Rogan,
>>
>> The db work Command Prompt has done for KCLS is mostly configuration
>> things, work mem, max connections, etc. They have been fine tuning all
>> those settings to get the best performance. These settings wouldn't help
>> other people as they would be dependent on each library's load. Another
>> change made by Command Prompt was to remove Slony replication and move to
>> pgpool. If anyone needs help doing the same with their database I would
>> highly recommend Command Prompt.
>>
>> As for work done by Catalyst, all work that is directly applicable and
>> beneficial to the community has been added.
>> Kyle Tomita
>> https://launchpad.net/~tomitakyle and Fred Parks
>> https://launchpad.net/~fparks have been the most active community members
>> from our team, with Kyle being the 9th on the top contributors list as of
>> 9/24/13.
>>
>> Catalyst also shared a multithreaded bib reingest that greatly reduces the
>> time needed to do a full reingest. We also plan to share the way that
>> Catalyst deploys code to KCLS without downtime.
>>
>> Catalyst considers itself part of the community and is actively working to
>> add more value. We have developed a strong relationship with KCLS and
>> enjoy working with them greatly, and our relationship has allowed us to
>> gain a strong understanding of Evergreen. We've got some interesting work
>> that we are going to be doing in the near future for KCLS, and as we have
>> in the past, that which is beneficial to the community will be shared.
>>
>> If you would like detail on any of these items now, feel free to reach out
>> to me. You have my cell phone number.
>>
>> Thanks
>>
>> Scott Myers
>>
>>
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of
>> Rogan Hamby
>> Sent: Tuesday, September 24, 2013 7:10 AM
>> To: Joshua D. Drake
>> Cc: Evergreen Discussion Group
>> Subject: Re: [OPEN-ILS-GENERAL] Evergreen & Software Performance Analysis
>>
>> Picking back up an old thread...
>>
>> I was hoping at some point to hear more about the db work Command Prompt
>> has done for KCLS and perhaps see some work in git. I was sad to see that
>> in the new LJ article that Jed Moffitt said that at this point KCLS has
>> forked Evergreen so I suppose the work Catalyst and Command Prompt have
>> done isn't relevant to the rest of the Evergreen community. I suppose that
>> also means that any experience gained in working on the KCLS system isn't
>> transferable.
>>
>>
>> On Thu, Aug 22, 2013 at 11:05 AM, Rogan Hamby <[email protected]>
>> wrote:
>>
>> Hi Joshua,
>>
>> I don't know if you had a chance to see my message below so I'll copy you
>> in directly as well and maybe touch base again after Labor Day. With the
>> Evergreen community having a rich collection of input from various
>> contributors (many like yourself paid to do individual development by
>> community members) all participating in the open source spirit and putting
>> their code out there, allowing others to build on top of it or modify it
>> or package it into master, it would be exciting to see this work since
>> you've indicated it's had a big impact for your customers.
>>
>> I did a quick MarkMail search since I sometimes lose emails to spam
>> filters and noticed that back in Feb you mentioned that your Evergreen
>> customer has been KCLS. I know that at the conference they talked about
>> setting up a public repo that would be available right after the
>> conference. Maybe they can chime in with an update on that?
>>
>>
>> On Fri, Aug 9, 2013 at 11:52 AM, Rogan Hamby <[email protected]>
>> wrote:
>>
>> Hi Josh,
>>
>> Can you share with folks some more specifics?
>>
>> For example:
>>
>> In regards to optimizing the conf file can you share what kind of
>> optimizations and the benchmarks? E.g. with X records we see Y performance
>> in activity Z.
>>
>> A lot of other changes obviously touch on changes to code and/or schema
>> changes. Are these going to be released on a public repo or fed back into
>> master?
>>
>>
>> On Thu, Aug 8, 2013 at 2:01 PM, Joshua D. Drake <[email protected]>
>> wrote:
>>
>> On 08/07/2013 10:12 AM, Rogan Hamby wrote:
>>
>> I'm guessing maybe Joshua doesn't keep track of the listserv but is
>> there someone else from Command Prompt or whomever they did the
>> development work for that could chime in?
>> When he says they've made improvements, do those include GPLed code?
>>
>> Sorry folks, I do watch this list but not as much as the postgresql lists.
>> We have also been very busy. Here are some of the basic things we have
>> done:
>>
>> 1. Optimized the postgresql.conf; it is amazing how much you can get from
>> some minor tweaks after some performance analysis.
>>
>> 2. Converted some of the procedures to C, for example translate_isbn1013.
>>
>> 3. Modified the holds process to use a lookup table.
>>
>> 4. Changed the process for holds so they don't exist indefinitely but get
>> migrated out for reporting, so they do not affect performance of the
>> active table.
>>
>> 5. Partitioning of larger tables.
>>
>> 6. Upgraded versions of PostgreSQL to more modern versions (this can also
>> result in noticeable gains in performance).
>>
>> 7. Lots of query tuning, adding indexes where appropriate, increasing
>> maintenance on particular tables to reduce bloat more aggressively, etc.
>>
>> As well as various other things (stabilizing the system so there aren't
>> weird overloads, unexpected apache load events, etc.). It certainly has
>> been a rather wild ride over the last 9 months as we get further and
>> further into the adventure that is the Evergreen software.
>>
>> Sincerely,
>>
>> Joshua D. Drake
>>
>> --
>> Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
>> PostgreSQL Support, Training, Professional Services and Development
>> High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
>> For my dreams of your image that blossoms
>> a rose in the deeps of my heart. - W.B. Yeats
>>
>> --
>>
>> Rogan Hamby, MLS, CCNP, MIA
>> Managers Headquarters Library and Reference Services,
>> York County Library System
>>
>> "You can never get a cup of tea large enough or a book long enough to suit
>> me."
>> -- C.S.
Lewis
