In my testing, PG 9.0 works fine with Evergreen 2.0. I don't know of any production sites using that config, though.

--miker
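(For anyone wanting to try that combination, a quick sanity check is to confirm the server version and load the hstore contrib module the patch discussed below depends on. This is a minimal sketch; the script path is an assumption based on a stock Debian/Ubuntu postgresql-contrib-9.0 package layout, so adjust for your install:)

    -- confirm the server reports 9.0.x
    SELECT version();

    -- on 9.0 there is no CREATE EXTENSION (that arrives in 9.1);
    -- contrib modules are loaded by running their install script
    -- in the target database as a superuser
    \i /usr/share/postgresql/9.0/contrib/hstore.sql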
On Feb 9, 2011 4:38 PM, "Duimovich, George" <[email protected]> wrote:
> Hello Mike,
>
> Just a related follow-up question. I noted that the install notes for 2.0 state "PostgreSQL 8.4 is the minimum supported version."
>
> Is PostgreSQL 9.x officially supported in EG 2.0, or is the 9.x version for dev/testing situations only?
>
> Thanks,
> George Duimovich
> NRCan Library
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Mike Rylander
> Sent: February 9, 2011 15:22
> To: Evergreen Development Discussion List
> Subject: Re: [OPEN-ILS-DEV] Indexing/Search functionality (was: Planning for Evergreen development, post 2.0)
>
> So, this is a big change, and thus I'm posting the patch here first, both to solicit feedback on the direction and as a poke to those who may be interested but might have forgotten about this over the last several weeks.
>
> This is phase 1, wherein I have made the db changes required for SVF and brought QueryParser.pm in line with those changes. This requires Postgres 9.0+ and the hstore contrib module (along with everything else Evergreen needs) in order to work. I'm continuing to move forward, but again, feedback is welcome!
>
> --miker
>
> On Fri, Jan 21, 2011 at 12:20 PM, Mike Rylander <[email protected]> wrote:
>> Currently in Evergreen there are four different indexed bibliographic
>> data storage mechanisms, each of which targets a different set of data
>> and query use cases, and each of which carries its own caveats regarding
>> application outside the designed use cases:
>>
>> 1) Base Search (metabib.keyword_field_entry and friends)
>>  * Use: general full-text indexing; query result relevance ranking
>>  * Caveat: inefficient for low-cardinality values (many records
>> containing indexable data for an index definition, few unique values
>> for that index definition)
>> 2) Full Record (metabib.real_full_rec)
>>  * Use: sorting; reporting; base data for control field indexing;
>> direct MARC field search
>>  * Caveat: inefficient for general searching, as the format is too
>> close to MARC
>> 3) Facets (metabib.facet_entry)
>>  * Use: exact-match search; post-search result refining; browse
>>  * Caveat: expensive as a filter on low-cardinality facets (many
>> records, few unique values)
>> 4) Control Fields (metabib.rec_descriptor)
>>  * Use: storage for standards-based, single-value record attributes
>> (fixed fields, physical characteristics, etc.); sorting (date1, etc.);
>> filtering (type, form, audience, VR format, etc.); reporting; record
>> analysis; low-cardinality search for known (controlled) values; search
>> weighting (language)
>>  * Caveat: extension beyond the standard is entirely out of scope;
>> extension within the standard is often prohibitively expensive,
>> requiring schema, trigger, code and configuration changes
>>
>> Reading carefully, one will notice that what is lacking is a way to
>> define and index general, user-defined single-value fields. That is,
>> something akin to (4), which works well for low-cardinality attributes,
>> but is extensible in a manner similar to (1) using a definition table.
>> I call this concept, including the indexing, search, filtering and
>> maintenance mechanisms, Single Value Fields, or SVF.
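(To make the SVF idea concrete, here is a minimal sketch of what a definition table plus an hstore-backed data table could look like. Every name below is an illustrative guess, not the schema from the attached design document; config.metabib_field and biblio.record_entry are the existing Evergreen tables referenced for context:)

    -- hypothetical definition table: one row per single-value field,
    -- playing the role config.metabib_field plays for class (1) indexes
    CREATE TABLE config.svf_definition (
        name    TEXT PRIMARY KEY,   -- e.g. 'item_type', 'audience'
        label   TEXT NOT NULL,      -- human-friendly, I18N-able label
        xpath   TEXT,               -- where in the record to find the value
        filter  BOOL NOT NULL DEFAULT TRUE,  -- usable as a search filter?
        sorter  BOOL NOT NULL DEFAULT FALSE  -- usable as a sort axis?
    );

    -- hypothetical data table: one row per bib record, with every
    -- attribute for that record packed into a single hstore column
    CREATE TABLE metabib.svf_entry (
        record  BIGINT PRIMARY KEY
                REFERENCES biblio.record_entry (id),
        attrs   HSTORE NOT NULL DEFAULT ''::HSTORE
    );

    -- one index serves queries on any combination of attributes
    CREATE INDEX svf_entry_attrs_idx
        ON metabib.svf_entry USING GIST (attrs);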
>> Benefits provided by such a mechanism would be wide ranging:
>>
>>  * Some access points that might currently be implemented as facets
>> because of the exact-match property would be both faster and more
>> flexible as an SVF
>>  * Other data used for sorting could be moved to an SVF to make such
>> sorting more memory- and time-efficient
>>  * Arbitrary user-defined fields from within a record could be indexed
>> and automatically exposed for use in the OPAC and staff client
>>
>> As a secondary effect, (4) becomes a strict subset of SVF and can be
>> reimplemented as such. This too has some very attractive benefits.
>>
>> On the input side, the direct benefits of folding (4) and SVF together
>> are a unification of configuration APIs and interfaces, a reduction in
>> the complexity (and therefore maintenance cost) of the ingest code
>> that extracts values from bib records, and the elimination of the cost
>> of extending (4), beyond what already exists, to any standard MARC
>> control or fixed field.
>>
>> On the output side, the benefit will be to reduce the cost of searches
>> involving both (current-style) Control Fields (4) and SVF-optimizable
>> components by folding them into one mechanism. This reduces cost by
>> eliminating one or more SQL JOINs, as well as by taking advantage of a
>> unified SVF index on all attributes for a record. Thus, instead of one
>> JOIN for metabib.rec_descriptor and a separate JOIN for each SVF-type
>> facet used in a query, all would be replaced by a single JOIN to the
>> SVF data table. This table will be approximately the same size as the
>> metabib.rec_descriptor table (similar I/O profile and cachedness)
>> while providing expanded functionality.
>>
>> SVF will also include a display translation mechanism for all fields
>> indexed. This means coded values can be displayed using human-friendly
>> strings in any language, just as I18N-enabled MARC coded fields do
>> today when stored in metabib.full_rec. This is something that facets
>> cannot do natively, and will not be able to do effectively -- because
>> facet values are uncontrolled, I18N is outside the design scope.
>>
>> Put another way, the cost of using (4) today is essentially already
>> paid by the use of metabib.rec_descriptor -- this, or an analog such
>> as SVF, must exist. The cost of SVF replacing (4) would be comparable,
>> and in some cases lower (faster). On the other hand, the cost of using
>> Facets (3) to simulate SVF-optimizable use cases for access points
>> outside the Facet design constraints can be extremely high, depending
>> on the cardinality of the facet. A good example of this is the use of
>> a local Material Type value. Imagine a facet with 20 or so unique
>> values, but where one of them, such as "book", is used in more than
>> 3/4 of the bibliographic dataset. In practice, when Facets (3) are
>> used this way, it has been identified as the #2 cause of slow
>> searches.
>>
>> [NOTE: the #1 cause has been identified as the cost of "relevance
>> bumps". Research is underway to evaluate the efficacy of replacing
>> the ranking function (using rank_cd() instead of rank()) to address
>> this. In previous versions of Postgres, rank_cd() came at a
>> significant cost, but this may not be the case today.]
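(For reference on that NOTE: in 8.3+ the in-core names are ts_rank() and ts_rank_cd(), the latter also weighting cover density, i.e. how tightly the query terms cluster in the document. A side-by-side comparison against the class (1) table might look like the sketch below; the source (bib record id) and index_vector (tsvector) columns are assumptions about that table, and the search terms are arbitrary:)

    -- rank the same result set both ways; ts_rank_cd() rewards
    -- documents where the query terms appear close together
    SELECT mkfe.source,
           ts_rank(mkfe.index_vector, q)    AS r,
           ts_rank_cd(mkfe.index_vector, q) AS r_cd
      FROM metabib.keyword_field_entry mkfe,
           plainto_tsquery('english', 'global warming') q
     WHERE mkfe.index_vector @@ q
     ORDER BY r_cd DESC
     LIMIT 10;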
>> Now, the implementation of SVF and the folding in of (4) is not
>> without costs. The largest of these is the need to push the required
>> Postgres version to 9.0. Pg 9.0 is considered stable and is very well
>> supported by the Pg community and by third-party Pg support companies.
>> The upgrade for existing production Evergreen sites is not terribly
>> complicated, but it is non-trivial.
>>
>> The reason for this Postgres upgrade requirement is that the
>> underlying datatypes have become more featureful and mature in 9.0,
>> and the techniques I plan to use simply aren't possible in 8.4.
>>
>> Attached you will find my current design document, which covers much
>> of what is discussed above along with a basic implementation plan and
>> example use cases for each component. There are details not included,
>> such as appropriate table constraints on configuration tables, but the
>> meat is there and I would welcome feedback and input!
>>
>> --
>> Mike Rylander
>> | VP, Research and Design
>> | Equinox Software, Inc. / The Evergreen Experts
>> | phone: 1-877-OPEN-ILS (673-6457)
>> | email: [email protected]
>> | web: http://www.esilibrary.com
>
> --
> Mike Rylander
> | VP, Research and Design
> | Equinox Software, Inc. / The Evergreen Experts
> | phone: 1-877-OPEN-ILS (673-6457)
> | email: [email protected]
> | web: http://www.esilibrary.com
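(And to illustrate the single-JOIN claim from the message above, using the hypothetical metabib.svf_entry table sketched earlier: a keyword search filtered on two record attributes needs only one extra JOIN, with the hstore containment operator @> testing both attributes at once. The attribute names and values are illustrative:)

    -- previously: one JOIN to metabib.rec_descriptor plus one JOIN
    -- per facet-style filter; here: a single JOIN to the SVF table
    SELECT mkfe.source
      FROM metabib.keyword_field_entry mkfe
      JOIN metabib.svf_entry svf ON (svf.record = mkfe.source)
     WHERE mkfe.index_vector @@ plainto_tsquery('english', 'dinosaurs')
       AND svf.attrs @> 'item_type=>g, audience=>j'::hstore;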
