On Mon, Nov 5, 2012 at 12:06 PM, David Boyle <[email protected]> wrote:
> Proposed feature is to add an additional MARC tag and subfields for language
> searching.
> Currently, Evergreen only allows the language specified in MARC 008 position
> 35-37 to be the target of a language search.
> There is no current mechanism to allow for additional languages to be
> included as search targets.
> The proposed change would allow a configurable string to be set in the
> database, specifying which subfields in MARC 041 records would be allowed as
> additional search targets
>
> Blueprint:
> https://blueprints.launchpad.net/evergreen/+spec/additional-language-search
>
> Evergreen Wiki:
> http://www.open-ils.org/dokuwiki/doku.php?id=dev:proposal:additional_search_languages
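
For anyone following along, here is a rough sketch of what the quoted proposal amounts to: collect the language from 008/35-37, then add any codes found in the configured 041 subfields. This is not Evergreen code. The function name, the dict-based record layout, and the `subfields_041` parameter are all made up for illustration, and the splitting of a subfield into 3-character chunks assumes older records that pack several codes into one subfield.

```python
# Hypothetical sketch only: names and record layout are illustrative,
# not taken from Evergreen.

def record_languages(record, subfields_041="a"):
    """Return the ordered, de-duplicated language codes for one record.

    record: dict with "008" (fixed-field string) and "041" (a list of
            fields, each a list of (subfield_code, value) pairs).
    subfields_041: configurable string of 041 subfield codes to include.
    """
    langs = []

    def add(code):
        code = code.strip().lower()
        # Skip blanks, fill characters, and anything already seen.
        if len(code) == 3 and code.isalpha() and code not in langs:
            langs.append(code)

    # 008 positions 35-37 are zero-based, so this is the slice [35:38].
    add(record.get("008", "")[35:38])

    for field in record.get("041", []):
        for sf_code, value in field:
            if sf_code in subfields_041:
                # Assumption: a subfield may hold several packed codes.
                for i in range(0, len(value), 3):
                    add(value[i:i + 3])
    return langs
```

Calling `record_languages(rec, "ah")` instead of the default would also pull in the original-language codes from subfield h, which is the kind of switch the configurable string is meant to provide.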

Thanks for posting the idea! I ran across it over the weekend via the RSS
feed for the wiki, so I had a chance to think about it before today ...

First, I wonder if you've considered extending or expanding some of the
more modern parts of the search/indexing machinery instead of using
metabib.full_rec (MFR). There are some big-ish drawbacks to MFR
(non-configurable normalization, huge size, loss of MARC-level field
granularity, and most of all, performance (or lack thereof)) that make it
less than optimal for general use, and it's extremely MARC-centric, of
course. I'd personally lobby to see it go away entirely, if possible, were
it not for its aid in troubleshooting and low-level data analysis.

In particular, for alternate ideas, I'd point you towards the Single Value
Fields (SVF) infrastructure (config.record_attr_definition,
metabib.record_attr, and friends) as inspiration for a Multi-Value Fields
implementation, to which the current language() (and item_lang()) filter
could be moved. There would certainly be differences, of course -- data
might be stored in something other than HSTORE to avoid indexing and query
complications, etc, or maybe not -- but the setup, extraction and
normalization bits could be essentially the same as SVF. IOW, you'd
benefit from code reuse.

However, the bigger benefit to this route, IMO, is that it would be
generalized and could help solve other outstanding issues. For instance,
it would cover targeted indexing of other fields that are often singular,
except when they're not (and that's when you care most about them, it
seems), such as ISxN. That would make record import matching much faster
(see the discussion of multi-value fields on the Launchpad bug at
https://bugs.launchpad.net/evergreen/+bug/1024095 for some background),
and is but one example of what a more generalized solution might be able
to do.

Thoughts?

--
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [email protected]
 | web: http://www.esilibrary.com
