#852: BibIndex: centralise index configurations
-------------------------+----------------------
Reporter: simko | Owner: pglauner
Type: enhancement | Status: new
Priority: major | Milestone:
Component: BibIndex | Version:
Keywords: |
-------------------------+----------------------
Currently, Invenio indexes can be configured in several ways:
a. Some configurations are set at runtime, per index, in the index DB table
(`idxINDEX`), e.g. the stemming language.
b. Some configurations are set in `invenio.conf` per index, e.g. index-time
synonym lists (`CFG_BIBINDEX_SYNONYM_KBRS`).
c. Some configurations are set in `invenio.conf` globally for all
indexes, e.g. stop word lists (`CFG_BIBINDEX_REMOVE_STOPWORDS`,
`CFG_BIBINDEX_PATH_TO_STOPWORDS_FILE`).
d. Some configurations are hard-coded in the source code, e.g. the fuzzy
author name tokenizer (`BibIndexFuzzyNameTokenizer`) is applied to author
indexes via a hard-coded check on the index name (e.g. `firstauthor`).
e. Some configurations are hard-coded in the source code but take arguments,
e.g. the journal index uses `get_words_from_journal_tag()`, whose format
standardisation depends on `CFG_JOURNAL_PUBINFO_STANDARD_FORM`.
The goal of this ticket is to harmonise and centralise these configurations
in the DB, so that all of the above features become configurable per index.
Roughly speaking, this means extending the `idxINDEX` table with new columns
so that not only stemming, but also the tokenizer method for words and
phrases, the stopword list, the synonym list, etc., can be defined at
runtime by manipulating the DB table, without touching the source code or
`invenio.conf`.
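As a rough illustration of the idea, here is a minimal sketch of what a
per-index configuration lookup could look like. It is not the actual Invenio
schema: only `idxINDEX`, its stemming setting, and
`BibIndexFuzzyNameTokenizer` come from the description above. The column
names `tokenizer`, `remove_stopwords` and `synonym_kbr`, the
`IndexConfig` class, the helper functions, and the sample rows are all
hypothetical, and an in-memory SQLite DB stands in for the real database so
the example is self-contained.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexConfig:
    """Per-index settings read from the DB (field names are hypothetical)."""
    name: str
    stemming_language: Optional[str]
    tokenizer: str
    remove_stopwords: bool
    synonym_kbr: Optional[str]


def create_demo_table(conn: sqlite3.Connection) -> None:
    """Create an idxINDEX-like table with assumed extra columns."""
    conn.execute(
        """CREATE TABLE idxINDEX (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL UNIQUE,
               stemming_language TEXT,
               tokenizer TEXT NOT NULL DEFAULT 'default',
               remove_stopwords INTEGER NOT NULL DEFAULT 0,
               synonym_kbr TEXT
           )"""
    )
    # Sample rows for illustration only.
    conn.executemany(
        "INSERT INTO idxINDEX (name, stemming_language, tokenizer,"
        " remove_stopwords, synonym_kbr) VALUES (?, ?, ?, ?, ?)",
        [
            ("global", "en", "default", 1, "demo-synonyms"),
            ("firstauthor", None, "BibIndexFuzzyNameTokenizer", 0, None),
        ],
    )


def load_index_config(conn: sqlite3.Connection, index_name: str) -> IndexConfig:
    """Fetch all settings of one index in a single query."""
    row = conn.execute(
        "SELECT name, stemming_language, tokenizer, remove_stopwords,"
        " synonym_kbr FROM idxINDEX WHERE name = ?",
        (index_name,),
    ).fetchone()
    if row is None:
        raise KeyError(index_name)
    return IndexConfig(row[0], row[1], row[2], bool(row[3]), row[4])


conn = sqlite3.connect(":memory:")
create_demo_table(conn)
print(load_index_config(conn, "firstauthor"))
```

With such a layout, switching the tokenizer of an index would be a plain
`UPDATE` on its row rather than a code change.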
The BibIndex Admin interface should be extended accordingly.
This work will entail various refactoring tasks, such as separating out the
various `get_words_from_foo()` functions, taking advantage of the
`pluginutils.py` library.
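To make the refactoring direction more concrete, here is a minimal sketch of
a name-based tokenizer registry, where the tokenizer is selected by a name
stored in the index configuration instead of a hard-coded check on the index
name. It does not use the real `pluginutils.py` API; a plain dictionary
stands in for it. The names `TOKENIZERS`, `register_tokenizer`,
`get_tokenizer`, and both toy tokenizers are hypothetical and much simpler
than the real `get_words_from_foo()` functions.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]

# Hypothetical registry: tokenizer name -> function.
TOKENIZERS: Dict[str, Tokenizer] = {}


def register_tokenizer(name: str) -> Callable[[Tokenizer], Tokenizer]:
    """Decorator that registers a tokenizer under the given name."""
    def decorator(func: Tokenizer) -> Tokenizer:
        TOKENIZERS[name] = func
        return func
    return decorator


@register_tokenizer("default")
def tokenize_words(text: str) -> List[str]:
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


@register_tokenizer("author_initials")
def tokenize_author(text: str) -> List[str]:
    """Toy author tokenizer: name parts plus initials of given names."""
    surname, _, given = text.lower().partition(",")
    tokens = surname.split() + given.split()
    # Add the initial of every given name longer than one letter.
    tokens += [word[0] for word in given.split() if len(word) > 1]
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(tokens))


def get_tokenizer(name: str) -> Tokenizer:
    """Look up a tokenizer by name, falling back to the default one."""
    return TOKENIZERS.get(name, TOKENIZERS["default"])


print(get_tokenizer("author_initials")("Ellis, John"))
```

The caller only needs the tokenizer name read from the index configuration,
so adding a new tokenizer would not require touching the indexing code.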
P.S. This ticket will have several sequels, e.g. about defining the index
type (native Invenio, Solr, Xapian) or about defining virtual logical
fields that would gather information for indexing from non-MARC,
non-fulltext sources (e.g. from the cataloguer log tables). These will be
ticketised separately.
--
Ticket URL: <http://invenio-software.org/ticket/852>
Invenio <http://invenio-software.org>