#852: BibIndex: centralise index configurations
-------------------------+----------------------
 Reporter:  simko        |      Owner:  pglauner
     Type:  enhancement  |     Status:  new
 Priority:  major        |  Milestone:
Component:  BibIndex     |    Version:
 Keywords:               |
-------------------------+----------------------
 Currently, Invenio indexes can be configured in several ways:

  a. Some configurations are done at runtime, per index, in the index DB
 table (`idxINDEX`), e.g. the stemming language.

  b. Some configurations are done in `invenio.conf` per-index, e.g. index-
 time synonym lists (`CFG_BIBINDEX_SYNONYM_KBRS`).

  c. Some configurations are done in `invenio.conf` globally for all
 indexes, e.g. stop word lists (`CFG_BIBINDEX_REMOVE_STOPWORDS`,
 `CFG_BIBINDEX_PATH_TO_STOPWORDS_FILE`).

  d. Some configurations are hard-coded in the source code, e.g. the fuzzy
 author name tokenizer (`BibIndexFuzzyNameTokenizer`) is applied to author
 indexes via a hard-coded check on the index name (e.g. `firstauthor`).

  e. Some configurations are hard-coded in the source code with arguments,
 e.g. the journal index uses `get_words_from_journal_tag()`, whose format
 standardisation depends on `CFG_JOURNAL_PUBINFO_STANDARD_FORM`.

 The goal of this ticket is to harmonise and centralise these
 configurations in the DB table, so that all of the above features become
 configurable per index. Roughly speaking, this means enlarging the
 `idxINDEX` table with new columns so that not only the stemming language,
 but also the tokenizer method for words and phrases, the stopword list,
 the synonym list, etc., can be defined at runtime by manipulating the DB
 table, without touching the source code or `invenio.conf`.
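
 As a rough illustration of the idea, the sketch below models an enlarged
 `idxINDEX` table and a per-index configuration lookup. Only the table name
 and the per-index stemming language come from this ticket; every column
 name, the sample rows, and the helper functions `make_demo_db()` and
 `get_index_config()` are assumptions made up for illustration. SQLite is
 used only to keep the example self-contained, it is not what Invenio runs
 on.

```python
import sqlite3

# Hypothetical schema: all columns other than the idea of a per-index
# stemming language are illustrative, not the actual Invenio table layout.
SCHEMA = """
CREATE TABLE idxINDEX (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    stemming_language TEXT NOT NULL DEFAULT '',
    tokenizer TEXT NOT NULL DEFAULT 'default',
    stopwords_file TEXT NOT NULL DEFAULT '',
    synonym_kb TEXT NOT NULL DEFAULT ''
)
"""


def make_demo_db():
    """Create an in-memory database with two sample index rows."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO idxINDEX "
        "(name, stemming_language, tokenizer, stopwords_file, synonym_kb) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("title", "en", "default", "stopwords.kb", ""),
            ("firstauthor", "", "fuzzy_author_name", "", ""),
        ],
    )
    return conn


def get_index_config(conn, index_name):
    """Return the configuration of one index as a plain dict."""
    row = conn.execute(
        "SELECT * FROM idxINDEX WHERE name = ?", (index_name,)
    ).fetchone()
    if row is None:
        raise KeyError(index_name)
    return dict(row)
```

 With such a layout, switching the tokenizer of one index would be a single
 `UPDATE` on its row rather than a change to the source code.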

 The BibIndex Admin interface should be enriched accordingly.

 The work will involve various refactoring tasks, such as separating out
 the various `get_words_from_foo()` functions, taking advantage of the
 `pluginutils.py` library.
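
 One possible shape for that refactoring is sketched below: each tokenizer
 is registered under a name, and the indexer picks one based on the
 per-index configuration instead of checking the index name. The registry,
 the decorator, the tokenizer names and `get_words()` are all hypothetical;
 a plain dict stands in for plugin discovery, and this is not the
 `pluginutils.py` API. The author tokenizer here is a naive splitter, not
 the real `BibIndexFuzzyNameTokenizer`.

```python
import re

# Hypothetical registry mapping a tokenizer name to a callable.
TOKENIZERS = {}


def register_tokenizer(name):
    """Decorator that registers a tokenizer function under `name`."""
    def decorator(func):
        TOKENIZERS[name] = func
        return func
    return decorator


@register_tokenizer("default")
def tokenize_words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"\w+", text.lower())


@register_tokenizer("author_name")
def tokenize_author_name(text):
    """Naive 'Surname, First' splitter, for illustration only."""
    tokens = []
    for part in text.split(","):
        tokens.extend(part.lower().split())
    return tokens


def get_words(index_config, text):
    """Tokenize `text` as dictated by the per-index configuration dict."""
    tokenizer = TOKENIZERS[index_config.get("tokenizer", "default")]
    stopwords = index_config.get("stopwords", ())
    return [word for word in tokenizer(text) if word not in stopwords]
```

 For example, `get_words({"tokenizer": "author_name"}, "Ellis, John")`
 would go through the author tokenizer purely because of the configuration
 value, with no check on the index name.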

 P.S. This ticket will have several sequels, e.g. about defining the index
 type (native Invenio, Solr, Xapian) or about defining virtual logical
 fields that would gather information for indexing from non-MARC,
 non-fulltext sources (e.g. from the cataloguer log tables). These will be
 ticketised separately.

-- 
Ticket URL: <http://invenio-software.org/ticket/852>
Invenio <http://invenio-software.org>
