#537: More granular re-indexing tuning
--------------------------+-----------------------------------------------
Reporter: skaplun | Owner: skaplun
Type: enhancement | Status: assigned
Priority: major | Milestone: v1.0
Component: BibIndex | Version:
Resolution: | Keywords: idxPAIR idxPHRASE reindex upgrade
--------------------------+-----------------------------------------------
Changes (by simko):
* status: in_merge => assigned
Comment:
1) Lazy creation of tables in case of upgrades is good, but it would
still be useful to keep the table definitions in tabcreate, if only for
consistency with tabdrop, tabbibclean and friends. IOW, I would see
this ticket as a way to ease upgrades only, not as a replacement for
initial installations. (Otherwise we would have to amend tabdrop and
tabbibclean as well, by moving their logic to inveniocfg; but this is
perhaps for later, when we also move those DB upgrade statements from
the Makefile to inveniocfg.)
If we keep the idx table definitions in tabcreate, then, in order not
to have tables defined in two places, bibindex can look up the table
definition for table 01 (either from tabcreate.sql, which is installed
in `/opt/invenio/lib/sql/invenio/`, or even from the live DB table
instance) and replace 01 by 16, as appropriate.
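A minimal sketch of that lookup in Python, assuming the idx tables
follow the idxWORD01F/idxWORD01R naming (likewise for PAIR and PHRASE)
and that the definitions are read as text from tabcreate.sql. The
helper name `table_definitions_for_index` is hypothetical, not existing
bibindex API; reading from the live DB (e.g. via MySQL's
`SHOW CREATE TABLE`) would feed the same substitution.

```python
import re

# Matches one CREATE TABLE statement for an index-01 idx table, up to
# the terminating semicolon (assumes no semicolons inside the body).
TEMPLATE_RE = re.compile(
    r"CREATE TABLE (?:IF NOT EXISTS )?(idx(?:WORD|PAIR|PHRASE)01[FR])\b.*?;",
    re.DOTALL | re.IGNORECASE,
)


def table_definitions_for_index(sql_text, index_id):
    """Hypothetical helper: derive CREATE TABLE statements for index
    `index_id` from the index-01 definitions found in `sql_text`."""
    suffix = "%02d" % index_id
    statements = []
    for match in TEMPLATE_RE.finditer(sql_text):
        template = match.group(0)
        old_name = match.group(1)            # e.g. idxWORD01F
        new_name = old_name.replace("01", suffix, 1)  # e.g. idxWORD16F
        # Rename only the table itself; column definitions stay as-is.
        statements.append(template.replace(old_name, new_name, 1))
    return statements
```

Usage would be along the lines of reading
`/opt/invenio/lib/sql/invenio/tabcreate.sql` into a string, calling
`table_definitions_for_index(text, 16)` and executing the returned
statements for the tables that are missing.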
(If we allow lazy creation of tables, we may want to introduce a
little checker to verify that all tables are defined consistently.
Again, something for when we move more DB stuff from the Makefile to
inveniocfg, I guess.)
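Such a checker could look roughly like the following sketch. It is
hypothetical: it assumes the CREATE TABLE statements have already been
fetched into a dict keyed by table name (e.g. from MySQL's
`SHOW CREATE TABLE`), compares every idx table against its index-01
counterpart, and ignores the `AUTO_INCREMENT=N` counter, which
legitimately differs between tables.

```python
import re

NAME_RE = re.compile(r"^idx(WORD|PAIR|PHRASE)(\d{2})([FR])$")
AUTO_RE = re.compile(r"\s+AUTO_INCREMENT=\d+")


def _normalize(ddl):
    # Drop the per-table auto-increment counter before comparing.
    return AUTO_RE.sub("", ddl)


def inconsistent_tables(definitions):
    """Return names of idx tables whose definition differs from the
    corresponding index-01 table (or that have no 01 table at all).

    `definitions` maps table name -> CREATE TABLE statement (assumed
    to be fetched elsewhere)."""
    bad = []
    for name, ddl in sorted(definitions.items()):
        m = NAME_RE.match(name)
        if not m:
            continue  # not an idx table, ignore
        kind, number, side = m.groups()
        if number == "01":
            continue  # the reference tables themselves
        template_name = "idx%s01%s" % (kind, side)
        template = definitions.get(template_name)
        if template is None:
            bad.append(name)  # nothing to compare against
            continue
        # Rename the table back to 01 so only the structure is compared.
        if _normalize(ddl.replace(name, template_name)) != _normalize(template):
            bad.append(name)
    return bad
```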
2) If the tables don't exist, this patch will do the job via the
`task_get_option("modified")` versus `0000-00-00` trick, but this may
not be desirable, since it may trigger lengthy re-indexing at unwanted
times, e.g. when the user wants to continue with the usual WORD,PHRASE
indexing during the day but has scheduled the lengthy PAIR re-indexing
for the night. So I think it would be better to provide an explicit CLI
option to re-index only certain index types (WORD, PAIR, PHRASE), just
as was said in the ticket description. Such an option may also be
useful for forensics purposes. So instead of altering the modified
dates to `0000-00-00`, we may want to enrich the bibindex CLI API with
something simple such as `-w INDEX.TYPE`, so that `-w title.WORD` would
process only the WORD tables, while `-w title` would by default process
all WORD/PAIR/PHRASE tables.
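A possible way to parse such an argument, as a sketch only:
`parse_windex` is a hypothetical helper, and the comma-separated form
assumes `-w` keeps accepting a list of index names. An index given
without a `.TYPE` suffix selects all three table types.

```python
INDEX_TYPES = ("WORD", "PAIR", "PHRASE")


def parse_windex(value):
    """Parse a hypothetical '-w INDEX[.TYPE][,INDEX[.TYPE]...]' value.

    Returns a dict mapping index name -> set of table types to process.
    """
    selection = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, kind = item.partition(".")
        if sep:
            kind = kind.upper()
            if kind not in INDEX_TYPES:
                raise ValueError("unknown index type %r in %r" % (kind, item))
            kinds = {kind}
        else:
            # No explicit type: process WORD, PAIR and PHRASE tables.
            kinds = set(INDEX_TYPES)
        selection.setdefault(name, set()).update(kinds)
    return selection
```

With this, `-w title.WORD,title.PHRASE` would touch only the WORD and
PHRASE tables of the title index during the day, leaving a separate
`-w title.PAIR` run for the night.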
--
Ticket URL: <http://invenio-software.org/ticket/537#comment:4>
Invenio <http://invenio-software.org>