#537: More granular re-indexing tuning
--------------------------+-----------------------------------------------
  Reporter:  skaplun      |      Owner:  skaplun
      Type:  enhancement  |     Status:  assigned
  Priority:  major        |  Milestone:  v1.0
 Component:  BibIndex     |    Version:
Resolution:               |   Keywords:  idxPAIR idxPHRASE reindex upgrade
--------------------------+-----------------------------------------------
Changes (by simko):

 * status:  in_merge => assigned


Comment:

 1) Lazy creation of tables in case of upgrades is good, but it would
 still be useful to keep the table definitions in tabcreate, if only
 for consistency with tabdrop, tabbibclean and friends.  IOW, I would
 look at this ticket as a way to ease upgrades only, not to replace
 initial installations.  (Otherwise we should amend tabdrop and
 tabbibclean by moving their logic to inveniocfg; but this is perhaps
 for later, when we also move the DB upgrade statements from the
 Makefile to inveniocfg.)

 If we keep the idx table definitions in tabcreate, then in order not
 to have tables defined in two places, bibindex can look up the
 definition of table 01 (either from the tabcreate.sql that is
 installed in `/opt/invenio/lib/sql/invenio/` or even from the live DB
 table instance) and replace 01 by 16, as appropriate.
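
 The "replace 01 by 16" step could look roughly like the sketch below.
 The function name `derive_table_definition` and the sample SQL are
 hypothetical; the column definitions are for illustration only, and
 the real template would be read from tabcreate.sql or from the live
 DB.

```python
import re

# Illustrative template only (an assumption): the real definition would be
# read from tabcreate.sql or from the live DB, not hard-coded like this.
TEMPLATE_SQL = """CREATE TABLE IF NOT EXISTS idxPAIR01F (
  id mediumint(9) unsigned NOT NULL auto_increment,
  term varchar(100) default NULL,
  hitlist longblob,
  PRIMARY KEY (id),
  UNIQUE KEY term (term)
) ENGINE=MyISAM;"""

# Match only idx table names of index 01, e.g. idxWORD01F or idxPHRASE01R.
_TABLE_RE = re.compile(r"\b(idx(?:WORD|PAIR|PHRASE))01([FR])\b")


def derive_table_definition(template_sql, index_id):
    """Return template_sql with idx*01F/R table names renamed to index_id."""
    if not 1 <= index_id <= 99:
        raise ValueError("index id must fit in two digits: %r" % (index_id,))

    def rename(match):
        # Keep the idxTYPE prefix and the F/R suffix, swap only the number.
        return "%s%02d%s" % (match.group(1), index_id, match.group(2))

    return _TABLE_RE.sub(rename, template_sql)


if __name__ == "__main__":
    print(derive_table_definition(TEMPLATE_SQL, 16))
```

 Matching the full table name rather than the bare string `01` avoids
 touching unrelated digits elsewhere in the definition.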

 (If we allow lazy creation of tables, we may want to introduce a
 little checker in order to see if all tables are defined consistently.
 Again something for when we shall move more DB stuff from Makefile to
 inveniocfg, I guess.)
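
 A first cut of such a checker might only verify that every index has
 its full set of tables, as in the sketch below.  The helper names are
 hypothetical, and the sketch assumes forward (F) and reverse (R)
 tables for each of WORD, PAIR and PHRASE; comparing the actual column
 definitions would be a further step that is not shown.

```python
# Assumed layout: one F and one R table per index type and index id.
INDEX_TYPES = ("WORD", "PAIR", "PHRASE")
DIRECTIONS = ("F", "R")


def expected_tables(index_id):
    """List the idx table names expected for one index id."""
    return ["idx%s%02d%s" % (index_type, index_id, direction)
            for index_type in INDEX_TYPES
            for direction in DIRECTIONS]


def missing_tables(index_ids, existing_tables):
    """Map each index id to the expected tables that do not exist."""
    existing = set(existing_tables)
    missing = {}
    for index_id in index_ids:
        absent = [name for name in expected_tables(index_id)
                  if name not in existing]
        if absent:
            missing[index_id] = absent
    return missing


if __name__ == "__main__":
    # Hypothetical state: index 1 is complete, index 16 has WORD tables only.
    existing = expected_tables(1) + ["idxWORD16F", "idxWORD16R"]
    print(missing_tables([1, 16], existing))
```

 In practice the list of existing tables would come from the DB; here
 it is passed in as a plain list to keep the sketch self-contained.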

 2) This patch does the job when the tables don't exist, via the
 `task_get_option("modified")` versus `0000-00-00` trick, but this may
 not be wanted since it may trigger lengthy re-indexing at unwanted
 times, e.g. when the user wanted to continue with the usual
 WORD,PHRASE indexing during the day but had scheduled the lengthy PAIR
 re-indexing for the night.  So I think it would be better to provide
 an explicit CLI option to re-index only certain index types (WORD,
 PAIR, PHRASE), just as was said in the ticket description.  Such an
 option may be useful anyway, perhaps for forensic purposes.  So
 instead of altering modified dates to `0000-00-00` we may want to
 enrich the bibindex CLI API with something simple such as
 `-w INDEX.TYPE`, so that `-w title.WORD` would process only WORD
 tables, while by default `-w title` would process all
 WORD/PAIR/PHRASE tables.
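
 Parsing such an `INDEX.TYPE` value could be as small as the sketch
 below.  The function name `parse_index_spec` is hypothetical and is
 not part of the existing bibindex code; it handles a single value
 only.

```python
ALL_TYPES = ("WORD", "PAIR", "PHRASE")


def parse_index_spec(spec):
    """Split 'title.WORD' into ('title', ('WORD',)).

    A bare index name such as 'title' selects all index types.
    """
    name, sep, index_type = spec.partition(".")
    if not name:
        raise ValueError("missing index name in %r" % (spec,))
    if not sep:
        # No '.TYPE' suffix given: process WORD, PAIR and PHRASE tables.
        return name, ALL_TYPES
    index_type = index_type.upper()
    if index_type not in ALL_TYPES:
        raise ValueError("unknown index type in %r" % (spec,))
    return name, (index_type,)


if __name__ == "__main__":
    print(parse_index_spec("title.WORD"))
    print(parse_index_spec("title"))
```

 Returning a tuple of types in both cases lets the caller loop over
 the selected types without special-casing the default.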

-- 
Ticket URL: <http://invenio-software.org/ticket/537#comment:4>
Invenio <http://invenio-software.org>
