This is an automated notification sent by LCG Savannah.
It relates to:
task #8353, project CDS Invenio
==============================================================================
OVERVIEW of task #8353:
==============================================================================
URL:
<http://savannah.cern.ch/task/?8353>
Summary: BibIndex enhancements
Project: CDS Invenio
Submitted by: skaplun
Submitted on: 2008-11-05 16:45
Should Start On: 2008-11-05 00:00
Should be Finished on: 2008-11-05 00:00
Category: BibIndex
Priority: 3 - Low
Status: None
Privacy: Public
Percent Complete: 0%
Assigned to: skaplun
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00
_______________________________________________________
This is a proposal for enhancing BibIndex.
* three kinds of tables per index:
  * idxWORD with single words (optionally *cleaned*)
  * idxPAIR with pairs of words (optionally *cleaned*) for partial phrase
    searching
  * idxPHRASE for full phrases (optionally disabled, not stemmed).
*cleaned* means: optional stemming, optional stop-word removal, configurable
punctuation, optional LaTeX removal, optional HTML removal.
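
To make the *cleaned* options concrete, here is a minimal sketch in Python.
It is not existing BibIndex code: the function name clean_words, its
parameters, the tiny stop-word list, the regular expressions and the naive
suffix stemmer are all assumptions chosen for illustration. A real
implementation would use a proper stemmer and configurable word lists.

```python
import re

# Hypothetical stop-word list; a real one would be configurable per index.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the"})

# Rough patterns, for illustration only: HTML tags, inline $...$ math,
# and LaTeX control words such as \textbf.
HTML_TAG = re.compile(r"<[^>]+>")
LATEX_MARKUP = re.compile(r"\$[^$]*\$|\\[a-zA-Z]+")


def naive_stem(word):
    """Toy stemmer that strips a few English suffixes (a stand-in only)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def clean_words(text, stem=False, remove_stop_words=False,
                punctuation=".,;:!?()[]{}\"'",
                remove_latex=False, remove_html=False):
    """Return the list of lowercased words in text, applying each option."""
    if remove_html:
        text = HTML_TAG.sub(" ", text)
    if remove_latex:
        text = LATEX_MARKUP.sub(" ", text)
    # Replace every configured punctuation character with a space.
    table = str.maketrans(punctuation, " " * len(punctuation))
    words = text.translate(table).lower().split()
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if stem:
        words = [naive_stem(w) for w in words]
    return words
```

Each option is independent, so an index could for example enable HTML
removal and stop-word removal while leaving stemming off.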
This implies a refactoring of bibindex for performance reasons: at the moment
each record is parsed twice, once for idxWORD and a second time for idxPHRASE.
It would be better to parse it once and for all and to extract all the
desired strings to be indexed.
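
The single-pass idea could look roughly like the sketch below. This is only
an illustration under stated assumptions: the function name extract_terms,
the index_phrases flag and the plain whitespace tokenization are
hypothetical, and the dictionary keys simply reuse the table names from this
proposal. Cleaning of words and pairs is left out to keep the example short.

```python
def extract_terms(text, index_phrases=True):
    """Tokenize a field value once and derive all three kinds of terms."""
    phrase = text.strip()
    tokens = phrase.lower().split()  # the only tokenization pass

    # Single words, for idxWORD.
    words = set(tokens)
    # Adjacent word pairs, for idxPAIR (partial phrase searching).
    pairs = {"%s %s" % (a, b) for a, b in zip(tokens, tokens[1:])}
    # The full, unstemmed phrase, for idxPHRASE (can be disabled).
    phrases = {phrase} if index_phrases and phrase else set()

    return {"idxWORD": words, "idxPAIR": pairs, "idxPHRASE": phrases}
```

For the value "Higgs boson search" this yields three words, the two pairs
"higgs boson" and "boson search", and the original phrase, all from one
parse of the record field.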
(There might be a need to store both stemmed and unstemmed words. In this
case, should we also have separate idxWORD_stemmed and idxWORD_non_stemmed
tables?)
_______________________________________________________
Carbon-Copy List:
CC Address | Comment
------------------------------------+-----------------------------
2195 | -SUB-