This is an automated notification sent by LCG Savannah.
It relates to:
                task #8353, project CDS Invenio

==============================================================================
 OVERVIEW of task #8353:
==============================================================================

URL:
  <http://savannah.cern.ch/task/?8353>

                 Summary: BibIndex enhancements
                 Project: CDS Invenio
            Submitted by: skaplun
            Submitted on: 2008-11-05 16:45
         Should Start On: 2008-11-05 00:00
   Should be Finished on: 2008-11-05 00:00
                Category: BibIndex
                Priority: 3 - Low
                  Status: None
                 Privacy: Public
        Percent Complete: 0%
             Assigned to: skaplun
             Open/Closed: Open
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________


This is a proposal for enhancing BibIndex.
* three kinds of tables per index:
  * idxWORD with single words (optionally *cleaned*)
  * idxPAIR with pairs of words (optionally *cleaned*) for partial phrase
searching
  * idxPHRASE for full phrases (optionally disabled, not stemmed).
*cleaned* means: optional stemming, optional stop word removal, configurable
punctuation, optional LaTeX removal, optional HTML removal
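
As a rough illustration of what *cleaned* could mean in practice, here is a
minimal Python sketch. It is not BibIndex code: the function name
clean_words, its keyword arguments, the stop word list, the regular
expressions and the toy stemmer are all assumptions made up for this example.

```python
import re

# Hypothetical defaults: a real setup would make these configurable per index.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the"})
PUNCTUATION_RE = re.compile(r"[^\w\s]")          # anything not a word char or space
HTML_TAG_RE = re.compile(r"<[^>]+>")             # very naive HTML tag matcher
LATEX_MARKUP_RE = re.compile(r"\\[a-zA-Z]+|[{}$^_]")  # commands and common markup chars


def naive_stem(word):
    """Toy stemmer (strips one trailing 's'); a stand-in for a real stemmer."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def clean_words(text, stem=False, remove_stop_words=False,
                strip_html=False, strip_latex=False):
    """Split text into lowercase words, applying only the requested cleaning steps."""
    if strip_html:
        text = HTML_TAG_RE.sub(" ", text)
    if strip_latex:
        text = LATEX_MARKUP_RE.sub(" ", text)
    text = PUNCTUATION_RE.sub(" ", text.lower())
    words = text.split()
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if stem:
        words = [naive_stem(w) for w in words]
    return words
```

Each step is switched on independently, which mirrors the "optional" wording
above; with every option left off, the function only lowercases, strips
punctuation and splits on whitespace.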

This implies a refactoring of BibIndex for performance reasons: at the moment
each record is parsed twice, once for idxWORD and a second time for idxPHRASE.
It would be better to parse it only once and extract all the desired strings
to be indexed.
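
The single-pass idea could look roughly like the following Python sketch. It
is only an illustration under assumptions: extract_index_terms, its
arguments and the returned dictionary layout are hypothetical names for this
example and do not describe the existing BibIndex code.

```python
def default_clean(text):
    """Placeholder cleaner: lowercase and split on whitespace."""
    return text.lower().split()


def extract_index_terms(field_values, clean=default_clean):
    """Walk the field values of one record once and collect terms for all
    three kinds of tables.

    field_values: raw strings taken from the record (e.g. titles).
    clean: any callable turning a string into a list of words.
    """
    words, pairs, phrases = set(), set(), set()
    for value in field_values:
        phrases.add(value.strip())             # idxPHRASE: whole phrase, not stemmed
        tokens = clean(value)                  # tokenize this value exactly once
        words.update(tokens)                   # idxWORD: single words
        pairs.update(zip(tokens, tokens[1:]))  # idxPAIR: adjacent word pairs
    return {"idxWORD": words, "idxPAIR": pairs, "idxPHRASE": phrases}
```

For example, extract_index_terms(["Higgs boson search"]) yields three words,
the two adjacent pairs and the untouched phrase, all from one pass over the
input.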

(There might be a need to store both stemmed and non-stemmed words. In that
case, should we also have separate idxWORD_stemmed and idxWORD_non_stemmed
tables?)



    _______________________________________________________

Carbon-Copy List:

CC Address                          | Comment
------------------------------------+-----------------------------
2195                                | -SUB-




==============================================================================
