#45: full-text snippets: configuration to use number of chars
-----------------------+----------------------------------------------------
 Reporter:  simko      |       Owner:      
     Type:  defect     |      Status:  new 
 Priority:  blocker    |   Milestone:  v1.0
Component:  BibFormat  |     Version:      
 Keywords:             |  
-----------------------+----------------------------------------------------
 1) The full-text snippet configuration needs to use the number of
 characters, not the number of words.  We are allowed to show, say, 100
 characters around the pattern, rounded out to the nearest word boundary
 beyond these 100 characters.  So we need to replace the
 CFG_WEBSEARCH_FULLTEXT_SNIPPETS_WORDS configuration variable with
 character counting before v1.0 is out, in order to stabilize the
 config file.
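
 A minimal sketch of the character-based cut, for illustration only.
 The function name `snippet_around` and the choice to split the
 character budget evenly before and after the match are assumptions,
 not existing Invenio code; the ticket does not say whether "100
 characters" is a total or a per-side limit.

```python
def snippet_around(text, pattern, nb_chars=100):
    """Return roughly nb_chars of context around the first match of
    pattern, grown outward so that no word is cut in half."""
    if not pattern:
        return ""
    pos = text.lower().find(pattern.lower())
    if pos == -1:
        return ""
    # Assumption: half of the budget goes before the match, half after.
    half = nb_chars // 2
    start = max(0, pos - half)
    end = min(len(text), pos + len(pattern) + half)
    # If the window starts inside a word, move left to that word's start.
    while start > 0 and not text[start - 1].isspace() \
            and not text[start].isspace():
        start -= 1
    # If the window ends inside a word, move right to that word's end.
    while end < len(text) and not text[end].isspace() \
            and not text[end - 1].isspace():
        end += 1
    return text[start:end].strip()
```

 With this rounding the snippet can be slightly longer than nb_chars,
 which matches the "closest word outside" wording above.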

 2) Moreover, the length of the snippet depends on the full-text file
 provenance.  The provenance is currently stored as the bibdoc type, so
 this has to be analyzed when snippets are generated.  The full-text
 snippet configuration should then look almost like a dictionary:

 {{{
 CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS = {
   'arXiv': 200,
   'Springer': 180,
   'APS': 100,
 }
 }}}
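
 The lookup at snippet generation time could then be sketched as
 follows.  The helper `snippet_chars_for` and the fallback value for
 bibdoc types missing from the dictionary are hypothetical; the ticket
 does not define what happens for an unknown source.

```python
CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS = {
    'arXiv': 200,
    'Springer': 180,
    'APS': 100,
}

# Assumption: sources not listed in the dictionary get this length.
DEFAULT_SNIPPET_CHARS = 100


def snippet_chars_for(bibdoc_type,
                      config=CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS,
                      default=DEFAULT_SNIPPET_CHARS):
    """Return the snippet length in characters for a bibdoc type."""
    return config.get(bibdoc_type, default)
```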

 3) Even the number of snippets to show may vary per source, so it may
 be good to store it in the configuration as well, e.g. (50, 200) would
 mean that we can show up to 50 snippets of up to 200 characters each.
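
 One possible shape for such a combined setting, as a sketch only.  The
 variable name CFG_WEBSEARCH_FULLTEXT_SNIPPETS_LIMITS, the helper
 names, the 'APS' entry and the default are all hypothetical; only the
 (50, 200) pair comes from the text above.

```python
from collections import namedtuple

SnippetLimits = namedtuple('SnippetLimits', ['max_snippets', 'max_chars'])

# Hypothetical setting: bibdoc type -> (max snippets, max chars each).
CFG_WEBSEARCH_FULLTEXT_SNIPPETS_LIMITS = {
    'arXiv': (50, 200),  # the example pair from the ticket
    'APS': (10, 100),    # illustrative value, not a real agreement
}

# Assumption: fallback for bibdoc types that are not listed.
DEFAULT_SNIPPET_LIMITS = (4, 100)


def snippet_limits_for(bibdoc_type):
    """Return the SnippetLimits that apply to a bibdoc type."""
    pair = CFG_WEBSEARCH_FULLTEXT_SNIPPETS_LIMITS.get(
        bibdoc_type, DEFAULT_SNIPPET_LIMITS)
    return SnippetLimits(*pair)


def cap_snippets(snippets, bibdoc_type):
    """Keep at most max_snippets snippets for the given bibdoc type."""
    return snippets[:snippet_limits_for(bibdoc_type).max_snippets]
```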

 4) The configuration variable that determines how many snippets are
 shown per record in the HTML brief output format on the search results
 pages can probably stay source independent.

-- 
Ticket URL: <http://cdswaredev.cern.ch/invenio/ticket/45>
Invenio <http://cdswaredev.cern.ch/invenio>
