#45: full-text snippets: configuration to use number of chars
-----------------------+----------------------------------------------------
Reporter: simko | Owner:
Type: defect | Status: new
Priority: blocker | Milestone: v1.0
Component: BibFormat | Version:
Keywords: |
-----------------------+----------------------------------------------------
1) The full-text snippet configuration needs to count characters, not
words. We are allowed to show, say, 100 characters around the pattern,
rounded to the closest word boundary outside of these 100 characters.
So we need to replace the CFG_WEBSEARCH_FULLTEXT_SNIPPETS_WORDS
configuration variable with a character-counting equivalent before
v1.0 is out, in order to stabilize the config file.
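
For illustration, the rounding rule in point 1 could be sketched roughly
as follows. This is not existing Invenio code: the function name
snippet_around and the even split of the character budget on both sides
of the match are assumptions made for the example.

```python
import re


def snippet_around(text, pattern, max_chars=100):
    """Return a snippet of about max_chars characters around the first
    match of pattern, widened outward to the nearest word boundaries.

    Hypothetical helper for illustration; returns '' when nothing matches.
    """
    match = re.search(re.escape(pattern), text, re.IGNORECASE)
    if match is None:
        return ""
    # Assumption: split the remaining character budget evenly on both sides.
    matched_len = match.end() - match.start()
    padding = max(max_chars - matched_len, 0) // 2
    start = max(match.start() - padding, 0)
    end = min(match.end() + padding, len(text))
    # Round outward so that no word is cut in the middle.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start:end]
```

Because the window is rounded outward, the returned snippet can be
slightly longer than max_chars, which matches the "closest word outside
of these 100 characters" wording above.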
2) Moreover, the length of the snippet depends on the full-text file
provenance. The provenance is currently stored as the bibdoc type, so
this has to be analyzed when snippets are generated. The full-text
snippet configuration should then look something like a dictionary:
{{{
CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS = {
    'arXiv': 200,
    'Springer': 180,
    'APS': 100,
}
}}}
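
A minimal sketch of how such a dictionary might be consulted when
snippets are generated. The helper get_snippet_chars and the fallback
value DEFAULT_SNIPPET_CHARS are hypothetical; the ticket does not say
what should happen for a bibdoc type that is not listed.

```python
CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS = {
    'arXiv': 200,
    'Springer': 180,
    'APS': 100,
}

# Hypothetical fallback for sources not listed above (not from the ticket).
DEFAULT_SNIPPET_CHARS = 100


def get_snippet_chars(bibdoc_type):
    """Return the allowed snippet length for a given bibdoc type."""
    return CFG_WEBSEARCH_FULLTEXT_SNIPPETS_CHARS.get(
        bibdoc_type, DEFAULT_SNIPPET_CHARS)
```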
3) The number of snippets to show may also vary per source, so it may
be good to store it in the configuration as well, e.g. (50, 200) would
mean we are able to show up to 50 snippets containing up to 200
characters.
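
One possible shape for such a per-source configuration, as a sketch
only. The variable name CFG_WEBSEARCH_FULLTEXT_SNIPPETS, the helpers,
the fallback and the 'APS' values are made up for illustration; only
the (50, 200) pair comes from this ticket.

```python
# source -> (maximum number of snippets, maximum characters)
CFG_WEBSEARCH_FULLTEXT_SNIPPETS = {
    'arXiv': (50, 200),  # the example pair from this ticket
    'APS': (10, 100),    # illustrative values only
}

# Hypothetical fallback for sources without an entry.
DEFAULT_SNIPPET_LIMITS = (10, 100)


def snippet_limits(source):
    """Return (max_snippets, max_chars) for a full-text source."""
    return CFG_WEBSEARCH_FULLTEXT_SNIPPETS.get(source, DEFAULT_SNIPPET_LIMITS)


def cap_snippets(snippets, source):
    """Keep at most the number of snippets allowed for this source."""
    max_snippets, _max_chars = snippet_limits(source)
    return snippets[:max_snippets]
```

Here cap_snippets only enforces the count; the character limit would be
applied earlier, when each snippet is cut out of the full text.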
4) The configuration variable that determines how many snippets are
shown per record in the HTML brief output format on the search results
pages can probably stay source-independent.
--
Ticket URL: <http://cdswaredev.cern.ch/invenio/ticket/45>
Invenio <http://cdswaredev.cern.ch/invenio>