This is an automated notification sent by LCG Savannah.
It relates to:
task #13644, project CDS Invenio
==============================================================================
LATEST MODIFICATIONS of task #13644:
==============================================================================
Update of task #13644 (project cdsware):
Status: None => Done
Open/Closed: Open => Closed
_______________________________________________________
Follow-up Comment #3:
Speed issues fixed as follows:
* in bibformat, the only addition is filter_hidden_fields, which is called
only once per record, and only if of=xm (marcxml)
* caching records is done as before
* in search engine, in search_pattern(), acc_authorize_action() is called
only once
http://cdsware.cern.ch/repo/?p=personal/cds-invenio-marko.git;a=shortlog;h=hiddentags
==============================================================================
OVERVIEW of task #13644:
==============================================================================
URL:
<http://savannah.cern.ch/task/?13644>
Summary: Do not show hidden notes in records (unless
authorized)
Project: CDS Invenio
Submitted by: man
Submitted on: 2010-02-01 10:55
Should Start On: 2010-02-01 00:00
Should be Finished on: 2010-03-01 00:00
Category: WebAccess
Priority: 5 - Normal
Status: Done
Privacy: Public
Percent Complete: 100%
Assigned to: man
Open/Closed: Closed
Discussion Lock: Any
Effort: 20.00
_______________________________________________________
Records often include "hidden" tags holding information that is not
meant for end users.
Example: tags 595 in the Atlantis collection records are technical CERN
notes.
Task:
Define a new conf variable listing all hidden tags of an Invenio instance,
e.g. CFG_BIBFORMAT_HIDDEN_TAGS = 595
In the MARC and MARCXML output formats, especially when served for user-level
apps (e.g. print_record() and format_record()), filter these tags out,
depending on user_info (authorization: runbibedit).
In the search engine, in the search_pattern() function, before
calling search_unit(), check whether the user has the right to search this
unit (e.g. `runbibedit' rights); otherwise pretend that search_unit()
returned an empty hitset.
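
The search-side check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: is_hidden_field(), search_unit_guarded() and fake_search_unit() are hypothetical names, not functions from the Invenio code base, the authorization result is passed in as a plain boolean, and a Python set stands in for a hitset.

```python
# Hypothetical sketch; only CFG_BIBFORMAT_HIDDEN_TAGS is named in the task,
# every other name here is illustrative.
CFG_BIBFORMAT_HIDDEN_TAGS = ["595"]


def is_hidden_field(field, hidden_tags=CFG_BIBFORMAT_HIDDEN_TAGS):
    """Return True if a search field such as '595__a' targets a hidden tag."""
    # MARC tags are three digits; logical fields like 'author' never match.
    return len(field) >= 3 and field[:3].isdigit() and field[:3] in hidden_tags


def search_unit_guarded(pattern, field, user_can_see_hidden, search_unit):
    """Run search_unit(), or pretend it found nothing for hidden tags."""
    if is_hidden_field(field) and not user_can_see_hidden:
        return set()  # empty "hitset"
    return search_unit(pattern, field)


def fake_search_unit(pattern, field):
    # Stand-in for the real index lookup: every query matches records 1 and 2.
    return {1, 2}
```

With this shape, the authorization decision is computed by the caller and handed in, so the guard itself stays cheap for fields that are not MARC tags.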
_______________________________________________________
Follow-up Comments:
-------------------------------------------------------
Date: 2010-02-04 13:11 By: Marko Niinimaki <man>
Speed issues fixed as follows:
* in bibformat, the only addition is filter_hidden_fields, which is called
only once per record, and only if of=xm (marcxml)
* caching records is done as before
* in search engine, in search_pattern(), acc_authorize_action() is called
only once
http://cdsware.cern.ch/repo/?p=personal/cds-invenio-marko.git;a=shortlog;h=hiddentags
-------------------------------------------------------
Date: 2010-02-03 15:59 By: Tibor Simko <simko>
I had a quick look at the patch diffs, without testing whether it works yet.
Here are some issues we may need to fix:
- The MARC generation is always done on the fly, even if we could take
it from the DB. This is very inefficient, because constructing MARCXML
from bibxxx tables can be ~90 times slower:
%timeit format_record(1, 'xm')
1000 loops, best of 3: 412 us per loop
%timeit format_record(1, 'xm', on_the_fly=True)
10 loops, best of 3: 36.6 ms per loop
This is also being done for sites where CFG_BIBFORMAT_HIDDEN_TAGS is
empty, so it can be quite a performance penalty.
(Moreover, we may stop using bibxxx tables one day, and use MARCXML
only, see some old musings.)
So, it would be preferable to fetch full MARCXML from the DB, and to
filter the hidden fields afterwards, if the user cannot see them: a
kind of XSLT-style post-processing of the full MARCXML in order to
remove hidden fields. In this way the internal accesses to MARCXML
(e.g. BibEdit) would be ultra fast, as before. And your wrapper
would be nicely separated from the previous code, so that we could
possibly do things like:
filter_marcxml(format_record(1, 'xm'), hide_tags=['595', '933'])
filter_marc(format_record(1, 'hm'), hide_tags=['333'])
- Another performance consideration: inside the BibFormatObject
constructor, acc_authorize_action() is always called, but this is
only necessary for the MARC and MARCXML output formats. So in the
vast majority of cases (e.g. an ordinary user displaying 25 hits per
page in HTML brief format) we would call it unnecessarily.
Can you please move the hidden fields checking down to the MARC
output format branch, or at least initialize
self.can_see_hidden_fields variable only when self.format is of the
MARC type?
- Another performance consideration: in the search engine, in
search_pattern(), acc_authorize_action() is called many times, even
if it is not needed. It would be better to call it only once, and even
better only if some bsu_f starts with two digits (which is rarely
the case). Moreover, we should probably activate the check for
hidden tag searching only when users come via the Web interface
(detected by the req context), so that CLI searches would always be
fast.
- BTW the patch removes the line ``p_tag = parse_tag(tag)'' from
bibformat_engine.py, leading to an undefined p_tag variable.
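
For illustration, the post-processing wrapper suggested in the first point might look something like the sketch below. The name filter_marcxml() and its hide_tags argument come from the example call above; the body is an assumption written against Python's standard xml.etree.ElementTree, not the code from the hiddentags branch, and it assumes the input is a single well-formed MARCXML record.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
# Keep the MARC namespace as the default one when serializing again.
ET.register_namespace("", MARC_NS)


def filter_marcxml(marcxml, hide_tags):
    """Return marcxml with all fields whose tag is in hide_tags removed."""
    hidden = set(hide_tags)
    if not hidden:
        return marcxml  # nothing to hide: skip parsing entirely
    root = ET.fromstring(marcxml)
    # ElementTree has no parent pointers, so visit each potential parent.
    for parent in list(root.iter()):
        for child in list(parent):
            local_name = child.tag.rsplit("}", 1)[-1]  # drop any namespace
            if (local_name in ("datafield", "controlfield")
                    and child.get("tag") in hidden):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Because the function returns its input untouched when hide_tags is empty, sites without hidden tags would pay no parsing cost, which is the point of filtering after fetching the stored MARCXML.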
-------------------------------------------------------
Date: 2010-02-03 13:25 By: Marko Niinimaki <man>
Functionality implemented in
http://cdsware.cern.ch/repo/?p=personal/cds-invenio-marko.git;a=shortlog;h=hiddentags
_______________________________________________________
Carbon-Copy List:
CC Address | Comment
------------------------------------+-----------------------------
1576 | -COM-
3346 | -SUB-