This is an automated notification sent by LCG Savannah.
It relates to:
task #11225, project CDS Invenio
==============================================================================
LATEST MODIFICATIONS of task #11225:
==============================================================================
Update of task #11225 (project cdsware):
Priority: 5 - Normal => 1 - Later
Status: None => Need Info
_______________________________________________________
Follow-up Comment #2:
Why do you suggest removing pyRXP? Please consider that (i) the
package is only recommended, nobody is forced to compile and use it;
(ii) the package has been available and working robustly for years;
(iii) it is the fastest XML parser we have; and (iv) it does not cost
anything to continue supporting it.
To illustrate point (iii), I have just remeasured the speed of
create_records() for the demo MARCXML file containing ~100 records on
an SLC5 box, with the following results:
1. pyrxp ... 0.177 sec
2. 4suite ... 0.561 sec
3. minidom ... 1.445 sec
As you can see, the next supported parser, 4suite, is more than three
times slower. Assuming parse time scales linearly, for a file with 1M
records pyrxp would finish in about 29 minutes, while 4suite would
take about 93 minutes. Quite a difference.
Unless there is an equally fast alternative, pyrxp should stay.
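
For reference, the extrapolation can be reproduced with a few lines of
Python. This is only a sketch: it takes the three timings quoted in this
comment as given, treats the demo file as exactly 100 records, and assumes
parse time grows linearly with the number of records. The helper name
extrapolate_minutes is made up for illustration.

```python
# Timings quoted in the comment: seconds to parse the ~100-record demo file.
MEASURED_SECONDS = {"pyrxp": 0.177, "4suite": 0.561, "minidom": 1.445}
DEMO_RECORDS = 100  # assumption: the demo file is treated as exactly 100 records


def extrapolate_minutes(seconds_for_demo, target_records, demo_records=DEMO_RECORDS):
    """Linearly scale a measured time to a larger record count, in minutes."""
    return seconds_for_demo * (target_records / demo_records) / 60.0


# Estimated minutes to parse 1M records with each parser.
estimates = {
    name: extrapolate_minutes(seconds, 1_000_000)
    for name, seconds in MEASURED_SECONDS.items()
}

# How many times slower 4suite is than pyrxp on the demo file.
slowdown_4suite = MEASURED_SECONDS["4suite"] / MEASURED_SECONDS["pyrxp"]

for name, minutes in estimates.items():
    print(f"{name}: about {minutes:.1f} minutes for 1M records")
print(f"4suite is about {slowdown_4suite:.2f} times slower than pyrxp")
```

Under these assumptions the estimates come out at roughly 29.5 minutes
for pyrxp and 93.5 minutes for 4suite, in line with the figures above.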
==============================================================================
OVERVIEW of task #11225:
==============================================================================
URL:
<http://savannah.cern.ch/task/?11225>
Summary: Remove references to unmaintained rxp package
Project: CDS Invenio
Submitted by: vengmark
Submitted on: 2009-09-15 10:02
Should Start On: 2009-09-15 00:00
Should be Finished on: 2009-09-15 00:00
Category: BibEdit
Priority: 1 - Later
Status: Need Info
Privacy: Public
Percent Complete: 0%
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00
_______________________________________________________
It looks like the last version of rxp was released in 2004
<http://www.inf.ed.ac.uk/research/isdd/admin/package?download=80>, and it's
been removed from newer Linux distributions
<http://www.mail-archive.com/[email protected]/msg08400.html>.
References:
$ grep -R rxp *
INSTALL: rxp gnuplot xpdf-utils gs-common antiword catdoc \
INSTALL: <http://www.reportlab.org/pyrxp.html>
INSTALL: <http://www.cogsci.ed.ac.uk/~richard/rxp.html>
modules/bibedit/lib/bibrecord_config.py:CFG_BIBRECORD_PARSERS_AVAILABLE = ['pyrxp', '4suite', 'minidom']
modules/bibedit/lib/bibrecord.py: if 'pyrxp' in CFG_BIBRECORD_PARSERS_AVAILABLE:
modules/bibedit/lib/bibrecord.py: AVAILABLE_PARSERS.append('pyrxp')
modules/bibedit/lib/bibrecord.py: if parser == 'pyrxp':
modules/bibedit/lib/bibrecord.py: rec = _create_record_rxp(marcxml, verbose, correct)
modules/bibedit/lib/bibrecord.py:# 'pyrxp': _create_record_rxp,
modules/bibedit/lib/bibrecord.py:def _create_record_rxp(marcxml, verbose=CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL,
modules/bibedit/lib/bibrecord.py: pyrxp_parser = pyRXP.Parser(ErrorOnValidityErrors=0, ProcessDTD=1,
modules/bibedit/lib/bibrecord.py: pyrxp_parser.ErrorOnValidityErrors = 1
modules/bibedit/lib/bibrecord.py: pyrxp_parser.ErrorOnUnquotedAttributeValues = 1
modules/bibedit/lib/bibrecord.py: root = pyrxp_parser.parse(marcxml)
modules/bibedit/lib/bibrecord.py: children = _get_children_by_tag_name_rxp(root, 'record')
modules/bibedit/lib/bibrecord.py: for controlfield in _get_children_by_tag_name_rxp(root, 'controlfield'):
modules/bibedit/lib/bibrecord.py: for datafield in _get_children_by_tag_name_rxp(root, 'datafield'):
modules/bibedit/lib/bibrecord.py: for subfield in _get_children_by_tag_name_rxp(datafield, 'subfield'):
modules/bibedit/lib/bibrecord.py:def _get_children_by_tag_name_rxp(node, name):
modules/bibedit/lib/bibrecord.py: psyco.bind(_create_record_rxp)
modules/bibedit/lib/bibrecord_tests.py: record = bibrecord._create_record_rxp(self.xmltext)
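
The references above suggest that bibrecord.py only enables a parser when
it is actually present. A minimal sketch of that optional-dependency
pattern follows. It is not the Invenio code: the function detect_parsers
and the candidate table are hypothetical, and the module name used for
4suite is an assumption. Only the parser names, their order, and the
pyRXP module name are taken from the grep output.

```python
import importlib

# Preference order mirrors CFG_BIBRECORD_PARSERS_AVAILABLE in the grep output.
# Each entry is (parser name, module to probe). The 4suite module name is an
# assumption made for this illustration.
CANDIDATE_PARSERS = [
    ("pyrxp", "pyRXP"),
    ("4suite", "Ft.Xml.Domlette"),
    ("minidom", "xml.dom.minidom"),
]


def detect_parsers(candidates=CANDIDATE_PARSERS):
    """Return the names of parsers whose module can be imported, in order."""
    available = []
    for name, module_name in candidates:
        try:
            importlib.import_module(module_name)
        except ImportError:
            # Optional dependency is not installed: skip it.
            continue
        available.append(name)
    return available


AVAILABLE_PARSERS = detect_parsers()
print("Parsers available on this machine:", AVAILABLE_PARSERS)
```

With this pattern a missing pyRXP costs nothing at runtime: the list
simply falls through to the next parser, and minidom is always present
because it ships with the Python standard library.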
_______________________________________________________
Follow-up Comments:
-------------------------------------------------------
Date: 2009-09-15 12:45 By: Tibor Simko <simko>
Why do you suggest removing pyRXP? Please consider that (i) the
package is only recommended, nobody is forced to compile and use it;
(ii) the package has been available and working robustly for years;
(iii) it is the fastest XML parser we have; and (iv) it does not cost
anything to continue supporting it.
To illustrate point (iii), I have just remeasured the speed of
create_records() for the demo MARCXML file containing ~100 records on
an SLC5 box, with the following results:
1. pyrxp ... 0.177 sec
2. 4suite ... 0.561 sec
3. minidom ... 1.445 sec
As you can see, the next supported parser, 4suite, is more than three
times slower. Assuming parse time scales linearly, for a file with 1M
records pyrxp would finish in about 29 minutes, while 4suite would
take about 93 minutes. Quite a difference.
Unless there is an equally fast alternative, pyrxp should stay.
-------------------------------------------------------
Date: 2009-09-15 10:07 By: Victor Engmark <vengmark>
PS: The deprecation link mentions that xmllint could be used instead of rxp.
PPS: To correct the URL for the deprecation in Ubuntu, replace the "..." with
"changes".
_______________________________________________________
Carbon-Copy List:
CC Address | Comment
------------------------------------+-----------------------------
1576 | -COM-
3964 | -SUB-