This is an automated notification sent by LCG Savannah.
It relates to:
                task #11225, project CDS Invenio

==============================================================================
 LATEST MODIFICATIONS of task #11225:
==============================================================================

Update of task #11225 (project cdsware):

                Priority:              5 - Normal => 1 - Later              
                  Status:                    None => Need Info              

    _______________________________________________________

Follow-up Comment #2:

Why do you suggest removing pyRXP?  Please consider that (i) the
package is only recommended, so nobody is forced to compile and use it;
(ii) the package has been available and working robustly for years;
(iii) it is the fastest XML parser we have; and (iv) it does not cost
anything to continue supporting it.

To illustrate point (iii), I have just remeasured the speed of
create_records() on the demo MARCXML file containing ~100 records on
an SLC5 box, with the following results:

   1. pyrxp   ... 0.177 sec
   2. 4suite  ... 0.561 sec
   3. minidom ... 1.445 sec

As you can see, the next supported parser, 4suite, is more than three
times slower.  If we wanted to parse a file with 1M records, pyrxp
would finish in something like 29 minutes, while 4suite would take
about 93 minutes.  Quite a difference.
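
The extrapolation above can be checked with a few lines of Python.  This
is only a sketch: it assumes that parse time scales linearly with the
number of records and that the demo file holds exactly 100 records (the
comment says ~100).  The function names are made up for illustration.

```python
# Timings quoted above: seconds to parse the ~100-record demo MARCXML file.
MEASURED_SECONDS = {"pyrxp": 0.177, "4suite": 0.561, "minidom": 1.445}

# Assumption: the demo file contains exactly 100 records.
SAMPLE_RECORDS = 100


def extrapolate_minutes(seconds, sample_records, target_records):
    """Scale a measured time linearly to target_records; return minutes."""
    return seconds * (target_records / sample_records) / 60.0


def slowdown(parser, baseline="pyrxp"):
    """Return how many times slower `parser` is than `baseline`."""
    return MEASURED_SECONDS[parser] / MEASURED_SECONDS[baseline]


def report(target_records=1_000_000):
    """Return one formatted line per parser for the extrapolated run."""
    lines = []
    for name, seconds in MEASURED_SECONDS.items():
        minutes = extrapolate_minutes(seconds, SAMPLE_RECORDS, target_records)
        lines.append("%-8s %7.1f min  (x%.2f)" % (name, minutes, slowdown(name)))
    return lines
```

Under these assumptions pyrxp comes out at 29.5 minutes and 4suite at
93.5 minutes for 1M records, consistent with the rounded figures quoted
above.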

Unless there is an equally fast alternative, pyrxp should stay.



==============================================================================
 OVERVIEW of task #11225:
==============================================================================

URL:
  <http://savannah.cern.ch/task/?11225>

                 Summary: Remove references to unmaintained rxp package
                 Project: CDS Invenio
            Submitted by: vengmark
            Submitted on: 2009-09-15 10:02
         Should Start On: 2009-09-15 00:00
   Should be Finished on: 2009-09-15 00:00
                Category: BibEdit
                Priority: 1 - Later
                  Status: Need Info
                 Privacy: Public
        Percent Complete: 0%
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________


It looks like the last version of rxp was released in 2004
<http://www.inf.ed.ac.uk/research/isdd/admin/package?download=80>, and it's
been removed from newer Linux distributions
<http://www.mail-archive.com/[email protected]/msg08400.html>.

References:
$ grep -R rxp *
INSTALL:              rxp gnuplot xpdf-utils gs-common antiword catdoc \
INSTALL:             <http://www.reportlab.org/pyrxp.html>
INSTALL:             <http://www.cogsci.ed.ac.uk/~richard/rxp.html>
modules/bibedit/lib/bibrecord_config.py:CFG_BIBRECORD_PARSERS_AVAILABLE = ['pyrxp', '4suite', 'minidom']
modules/bibedit/lib/bibrecord.py:    if 'pyrxp' in CFG_BIBRECORD_PARSERS_AVAILABLE:
modules/bibedit/lib/bibrecord.py:        AVAILABLE_PARSERS.append('pyrxp')
modules/bibedit/lib/bibrecord.py:        if parser == 'pyrxp':
modules/bibedit/lib/bibrecord.py:            rec = _create_record_rxp(marcxml, verbose, correct)
modules/bibedit/lib/bibrecord.py:#       'pyrxp': _create_record_rxp,
modules/bibedit/lib/bibrecord.py:def _create_record_rxp(marcxml, verbose=CFG_BIBRECORD_DEFAULT_VERBOSE_LEVEL,
modules/bibedit/lib/bibrecord.py:    pyrxp_parser = pyRXP.Parser(ErrorOnValidityErrors=0, ProcessDTD=1,
modules/bibedit/lib/bibrecord.py:        pyrxp_parser.ErrorOnValidityErrors = 1
modules/bibedit/lib/bibrecord.py:        pyrxp_parser.ErrorOnUnquotedAttributeValues = 1
modules/bibedit/lib/bibrecord.py:        root = pyrxp_parser.parse(marcxml)
modules/bibedit/lib/bibrecord.py:        children = _get_children_by_tag_name_rxp(root, 'record')
modules/bibedit/lib/bibrecord.py:    for controlfield in _get_children_by_tag_name_rxp(root, 'controlfield'):
modules/bibedit/lib/bibrecord.py:    for datafield in _get_children_by_tag_name_rxp(root, 'datafield'):
modules/bibedit/lib/bibrecord.py:        for subfield in _get_children_by_tag_name_rxp(datafield, 'subfield'):
modules/bibedit/lib/bibrecord.py:def _get_children_by_tag_name_rxp(node, name):
modules/bibedit/lib/bibrecord.py:    psyco.bind(_create_record_rxp)
modules/bibedit/lib/bibrecord_tests.py:        record = bibrecord._create_record_rxp(self.xmltext)

    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: 2009-09-15 12:45              By: Tibor Simko <simko>
Why do you suggest removing pyRXP?  Please consider that (i) the
package is only recommended, so nobody is forced to compile and use it;
(ii) the package has been available and working robustly for years;
(iii) it is the fastest XML parser we have; and (iv) it does not cost
anything to continue supporting it.

To illustrate point (iii), I have just remeasured the speed of
create_records() on the demo MARCXML file containing ~100 records on
an SLC5 box, with the following results:

   1. pyrxp   ... 0.177 sec
   2. 4suite  ... 0.561 sec
   3. minidom ... 1.445 sec

As you can see, the next supported parser, 4suite, is more than three
times slower.  If we wanted to parse a file with 1M records, pyrxp
would finish in something like 29 minutes, while 4suite would take
about 93 minutes.  Quite a difference.

Unless there is an equally fast alternative, pyrxp should stay.



-------------------------------------------------------
Date: 2009-09-15 10:07              By: Victor Engmark <vengmark>
PS: The deprecation link mentions that xmllint could be used instead of rxp.
PPS: To correct the URL for the deprecation in Ubuntu, replace the "..." with
"changes".





    _______________________________________________________

Carbon-Copy List:

CC Address                          | Comment
------------------------------------+-----------------------------
1576                                | -COM-
3964                                | -SUB-




==============================================================================

This item URL is:
  <http://savannah.cern.ch/task/?11225>

_______________________________________________
  Message sent via/by LCG Savannah
  http://savannah.cern.ch/
