#299: Bibupload: ignore elements with no child text node
---------------------+------------------------------------------------------
Reporter: Maddog | Type: enhancement
Status: new | Priority: minor
Milestone: | Component: BibUpload
Version: v0.99.1 | Keywords:
---------------------+------------------------------------------------------
Hi,
in real-life harvesting I often run into one problem: OAI repositories
are usually messy, and it is very difficult to avoid having some empty
elements in the converted MARCXML file for upload (without putting
conditions all over the transformation stylesheet). When you try to upload
a MARCXML file with an empty datafield or subfield, bibupload crashes.
I think it would be great to simply ignore all (max-depth) elements
containing no text nodes. For this I edited the open_marc_file() function
in bibupload.py and added some simple pre-processing:
{{{
### Extra imports
import xml.dom.minidom as dom
from xml.parsers.expat import ExpatError

def open_marc_file(path):
    """Open a file and return the data"""
    try:
        # open the file containing the marc document
        marc_file = open(path,'r')
        marc = marc_file.read()
        marc_file.close()
        ### My edit ###
        try:
            marcDom = dom.parseString(marc)
            # remove subfields that have no child nodes at all
            subfields = marcDom.getElementsByTagName("subfield")
            for e in subfields:
                if not e.hasChildNodes():
                    parent = e.parentNode
                    parent.removeChild(e)
                    parent.normalize()
                    # drop the leftover text node so that the parent
                    # datafield is seen as empty in the loop below
                    if (len(parent.childNodes) == 1 and
                            isinstance(parent.childNodes[0], dom.Text)):
                        parent.removeChild(parent.childNodes[0])
            # remove datafields that are (now) empty
            fields = marcDom.getElementsByTagName("datafield")
            for e in fields:
                if not e.hasChildNodes():
                    parent = e.parentNode
                    parent.removeChild(e)
            marc = marcDom.toxml().encode('utf-8')
        except ExpatError:
            # not well-formed XML: pass the data on unchanged
            pass
        ### End of my edit ###
    except IOError, erro:
        write_message("Error: %s" % erro, verbose=1, stream=sys.stderr)
        write_message("Exiting.", sys.stderr)
        task_update_status("ERROR")
        sys.exit(1)
    return marc
}}}
It has worked well for me so far, so if there is nothing wrong with this
approach, maybe you could consider adding something like it to the system.
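For discussion, the same clean-up could also be kept separate from the file
handling. The following is only a rough sketch under a few assumptions: the
names has_text() and strip_empty_marc_elements() are hypothetical and not
part of BibUpload, whitespace-only subfields are treated as empty (the patch
above only catches subfields with no child nodes at all), and a datafield is
dropped once it has no subfield left.

```python
import xml.dom.minidom as dom
from xml.parsers.expat import ExpatError


def has_text(element):
    """Return True if the element has a non-whitespace text or CDATA child."""
    for child in element.childNodes:
        if (child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
                and child.data.strip()):
            return True
    return False


def strip_empty_marc_elements(marc_xml):
    """Drop subfields without text, then datafields left without subfields.

    Hypothetical helper, not a BibUpload function. Input that is not
    well-formed XML is returned unchanged.
    """
    try:
        doc = dom.parseString(marc_xml)
    except ExpatError:
        return marc_xml
    # getElementsByTagName() returns a pre-built list, so removing nodes
    # from the tree while looping over it is safe.
    for subfield in doc.getElementsByTagName("subfield"):
        if not has_text(subfield):
            subfield.parentNode.removeChild(subfield)
    for datafield in doc.getElementsByTagName("datafield"):
        if not datafield.getElementsByTagName("subfield"):
            datafield.parentNode.removeChild(datafield)
    return doc.toxml()
```

With something like this, open_marc_file() would only need to pass the data
it has read through the helper before returning it. Controlfields are left
untouched in this sketch.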
--
Ticket URL: <http://invenio-software.org/ticket/299>
Invenio <http://invenio-software.org>