#559: BibUpload: Cannot bibupload file containing UTF-8 chars
--------------------------+------------------------
Reporter: grfavre | Owner:
Type: enhancement | Status: new
Priority: major | Milestone: v1.0
Component: WebSubmit | Version:
Resolution: | Keywords: bibdocfile
--------------------------+------------------------
Changes (by grfavre):
* keywords: => bibdocfile
* priority: critical => major
* component: BibUpload => WebSubmit
* type: defect => enhancement
Comment:
I finally found the solution.
For some reason, bibupload goes through bibdocfile: it adds comments and
descriptions to the MARC using the get_description() and get_comment()
functions. These functions retrieve content pickled in a blob in the
database (this is really bad design, sorry guys; a database is by no means
meant to contain language-specific stuff).
As no encoding is applied to the content initially passed to set_comment()
or set_description(), building the MARC later crashes if that content was
a unicode object rather than an encoded string.
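One plausible way this surfaces in Python 2 (an illustration only, not
taken from the Invenio code): when a unicode object is combined with an
encoded byte string, Python 2 implicitly decodes the byte string as ASCII,
which fails on any non-ASCII byte. The sketch below makes that implicit
step explicit so that it behaves the same on Python 2 and 3; the sample
value and the helper name are made up.

```python
# Illustration only: none of these names come from Invenio.
description = u"r\u00e9sum\u00e9"                 # unicode object
marc_fragment = description.encode("utf-8")      # UTF-8 encoded byte string


def implicit_ascii_decode(byte_string):
    """Mimic the ASCII decode Python 2 applies when a byte string
    is mixed with a unicode object."""
    return byte_string.decode("ascii")


try:
    implicit_ascii_decode(marc_fragment)
    failed = False
except UnicodeDecodeError:
    # Non-ASCII bytes cannot be decoded with the ASCII codec.
    failed = True
```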
The solution I used was to re-encode all descriptions:
{{{
from invenio.dbquery import run_sql
from invenio.bibdocfile import BibRecDocs

# One row per record/document link.
recids = run_sql("select id_bibrec from bibrec_bibdoc")


def stringize(str_like, default='n/a'):
    """Return str_like as a UTF-8 encoded string (Python 2 str)."""
    if isinstance(str_like, str):
        return str_like
    elif isinstance(str_like, unicode):
        return str_like.encode('utf-8')
    elif str_like is None:
        return default
    else:
        raise ValueError("not a string-like object: %r" % (str_like,))


for (recid,) in recids:
    archive = BibRecDocs(recid)
    for bibdoc in archive.bibdocs:
        for bfile in bibdoc.list_all_files():
            # Write the description back as an encoded string.
            description = stringize(bfile.get_description())
            bibdoc.set_description(description, bfile.get_format(),
                                   bfile.get_version())
}}}
The simplest solution would be to check string-like objects before storing
them in the database: BibDocMoreInfo (in bibdocfile) should be modified to
encode content before storing it.
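A minimal sketch of that idea, assuming the values end up in a pickled
dictionary. MoreInfoSketch and to_utf8 are hypothetical names, not the
real BibDocMoreInfo API; only the argument order of set_description()
mirrors the call used in the script above. The try/except is there only
so the sketch also runs on Python 3.

```python
import pickle

try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3


def to_utf8(value, default=b"n/a"):
    """Return value as a UTF-8 encoded byte string."""
    if value is None:
        return default
    if isinstance(value, bytes):
        return value
    if isinstance(value, text_type):
        return value.encode("utf-8")
    raise TypeError("expected a string-like object, got %r" % type(value))


class MoreInfoSketch(object):
    """Hypothetical stand-in for BibDocMoreInfo, not the real class."""

    def __init__(self):
        self.descriptions = {}

    def set_description(self, description, docformat, version):
        # Normalise on the way in, so the blob only holds byte strings.
        self.descriptions[(version, docformat)] = to_utf8(description)

    def get_description(self, docformat, version):
        return self.descriptions[(version, docformat)]

    def serialize(self):
        # Stand-in for the pickled blob written to the database.
        return pickle.dumps(self.descriptions)


info = MoreInfoSketch()
info.set_description(u"Th\u00e8se compl\u00e8te", ".pdf", 1)
restored = pickle.loads(info.serialize())
```

With the check done in the setter, whatever is read back from the blob is
already an encoded string, so the readers need no special handling.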
This problem would never have happened if the values were stored in an SQL
field (which is already encoded by the database). The best possible
solution would be to store such content directly in the tables. However,
this effort would cost slightly more in development time (modifications of
the API, tests and migration kits)...
--
Ticket URL: <http://invenio-software.org/ticket/559#comment:12>
Invenio <http://invenio-software.org>