Hello,

I have run into a problem uploading converted (harvested) records from a specific OAI repository (http://www.vse.cz/oai2/). After Invenio converts the harvested records, it refuses to upload them, apparently because of bad formatting. More precisely, the error file (bibsched_task.err) starts with:

"2010-04-30 17:11:48 --> Error: MARCXML file has wrong format: [(None, 0, ['Document not well formed(Tagname : subfield)']), ({'856': [([('u', 'http://www.vse.cz/vskp/eid/5620')], '4', ' ', '', 1)], '650': [([('a', 'neziskov\xc3\xa9 organizace; efektivita')], ' ', '7', '', 10)], '980': [([('a', 'VSE')], ' ', ' ', '', 13)], '909': [([('u', 'oai:vse.cz:vskp/5620')], 'C', 'O', '', 12)], '998': [([('a', 'Vysok\\xc3\\xa1_\\xc5\\xa1kola_ekonomick\\xc3\\xa1_v_Praze')], ' ', ' ', '', 14)], '245': [([('a', 'Welche Leistungen erbringen gemeinn\xc3\xbctzige Organisationen im sozialen, kulturellen und freizeitorientierten Bereich in der Region S\xc3\xbcdb\xc3\xb6hmen?')], '0', '0',...................................."

I guess the problem could be caused by the encoding of certain Czech letters with diacritics (ě, š, č, ř, ž, ...). However, harvesting other Czech repositories works well. I have attached the error file for comparison.

In addition, I would like to ask whether it is possible to create an analog of the Report Number for harvested records that would contain the record ID (control field 001). Specifically, the identifier should have the syntax "nusl-ID". It seems that Invenio assigns IDs only when it uploads the converted (harvested) records into the database, so I cannot use the value of control field 001 within the oaidc2marcxml.xsl file, since it has not yet been assigned. If this is true, is there another way to use the ID (from control field 001) to create the required identifier?

Thank you for any answer or advice.

Jindrich Dolansky
bibsched_task_576.err
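The "Document not well formed(Tagname : subfield)" message suggests the converted file fails XML parsing near a subfield element. A minimal sketch of how the converted MARCXML could be checked outside Invenio, using only Python's standard library, is given here. This is not Invenio's own validation; the function name `find_well_formedness_error` and the two sample records are hypothetical and serve only to illustrate the check.

```python
import xml.etree.ElementTree as ET


def find_well_formedness_error(xml_text):
    """Return None if xml_text is well-formed XML,
    otherwise a (line, column, message) tuple locating the first parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, column = exc.position
        return line, column, str(exc)
    return None


# Hypothetical well-formed record (not taken from the real harvest).
good_record = (
    '<record><datafield tag="245" ind1="0" ind2="0">'
    '<subfield code="a">Title</subfield></datafield></record>'
)

# Hypothetical broken record: the subfield element is never closed.
bad_record = (
    '<record><datafield tag="650" ind1=" " ind2="7">'
    '<subfield code="a">efektivita</datafield></record>'
)

good_result = find_well_formedness_error(good_record)
bad_result = find_well_formedness_error(bad_record)
print(good_result)
print(bad_result)
```

Running the same function on the contents of the converted file would report the line and column where parsing stops, which may help to tell whether the failure comes from the diacritics or from an unclosed or malformed subfield element.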
