@Chris, you are right: since I added the <?xml version="1.0" encoding="UTF-8" ?> declaration, xmllint also prints it properly. But with "xmllint --encode ascii wiki.xml" you still get the behaviour I described. Strange default.
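To illustrate why ASCII-only output is full of numeric character references, here is a minimal Python sketch. It is not xmllint and does not reproduce its exact output. It assumes the file has already been decoded as UTF-8. The function names `escape_non_ascii` and `describe_non_ascii` are hypothetical helpers. The first function escapes every non-ASCII character as a decimal `&#NNN;` reference, which is what any serializer restricted to ASCII has to do. The second lists each non-ASCII character with its code point and Unicode name, which is one way to spot lookalike letters.

```python
import unicodedata


def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character
    reference (&#NNN;), as an ASCII-only serializer must."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def describe_non_ascii(text: str) -> list[tuple[str, str, str]]:
    """Return (char, code point, Unicode name) for each distinct
    non-ASCII character, in order of first appearance."""
    found: dict[str, tuple[str, str]] = {}
    for ch in text:
        if ord(ch) > 0x7F and ch not in found:
            found[ch] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
    return [(ch, cp, name) for ch, (cp, name) in found.items()]


if __name__ == "__main__":
    # Hypothetical sample: a lookalike "Y" and a Cyrillic "i" mixed into ASCII text.
    sample = "\u01b3es, th\u0456s looks like plain text"
    print(escape_non_ascii(sample))
    for ch, cp, name in describe_non_ascii(sample):
        print(cp, name)
```

Running a check like this over the decoded file makes it obvious which "ordinary-looking" letters are actually other code points.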
Anyway, all characters are valid UTF-8. But what I found is that most characters in that document aren't what they appear to be. For example, most y's aren't actually the ordinary y but rather the "Latin Capital Letter Y with hook" (Ƴ). Similarly, some i's aren't actually the ordinary i but the "Cyrillic Small Letter Byelorussian-Ukrainian I" (і). Hope that helps.

--
You received this bug notification because you are a member of Zorba Coders, which is the registrant for Zorba.
https://bugs.launchpad.net/bugs/1027270

Title:
  xml:parse() - infinite loop

Status in Zorba - The XQuery Processor:
  Confirmed

Bug description:
  "xmllint wiki.xml" reveals that for some reason the input file contains lots of numeric character references (cat and vim decode those automatically). Strangely, it doesn't seem to be a single character but a combination of lines that provokes the behaviour (I tried removing some lines individually but couldn't reproduce it after that).