I've been having a tough time parsing XML files that contain special characters.

I have tried every applicable engine, most recently SAX, to parse a rather large (17.8 MB) XML file.

The problem appears when the parser hits a UTF-8 encoded character. I've tried decoding the file before it reaches the parser, and I've even tried ENCODING it (god knows why that'd work; it didn't, lol). I've tried htmlentities(), etc. Nothing has worked.

I've also tried simply removing the characters and, lo and behold, it worked! Darned thing...

§µÖÕÔÓÒ

Those are the characters that have caused me problems so far. I'd give the UTF-8 encoded equivalents, but I'm not sure of them off the top of my head.

My code varies so much that I'm not sure it'd be useful to type it out. The issue doesn't seem to be with my code: when I parse the file manually with a whole bunch of inefficient regex statements, everything works out peachy. The problem with that approach is that it eats system resources for a very long time (remember, it's a 17.8 MB file, and it's all plain text :)).

Any suggestions as to how I could get around this seemingly impossible roadblock, which seems to have been put in place by the XML engines? :O

Thanks!

--
PHP General Mailing List (http://www.php.net/)