Re: [PHP] XML and special characters

2006-01-28 Thread Adam Hubscher

Steve Clay wrote:

Sunday, January 22, 2006, 10:10:54 PM, Adam Hubscher wrote:


ee dee da da da? &sect;&eth; -- those that look like HTML entities are
the represented characters. I was mistaken, they are HTML entities, 



Can you show us a small chunk of this XML that throws errors?

You said you've tried various parsers.  Did none of those parsers have
error logging capabilities?  Show us the errors.

Steve
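Steve's point about error logging can be illustrated with a short sketch. It uses Python's standard `xml.etree.ElementTree` rather than PHP, purely as an assumption for illustration; the helper `parse_or_report` and the sample snippet are hypothetical. The parser's exception carries the position where parsing stopped, which is the detail worth posting to the list.

```python
import xml.etree.ElementTree as ET


def parse_or_report(xml_text):
    """Return (root, None) on success, or (None, message) describing the failure."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        line, column = err.position  # (line, column) where the parser reported the error
        return None, f"line {line}, column {column} (expat code {err.code}): {err}"


# Hypothetical snippet: &eth; is an HTML entity that XML does not predefine.
root, error = parse_or_report("<users><user>&eth;</user></users>")
print(error)
```

In PHP the same information is available by calling `libxml_use_internal_errors(true)` before parsing and then reading `libxml_get_errors()`.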


I realized my problem and fixed it.

For the future, a doctype is required no matter what ;)
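A sketch of why the DOCTYPE matters, assuming the feed uses HTML named entities such as `&sect;` and `&eth;`: XML itself predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`, so any other named entity must be declared in a DTD before a parser will accept it. Strictly speaking, then, the DOCTYPE is required whenever a document uses other named entities. The example uses Python's `xml.etree.ElementTree`; since ElementTree does not load external DTDs, the declarations go in the internal subset here. The element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment. The internal DTD subset declares the two HTML
# entities the document uses, mapping each one to a numeric character reference.
DOCUMENT = """<!DOCTYPE users [
  <!ENTITY sect "&#167;">
  <!ENTITY eth "&#240;">
]>
<users><user>&sect;&eth;</user></users>"""

root = ET.fromstring(DOCUMENT)
text = root.find("user").text
print([hex(ord(ch)) for ch in text])  # ['0xa7', '0xf0']
```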

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] XML and special characters

2006-01-22 Thread tedd

I've been having a tough time with parsing XML files and special characters.

-snip-

Any suggestions as to how I could get around this seemingly 
impossible roadblock that's been placed by what seems to be the XML 
engines :O..


Adam:

I believe that these special characters will be with us for a long 
while. I suggest that you look up these characters in the Unicode 
database and use their code points (hex equivalents) instead. For 
example, 0061 is a small a, 2022 is a bullet, 2713 is a check mark, and 
so on. The glyphs of most of the world's languages are represented in 
the Unicode database.
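In XML, code points like the ones listed above can be written as hexadecimal numeric character references (`&#x2022;`), which every conforming parser resolves without any DTD. A minimal check using Python's standard library, as an illustration rather than PHP-specific advice:

```python
import xml.etree.ElementTree as ET

# U+0061 (small a), U+2022 (bullet) and U+2713 (check mark) written as
# hexadecimal character references; no DTD is needed for these.
root = ET.fromstring("<user>&#x0061;&#x2022;&#x2713;</user>")
print([hex(ord(ch)) for ch in root.text])  # ['0x61', '0x2022', '0x2713']
```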


HTH's

tedd

--

http://sperling.com/




Re: [PHP] XML and special characters

2006-01-22 Thread Adam Hubscher

tedd wrote:
-snip-


Oh, I understand that they'll be here for a while.

The problem is that the XML file is not my own; rather, it's generated by 
another service that I am creating a stemmed service for. I feel I have 
already asked a lot of the owner of that service in getting him to produce 
a well-formed XML file. (He was simply using pseudo-XML that, although nice 
and organized, could not be parsed, period, and took forever to process 
with pregs. At least now that it runs through an XML generator, the script 
itself takes less time on his end too, and he's thankful for that.)


There are usernames listed in the file that use these special characters.

I'd rather not have him go through and hand-edit the 3 some odd users 
that are indexed. But if there is a way for the XML writer to output hex 
codes instead of raw Unicode characters automatically (and, on that note, 
is there any way for a parser to read them back automatically?), then the 
idea is feasible.
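On the writer side, XML output can be restricted to ASCII by escaping every non-ASCII character as a numeric character reference, and any XML parser turns those references back into characters when reading. A minimal sketch in Python, assuming ASCII output is acceptable; `to_ascii_xml_text` and the sample username are hypothetical. Python's `xmlcharrefreplace` error handler emits decimal references (`&#240;`), which are equivalent to the hex form (`&#xF0;`).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ascii_xml_text(value):
    """Escape &, < and >, then write every non-ASCII character as &#NNN;."""
    return escape(value).encode("ascii", "xmlcharrefreplace").decode("ascii")


name = "Ed\u00f0\u00a7"  # hypothetical username containing ð and §
fragment = f"<user>{to_ascii_xml_text(name)}</user>"
print(fragment)  # <user>Ed&#240;&#167;</user>

# Reading side: the parser resolves the references back into characters.
assert ET.fromstring(fragment).text == name
```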


Other than that, I'm trying to find a way to parse the existing file, 
whose Unicode data causes a fatal error in the parser.





Re: [PHP] XML and special characters

2006-01-22 Thread Adam Hubscher

Adam Hubscher wrote:

-snip-

ee dee da da da? &sect;&eth; -- those that look like HTML entities are 
the represented characters. I was mistaken: they are HTML entities, 
which is even odder to me.


I apologize for referring to UTF-8 earlier. The characters do not decode 
as UTF-8; they decode as HTML entities. I'm still trying different methods 
to get the file to read, but it still does not parse properly.
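If the feed cannot be changed to declare its entities, one workaround is to rewrite the HTML named entities as numeric character references before handing the text to the parser. A sketch in Python, assuming the entities are HTML 4 names as listed in `html.entities.name2codepoint`; the helper `html_entities_to_numeric` and the sample are hypothetical. This simple regex pass also touches CDATA sections and comments, so treat it as a starting point rather than a complete fix.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML predefines; these are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(xml_text):
    """Rewrite HTML named entities (e.g. &eth;) as numeric references (&#240;)."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML's own entities and unknown names as-is
        return f"&#{name2codepoint[name]};"
    return ENTITY_REF.sub(replace, xml_text)


raw = "<users><user>&sect;&eth; &amp; co</user></users>"  # hypothetical feed snippet
fixed = html_entities_to_numeric(raw)
print(fixed)  # <users><user>&#167;&#240; &amp; co</user></users>
user_text = ET.fromstring(fixed).find("user").text
```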

