Yeah, I know it has (fairly big) limitations, but I thought I should share
what I found. I think it will do for my purposes right now; it's definitely
going to be ASCII only.

Hmm, so if I have no DTD, how would you recommend I set up the parser to
ignore the whitespace characters?

Thanks!

James


---------------------------- Message History ----------------------------


From: Andy Clark <[EMAIL PROTECTED]> on 01/06/2001 01:48 ZE9

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:
Subject:  Re: Ignorable Whitespace ( and 'terminating with </>' )


James Richardson wrote:
> cat bad.xml | perl -p -e 's/\<(.*?)\>(.*)\<\/\>/<$1>$2<\/$1>/' > good.xml

As long as your file is ASCII, this should be fine. But XML
is based on Unicode, which can be serialized in any number of
encodings. Unless your Perl understands the file's encoding
(not likely), this only works for ASCII (and "ASCII-transparent")
files.
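
To illustrate the encoding point, here is a minimal sketch in Python rather than Perl (a swapped-in language, not what the original poster used). It applies the same pattern to decoded text instead of raw bytes. The names expand_empty_end_tags and fix_file are hypothetical, and the caller is assumed to already know the file's encoding; nothing here detects it.

```python
import re

# Same pattern as the Perl one-liner: <name>text</>  ->  <name>text</name>
EMPTY_END_TAG = re.compile(r"<(.*?)>(.*)</>")


def expand_empty_end_tags(line):
    """Rewrite the first '<name>...</>' on a line, like the Perl -p loop."""
    return EMPTY_END_TAG.sub(r"<\1>\2</\1>", line, count=1)


def fix_file(src_path, dst_path, encoding):
    # Decode with the file's real encoding so the regex sees characters,
    # not bytes. 'encoding' is supplied by the caller (an assumption).
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(expand_empty_end_tags(line))
```

For example, fix_file("bad.xml", "good.xml", "utf-16") would handle a UTF-16 file that the byte-oriented one-liner cannot. The other limitations of the pattern (one element per line, no attributes on the start tag) are unchanged.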

> Why are strings containing [\n\t ]* reported as character, rather
> than ignorable whitespace?

Without a grammar present, the parser has no way of knowing
which character data is meaningful and which is ignorable
whitespace. So if you want the [\n\t ]* reported as ignorable
whitespace, you have to have a grammar (such as a DTD) associated
with the document.
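
When no grammar is available, one possible workaround is to filter whitespace-only text in the application's own handler. The sketch below uses Python's standard xml.sax module as a stand-in for the parser discussed in this thread, since its ContentHandler has the same characters() and ignorableWhitespace() callbacks as SAX. The class name WhitespaceFilteringHandler and the function parse_without_blank_text are hypothetical. It treats every whitespace-only run between tags as ignorable, which is an assumption that will be wrong for documents where such whitespace matters.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class WhitespaceFilteringHandler(ContentHandler):
    """Records SAX events, discarding text runs that are only whitespace."""

    def __init__(self):
        super().__init__()
        self.events = []    # recorded (kind, value) pairs
        self._buffer = []   # character chunks seen since the last tag

    def _flush(self):
        text = "".join(self._buffer)
        self._buffer = []
        # With no DTD the parser cannot call ignorableWhitespace(),
        # so whitespace-only text is dropped here instead.
        if text.strip():
            self.events.append(("text", text))

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("start", name))

    def endElement(self, name):
        self._flush()
        self.events.append(("end", name))

    def characters(self, content):
        # A parser may deliver one text run in several chunks; buffer them
        # so a newline inside real text is not mistaken for blank content.
        self._buffer.append(content)


def parse_without_blank_text(xml_text):
    handler = WhitespaceFilteringHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

Buffering until the next tag, rather than testing each characters() call on its own, keeps whitespace that sits inside real text intact.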

--
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





