Yeah, I know it has (fairly big) limitations, but I thought I should share what I found. I think it will do for my purposes right now; it's definitely going to be ASCII only.
Hmmm... so if I have no DTD, how would you recommend that I set up the parser to ignore the characters?

Thanks!

James

---------------------------------------- Message History ----------------------------------------

From: Andy Clark <[EMAIL PROTECTED]> on 01/06/2001 01:48 ZE9
Please respond to [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
cc:
Subject: Re: Ignorable Whitespace ( and 'terminating with </>' )

James Richardson wrote:
> cat bad.xml | perl -p -e 's/\<(.*?)\>(.*)\<\/\>/<$1>$2<\/$1>/' > good.xml

As long as your file is ASCII, this should be fine. But XML is based on Unicode, which can have any number of encodings. Unless your Perl understands this (not likely), this is limited to working for ASCII (and "ASCII-transparent") files.

> Why are strings containing [\n\t ]* reported as character, rather
> than ignorable whitespace?

Without a grammar present, the parser has no knowledge of what is meaningful character data and what is ignorable whitespace. So if you want the [\n\t ]* reported as ignorable whitespace, you have to have a grammar associated.

--
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
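
On the question of ignoring the whitespace when there is no DTD: one option is to do the filtering in the application's own handler instead of relying on the parser. The following is only a minimal sketch of that idea, written in Python's standard xml.sax module for illustration (the thread does not say which parser or language is in use). The names WhitespaceSkippingHandler and parse_without_blank_text are hypothetical. It assumes that every whitespace-only run of text between tags can be safely discarded, which is an application decision and not something the parser can know without a grammar.

```python
import xml.sax


class WhitespaceSkippingHandler(xml.sax.ContentHandler):
    """Records SAX events, dropping text runs that are only whitespace.

    Hypothetical helper for illustration; not part of any parser's API.
    """

    def __init__(self):
        super().__init__()
        self.events = []    # recorded (kind, value) pairs
        self._buffer = []   # character chunks seen since the last tag

    def _flush(self):
        # A parser may deliver one text run as several characters() calls,
        # so the whitespace test is applied to the joined run, not per chunk.
        text = "".join(self._buffer)
        self._buffer = []
        if text.strip(" \t\r\n"):
            self.events.append(("text", text))

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("start", name))

    def endElement(self, name):
        self._flush()
        self.events.append(("end", name))

    def characters(self, content):
        self._buffer.append(content)


def parse_without_blank_text(xml_bytes):
    """Parse a document and return its events without whitespace-only text."""
    handler = WhitespaceSkippingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events
```

Text that contains anything other than whitespace is passed through unchanged, including its leading and trailing spaces. Note that this is cruder than true ignorable whitespace: in mixed content such as <b>a</b> <i>b</i> the single space between the two elements is meaningful, and this sketch would drop it.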
