collapsing unicode white space

Scott Wilson Thu, 29 Oct 2009 09:21:39 -0700

Hi everyone,

I need to implement a W3C processing algorithm which states:


10.1.8 Rule for Getting Text Content with Normalized White Space

The rule for getting text content with normalized white space is given in the following algorithm. The algorithm always returns a string, which MAY be empty.


        • Let input be the Element to be processed.

        • Let result be the result of applying the rule for getting text content to input.

        • In result, convert any sequence of one or more Unicode white space characters into a single U+0020 SPACE.

        • Return result.

The step I'm having problems with is "convert any sequence of one or more Unicode white space characters into a single U+0020 SPACE."

The StringUtils replace() and CharSetUtils squeeze() methods seem best suited to this, but for one thing there doesn't seem to be a set syntax for easily specifying the Unicode white space characters.

Has anyone else solved a similar problem using commons lang, or should I consider using something else?
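
For comparison, here is a minimal plain-JDK sketch of that one step, not using commons lang. It assumes "Unicode white space" means the code points carrying the Unicode White_Space property; the list is hand-copied and should be checked against the Unicode data files before relying on it. The names WhiteSpaceCollapser, isUnicodeWhiteSpace and collapse are made up for illustration.

```java
final class WhiteSpaceCollapser {

    // Assumption: code points with the Unicode White_Space property.
    // Hand-copied list; verify against the Unicode PropList.txt in use.
    static boolean isUnicodeWhiteSpace(int cp) {
        return (cp >= 0x0009 && cp <= 0x000D)   // tab, LF, VT, FF, CR
            || cp == 0x0020                     // space
            || cp == 0x0085                     // next line
            || cp == 0x00A0                     // no-break space
            || cp == 0x1680                     // ogham space mark
            || (cp >= 0x2000 && cp <= 0x200A)   // en quad .. hair space
            || cp == 0x2028                     // line separator
            || cp == 0x2029                     // paragraph separator
            || cp == 0x202F                     // narrow no-break space
            || cp == 0x205F                     // medium mathematical space
            || cp == 0x3000;                    // ideographic space
    }

    // Replaces every run of one or more white space code points
    // with a single U+0020 SPACE. Leading and trailing runs are
    // collapsed too, not trimmed.
    static String collapse(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean inRun = false;
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isUnicodeWhiteSpace(cp)) {
                if (!inRun) {
                    out.append(' ');
                    inRun = true;
                }
            } else {
                out.appendCodePoint(cp);
                inRun = false;
            }
        }
        return out.toString();
    }
}
```

With this, WhiteSpaceCollapser.collapse("a \t\n b") gives "a b". The explicit list is used because Character.isWhitespace() deliberately excludes the non-breaking spaces (U+00A0, U+2007, U+202F), so on its own it would not cover every character in the set assumed here.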


Thanks!

S


/-/-/-/-/-/
Scott Wilson
Apache Wookie: http://incubator.apache.org/projects/wookie.html
