I'm not certain how StringEscapeUtils handles it, but in HTML, &nbsp; is a non-breaking space, so it should unescape to character 160 (U+00A0) rather than character 32. It has a different meaning from an ordinary space.
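
To make the difference concrete, here is a minimal sketch that uses only the JDK. It assumes that unescaping "&nbsp;" yields the single character U+00A0; the constant UNESCAPED stands in for that result, and the class and method names are made up for illustration.

```java
class NbspCheck {
    // Stand-in for the value returned by unescaping "&nbsp;".
    // Assumption: the result is the single character U+00A0.
    static final String UNESCAPED = "\u00A0";

    // True only for a string holding exactly one ordinary space (character 32).
    static boolean isPlainSpace(String s) {
        return " ".equals(s);
    }

    // Map every non-breaking space to an ordinary space.
    static String normalize(String s) {
        return s.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        System.out.println((int) UNESCAPED.charAt(0));           // 160
        System.out.println((int) ' ');                           // 32
        System.out.println(isPlainSpace(UNESCAPED));             // false
        System.out.println(isPlainSpace(normalize(UNESCAPED)));  // true
    }
}
```

If the crawler does not care about the distinction, replacing U+00A0 with an ordinary space right after unescaping is one option; whether that is appropriate depends on what the pages contain.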

Michael Akerman
Systems Analyst
University IT Services

----- Original Message -----
From: "Vitor Costa" <[email protected]>
To: <[email protected]>
Sent: Wednesday, August 25, 2010 4:50 PM
Subject: [lang] StringEscapeUtils.unescapeHtml("&nbsp;") doesn't return a space


Hi,

I am writing a crawler to get some info from web pages and I am using commons lang
to unescape the HTML.
I was having some problems with my regular expressions until I realized that the
following prints false:

System.out.println(" ".equals(StringEscapeUtils.unescapeHtml("&nbsp;")));

Is this a bug? Or is it the expected behavior of the unescape method when
dealing with escaped space characters?


Also, if I unescape 'sbrubbles&nbsp;' and then trim() it, the space still
appears at the end of the string.
Visually speaking, unescaping '&nbsp;' returns a space. But programmatically
speaking, the system doesn't recognize it as a space character.
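
The trim() observation can be reproduced with the JDK alone. The following is a minimal sketch, assuming the unescaped string ends in the character U+00A0; UNESCAPED is a stand-in for that result, and the class and the stripNbsp helper are hypothetical names, not part of commons lang.

```java
class TrimNbsp {
    // Stand-in for the unescaped form of "sbrubbles&nbsp;".
    // Assumption: the entity becomes the single character U+00A0.
    static final String UNESCAPED = "sbrubbles\u00A0";

    // Hypothetical helper: turn non-breaking spaces into ordinary
    // spaces first, so that trim() can remove them.
    static String stripNbsp(String s) {
        return s.replace('\u00A0', ' ').trim();
    }

    public static void main(String[] args) {
        // trim() only removes characters up to U+0020, so U+00A0 survives.
        System.out.println(UNESCAPED.trim().length());       // 10
        System.out.println(stripNbsp(UNESCAPED).length());   // 9

        // Java does not classify U+00A0 as whitespace, although it is a space character.
        System.out.println(Character.isWhitespace('\u00A0')); // false
        System.out.println(Character.isSpaceChar('\u00A0'));  // true

        // By default, \s in a Java regex does not match U+00A0 either.
        System.out.println(UNESCAPED.matches(".*\\s"));       // false
    }
}
```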

Thanks in advance,
Vitor.




