Hi,

I am writing a crawler to get some info from web pages, and I am using Commons
Lang to unescape the HTML files.
I was having some problems with my regular expressions until I realized that
the following prints false:

System.out.println(" ".equals(StringEscapeUtils.unescapeHtml("&nbsp;")));

Is this a bug? Or is it the expected behavior of the unescape method when 
dealing with escaped space characters?
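One way to see what is going on is to print the code of each character instead of the string itself. This is a minimal sketch that does not depend on Commons Lang. It assumes that unescapeHtml turns &nbsp; into U+00A0 (NO-BREAK SPACE), which is how HTML defines that entity, and it uses a literal U+00A0 as a stand-in for the unescaped result. The class and method names are hypothetical.

```java
public class NbspDiagnosis {

    // Assumption: this literal stands in for the result of
    // StringEscapeUtils.unescapeHtml("&nbsp;"), i.e. the non-breaking
    // space U+00A0 rather than the ordinary space U+0020.
    public static final String UNESCAPED_NBSP = "\u00A0";

    // Formats each char of s as "U+XXXX", separated by single spaces.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(codePoints(" "));             // U+0020
        System.out.println(codePoints(UNESCAPED_NBSP));  // U+00A0
        System.out.println(" ".equals(UNESCAPED_NBSP));  // false
    }
}
```

If the two strings print different codes, equals() returning false is consistent with the unescaped character simply not being U+0020.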


Also, if I unescape 'sbrubbles&nbsp;' and then trim() it, the space still
appears at the end of the string.
Visually speaking, unescaping '&nbsp;' returns a space, but programmatically
the system doesn't recognize it as a space character.
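A possible workaround, sketched under the same assumption that the unescaped &nbsp; is U+00A0: String.trim() only strips characters up to U+0020, so U+00A0 survives it, and the default Java regex class \s does not match it either. Replacing U+00A0 with an ordinary space before trimming or matching avoids both problems. The helper name normalizeNbsp is hypothetical.

```java
public class NbspTrim {

    // Assumption: U+00A0 (NO-BREAK SPACE) is what unescaping "&nbsp;" yields.
    private static final char NBSP = '\u00A0';

    // Hypothetical helper: turns every non-breaking space into an ordinary
    // space so that trim(), equals(" ") and the regex \s treat it as a space.
    public static String normalizeNbsp(String s) {
        return s.replace(NBSP, ' ');
    }

    public static void main(String[] args) {
        String unescaped = "sbrubbles" + NBSP;
        // trim() only strips characters <= U+0020, so U+00A0 is kept.
        System.out.println("[" + unescaped.trim() + "]");
        // After normalizing, trim() removes the trailing space.
        System.out.println("[" + normalizeNbsp(unescaped).trim() + "]"); // [sbrubbles]
    }
}
```

This changes the text, so it only fits if the distinction between a normal and a non-breaking space does not matter for the crawler's purposes.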

Thanks in advance,
Vitor.