Hi,
I am writing a crawler to get some info on web pages, and I am using Commons Lang
to unescape the HTML.
I was having some problems with my regular expressions until I realized that the
following prints false:
System.out.println(" ".equals(StringEscapeUtils.unescapeHtml("&nbsp;")));
Is this a bug? Or is it the expected behavior of the unescape method when
dealing with escaped space characters?
Also, if I unescape 'sbrubbles&nbsp;' and then trim() it, the space still
appears at the end of the string.
Visually, unescaping '&nbsp;' returns a space, but programmatically
it is not recognized as a space character.
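To narrow it down, here is a small sketch that does not depend on Commons Lang at all. It assumes (I have not confirmed this in the library source) that the entity is unescaped to the no-break space character U+00A0 rather than the ordinary space U+0020. The class name NbspCheck and the normalize helper are made up for illustration only.

```java
public class NbspCheck {
    // U+00A0 NO-BREAK SPACE: the character the HTML entity &nbsp; stands for.
    // Assumption: this is also what the unescape call hands back.
    static final char NBSP = '\u00A0';

    // Hypothetical helper: turn no-break spaces into ordinary spaces, then trim.
    static String normalize(String s) {
        return s.replace(NBSP, ' ').trim();
    }

    public static void main(String[] args) {
        // Stand-in for the unescaped 'sbrubbles&nbsp;' text.
        String s = "sbrubbles" + NBSP;

        // An ordinary space is not equal to a no-break space.
        System.out.println(" ".equals(String.valueOf(NBSP)));   // false

        // trim() only strips characters up to U+0020, so the NBSP survives.
        System.out.println(s.trim().length());                  // 10

        // Java's whitespace test excludes NBSP; the Unicode space test includes it.
        System.out.println(Character.isWhitespace(NBSP));       // false
        System.out.println(Character.isSpaceChar(NBSP));        // true

        // After replacing NBSP with a plain space, trim() works as expected.
        System.out.println(normalize(s).length());              // 9
    }
}
```

If that assumption holds, it would explain both the false result and the trailing character that trim() leaves behind, and replacing U+00A0 before trimming or matching would be one possible workaround.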
Thanks in advance,
Vitor.