I noticed that we are still using version 0.9.4. However, there is a more recent version, which was released 2 years ago:

Version 0.9.5 (18 Jun 2005): Added feature submitted by Asgeir Asgeirsson to allow scanner to fix character entity references for Microsoft Windows(r) characters; stopped building nekohtmlXni.jar file by default; fixed handling of <blockquote> reported by Joseph Walton to better match browser behavior; fixed tag-balancing bug for unknown elements reported by Marc Guillemot and Vadim Tashlikovich; fixed mapping of encoding name in <meta> element reported by Marc Guillemot; changed tag-balancing to allow headers inside of links suggested by Laurens Fridael; applied attribute namespace patch from Joseph Walton; fixed namespace bug for "xml" prefixes reported by Asgeir Asgeirsson; fixed namespace bug for "xmlns" prefixes reported by Johannes Koch; and fixed no-such-method exception bug when using augmentations feature with older versions of Xerces2 reported by Hans Donner.

Why haven't we used this within Nutch? Should we update trunk?
