Author: jukka
Date: Wed Jan  7 07:41:38 2009
New Revision: 732370

URL: http://svn.apache.org/viewvc?rev=732370&view=rev
Log:
TIKA-180: XHTMLContentHandler unable to extract text from MSWord file

Use the SafeContentHandler class in XHTMLContentHandler to prevent all current 
Tika parsers from outputting invalid XML characters.
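For illustration, here is a minimal sketch of the kind of filtering a "safe" content handler performs. This is not the source of Tika's SafeContentHandler. The class name InvalidXmlCharFilter and its sanitize helper are hypothetical. The sketch assumes that each code point outside the XML 1.0 Char production is replaced with U+FFFD. It uses the JDK's org.xml.sax.helpers.XMLFilterImpl, which forwards every SAX event to the downstream ContentHandler, so only the text-carrying callbacks need overriding.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical sketch (not Tika's SafeContentHandler): a SAX filter that
 * replaces characters not allowed in XML 1.0 with U+FFFD before passing
 * text events to the downstream ContentHandler.
 */
class InvalidXmlCharFilter extends XMLFilterImpl {

    // Assumed replacement character for invalid input.
    private static final char REPLACEMENT = '\uFFFD';

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Copies the segment, substituting U+FFFD for each invalid code point. */
    static String sanitize(char[] ch, int start, int length) {
        StringBuilder sb = new StringBuilder(length);
        int end = start + length;
        int i = start;
        while (i < end) {
            // Reads a full surrogate pair when present; a lone surrogate is
            // returned as-is and falls outside the valid ranges above.
            int cp = Character.codePointAt(ch, i, end);
            if (isValidXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(REPLACEMENT);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String safe = sanitize(ch, start, length);
        super.characters(safe.toCharArray(), 0, safe.length());
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        String safe = sanitize(ch, start, length);
        super.ignorableWhitespace(safe.toCharArray(), 0, safe.length());
    }
}
```

Usage: call setContentHandler(downstream) on the filter and let the parser emit events into the filter instead of the downstream handler. One limitation of this sketch is that it sanitizes each characters() call independently. A surrogate pair split across two calls is therefore treated as two lone surrogates and replaced.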

Modified:
    lucene/tika/trunk/src/main/java/org/apache/tika/sax/XHTMLContentHandler.java

Modified: lucene/tika/trunk/src/main/java/org/apache/tika/sax/XHTMLContentHandler.java
URL: http://svn.apache.org/viewvc/lucene/tika/trunk/src/main/java/org/apache/tika/sax/XHTMLContentHandler.java?rev=732370&r1=732369&r2=732370&view=diff
==============================================================================
--- lucene/tika/trunk/src/main/java/org/apache/tika/sax/XHTMLContentHandler.java (original)
+++ lucene/tika/trunk/src/main/java/org/apache/tika/sax/XHTMLContentHandler.java Wed Jan  7 07:41:38 2009
@@ -26,7 +26,7 @@
  * Content handler decorator that simplifies the task of producing XHTML
  * events for Tika content parsers.
  */
-public class XHTMLContentHandler extends ContentHandlerDecorator {
+public class XHTMLContentHandler extends SafeContentHandler {
 
     /**
      * The XHTML namespace URI