Author: jukka
Date: Thu Sep 4 10:40:36 2008
New Revision: 692170
URL: http://svn.apache.org/viewvc?rev=692170&view=rev
Log:
TIKA-149: Parser for zip files
Include some newlines to make the plain text output a bit more readable (and to
avoid words running into each other and breaking full text indexing)
Modified:
incubator/tika/trunk/src/main/java/org/apache/tika/parser/zip/ZipParser.java
Modified: incubator/tika/trunk/src/main/java/org/apache/tika/parser/zip/ZipParser.java
URL: http://svn.apache.org/viewvc/incubator/tika/trunk/src/main/java/org/apache/tika/parser/zip/ZipParser.java?rev=692170&r1=692169&r2=692170&view=diff
==============================================================================
--- incubator/tika/trunk/src/main/java/org/apache/tika/parser/zip/ZipParser.java (original)
+++ incubator/tika/trunk/src/main/java/org/apache/tika/parser/zip/ZipParser.java Thu Sep 4 10:40:36 2008
@@ -83,6 +83,7 @@
             throws IOException, SAXException {
         xhtml.startElement("div", "class", "file");
         xhtml.element("h1", entry.getName());
+        xhtml.characters("\n");
         try {
             Metadata metadata = new Metadata();
@@ -95,6 +96,7 @@
             // Could not parse the entry, just skip the content
         }
+        xhtml.characters("\n");
         xhtml.endElement("div");
     }
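
The effect described in the log message can be illustrated with a minimal sketch that uses only the JDK's SAX classes, not Tika itself. The names NewlineSketch, TextCollector, chars and emitEntry are hypothetical and do not appear in ZipParser; the collector merely stands in for a handler that keeps character data and ignores element boundaries, which is an assumption about how the plain text output is produced.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of ZipParser or Tika.
class NewlineSketch {

    /** Keeps character data only and ignores element boundaries. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    /** Sends a string to the handler as a single characters() event. */
    static void chars(DefaultHandler handler, String s) throws SAXException {
        handler.characters(s.toCharArray(), 0, s.length());
    }

    /**
     * Emits one archive entry as heading text followed by body text,
     * optionally with the two newlines that the diff above adds.
     */
    static String emitEntry(String name, String body, boolean withNewlines)
            throws SAXException {
        TextCollector handler = new TextCollector();
        chars(handler, name);            // stands in for the h1 with the entry name
        if (withNewlines) {
            chars(handler, "\n");        // newline after the heading
        }
        chars(handler, body);            // stands in for the parsed entry content
        if (withNewlines) {
            chars(handler, "\n");        // newline before the closing div
        }
        return handler.toString();
    }

    public static void main(String[] args) throws SAXException {
        // Without newlines the entry name and the first word of the body fuse.
        System.out.println(emitEntry("readme.txt", "Hello world", false));
        // With newlines they stay separate tokens.
        System.out.println(emitEntry("readme.txt", "Hello world", true));
    }
}
```

In the first call the name and the body come out as the single token "readme.txtHello", which is the kind of run-together word that would break full text indexing; in the second call the newline keeps them apart.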