Might want to look into RecursiveMetadata Parser http://wiki.apache.org/tika/RecursiveMetadata
Or https://issues.apache.org/jira/i#browse/TIKA-1329?issueKey=TIKA-1329&serverRenderedViewIssue=true

From: yeshwanth kumar [mailto:[email protected]]
Sent: Monday, June 30, 2014 3:24 PM
To: Allison, Timothy B.
Subject: Re: Stack Overflow Question

Hi Tim,

Thanks for the quick reply. I changed the content handler to BodyContentHandler and got an exception for the maximum word limit, so I passed -1 to the BodyContentHandler constructor. Now there is another problem: file names and content are both present in the string returned from handler.toString(). How can I map a file name to its content?

Thanks,
yeshwanth

On Tue, Jul 1, 2014 at 12:35 AM, Allison, Timothy B. <[email protected]> wrote:

DefaultHandler is effectively a NullHandler; it doesn't store or do anything. Try BodyContentHandler or ToXMLContentHandler or maybe WriteOutContentHandler. If you want to write out each embedded file as a binary, try subclassing EmbeddedResourceHandler.

QUOTE:
<http://stackoverflow.com/questions/24495504/unable-tp-read-zipfile-using-apache-tika?sem=2>

I am using Apache Tika 1.5 for parsing the contents present in a zip file. Here's my sample code:

    Parser parser = new AutoDetectParser();
    ParseContext context = new ParseContext();
    context.set(Parser.class, parser);
    ContentHandler handler = new DefaultHandler();
    Metadata metadata = new Metadata();
    InputStream stream = null;
    try {
        stream = TikaInputStream.get(new File(zipFilePath));
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    try {
        parser.parse(stream, handler, metadata, context);
        logger.info("Content:\t" + handler.toString());
    } catch (IOException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (TikaException e) {
        e.printStackTrace();
    } finally {
        try {
            stream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

In the logger statement all I see is org.xml.sax.helpers.DefaultHandler@5bd8e367. I am missing something and am unable to figure it out; looking for some help.

-----Original Message-----
From: yeshwanth kumar [mailto:[email protected]]
Sent: Monday, June 30, 2014 1:28 PM
To: [email protected]
Subject: Stack Overflow Question

Unable tp read zipfile using Apache Tika
http://stackoverflow.com/q/24495504/1899893?sem=2
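
On the follow-up question in this thread (mapping each file name to its content), the result being asked for is one entry per embedded file instead of a single concatenated string. The sketch below only illustrates that shape and does not use Tika at all: it swaps in the JDK's java.util.zip and assumes every entry is UTF-8 text, so it is no substitute for the recursive-parser approach linked at the top of the thread when the zip holds PDFs, Office files, or nested archives. The class name ZipEntryContents and the method readEntries are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Tika: maps each zip entry name to its text.
public class ZipEntryContents {

    // Reads every non-directory entry and returns entry name -> entry text.
    // Assumption: entries are UTF-8 text; binary formats would need a real parser.
    public static Map<String, String> readEntries(InputStream in) throws IOException {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                contents.put(entry.getName(),
                        new String(out.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        // Build a small zip in memory so the example runs without any files.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("dir/b.txt"));
            zos.write("world".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        Map<String, String> contents =
                readEntries(new ByteArrayInputStream(bytes.toByteArray()));
        for (Map.Entry<String, String> e : contents.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A LinkedHashMap is used so the entries come back in archive order. With Tika, the same name-to-content pairing would come from handling each embedded document separately, with its own handler and metadata, which is what the RecursiveMetadata wiki page and TIKA-1329 describe.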
