Solved: tika-app-1.3 was missing from the project's referenced libraries.
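
For anyone hitting the same symptom: as far as I understand it, when only the core Tika classes are on the build path, AutoDetectParser can still detect the Content-Type but has no concrete parser to hand the document to, so the handler stays empty. A quick way to confirm this is to try loading the parser classes by name. The sketch below uses only the JDK. The class name TikaClasspathCheck and the helper isOnClasspath are made up for illustration, and the list of class names is my assumption about which parsers handle the .pdf, .doc and .ppt files in this thread.

```java
public class TikaClasspathCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, TikaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names: the first lives in tika-core, the other two in
        // tika-parsers (both are bundled in the tika-app jar).
        String[] classNames = {
            "org.apache.tika.parser.AutoDetectParser",
            "org.apache.tika.parser.pdf.PDFParser",
            "org.apache.tika.parser.microsoft.OfficeParser"
        };
        for (String name : classNames) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If the first class is reported as found and the other two as MISSING, the parser implementations are not on the build path, which would match the empty output in the original message.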
Mass

________________________________
From: Massoud Kohan <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Wednesday, January 23, 2013 8:37 AM
Subject: Empty contenthandler.toString() using AutoDetectParser

Hi,

I wrote a small Java application on Windows using Eclipse that takes a directory as input, tries to parse every document found in it, and then indexes the results using Lucene. The problem is that handler.toString() returns an empty string for every document.

Here is the code:

    Parser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
    ParseContext parseContext = new ParseContext();
    ContentHandler handler = new BodyContentHandler();
    parser.parse(new FileInputStream(file), handler, metadata, parseContext);
    System.out.println("-------------------------------------------------------");
    System.out.println("File: " + file);
    for (String name : metadata.names()) {
        System.out.println("metadata: " + name + " - " + metadata.get(name));
    }
    System.out.println("Content: " + handler.toString());
    document.add(new Field("fulltext", handler.toString(), Store.NO, Index.ANALYZED));

Eclipse console output:

    File: C:\Program Files\cwseidocuments\2012\AgileSoftware.ppt
    metadata: Content-Type - application/vnd.ms-powerpoint
    metadata: resourceName - AgileSoftware.ppt
    Content:
    path= C:\Program Files\documents\2012\English.pdf
    -------------------------------------------------------
    File: C:\Program Files\documents\2012\English.pdf
    metadata: Content-Type - application/pdf
    metadata: resourceName - English.pdf
    Content:
    path= C:\Program Files\documents\2012\hotle.doc
    -------------------------------------------------------
    File: C:\Program Files\cwseidocuments\2012\hotle.doc
    metadata: Content-Type - application/msword
    metadata: resourceName - hotle.doc
    Content:

What is wrong with my code?

Thanks for your help.

Mass
