Hi,

 
I wrote a small Java application on Windows using Eclipse that takes a directory
as input, tries to parse every document it finds with Tika, and then indexes the
results with Lucene.
The problem is that handler.toString() returns empty content for every document.
Here is the code:


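    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.Field.Index;
    import org.apache.lucene.document.Field.Store;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.sax.BodyContentHandler;
    import org.xml.sax.ContentHandler;

    // ... inside the loop over the files found in the input directory:
    Parser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    // Pass the file name so Tika can use it as a type-detection hint
    metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
    ParseContext parseContext = new ParseContext();
    ContentHandler handler = new BodyContentHandler();

    // Parse the file and always close the stream afterwards
    InputStream stream = new FileInputStream(file);
    try {
        parser.parse(stream, handler, metadata, parseContext);
    } finally {
        stream.close();
    }

    System.out.println("-------------------------------------------------------");
    System.out.println("File: " + file);
    for (String name : metadata.names()) {
        System.out.println("metadata: " + name + " - " + metadata.get(name));
    }
    System.out.println("Content: " + handler.toString());

    // Add the extracted body text to the Lucene document (indexed, not stored)
    document.add(new Field("fulltext", handler.toString(), Store.NO, Index.ANALYZED));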
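The snippet handles a single file; the directory traversal described at the top of this message could be sketched as follows. This is a minimal sketch assuming Java 8 or later (for Files.walk); the class name DocumentWalker and the method findDocuments are hypothetical, and the per-file Tika parse plus Lucene indexing is only marked by a comment where it would run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects the documents to parse from an input directory
public class DocumentWalker {

    // Returns every regular file below 'root' (recursively), sorted for a stable order
    static List<Path> findDocuments(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path doc : findDocuments(root)) {
            // Hypothetical hook: the per-file Tika parse and Lucene indexing
            // from the snippet would run here for doc.toFile()
            System.out.println("path= " + doc);
        }
    }
}
```

Sorting is not required for indexing; it just makes the processing order (and console output such as the `path=` lines) repeatable between runs.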




Eclipse console output:

File: C:\Program Files\cwseidocuments\2012\AgileSoftware.ppt
metadata: Content-Type - application/vnd.ms-powerpoint
metadata: resourceName - AgileSoftware.ppt
Content: 
  path= C:\Program Files\documents\2012\English.pdf
-------------------------------------------------------
File: C:\Program Files\documents\2012\English.pdf
metadata: Content-Type - application/pdf
metadata: resourceName - English.pdf
Content: 
 path= C:\Program Files\documents\2012\hotle.doc
-------------------------------------------------------
File: C:\Program Files\cwseidocuments\2012\hotle.doc
metadata: Content-Type - application/msword
metadata: resourceName - hotle.doc
Content: 
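Every file in this output reports only Content-Type and resourceName. One possible explanation worth ruling out (an assumption, not confirmed here) is that only the Tika core jar is on the Eclipse build path: AutoDetectParser can still detect the type from the file name, but if no parser is registered for that type it produces no text. The sketch below, assuming Tika 1.x as used in the snippet, asks the AutoDetectParser which media types it actually has parsers for; the class TikaParserCheck and its method hasParserFor are hypothetical names.

```java
import java.util.Set;

import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;

// Hypothetical diagnostic, assuming Tika 1.x is on the classpath
public class TikaParserCheck {

    // True if the AutoDetectParser has a parser registered for the given type
    static boolean hasParserFor(AutoDetectParser parser, String contentType) {
        Set<MediaType> supported = parser.getSupportedTypes(new ParseContext());
        MediaType type = MediaType.parse(contentType).getBaseType();
        return supported.contains(type);
    }

    public static void main(String[] args) {
        AutoDetectParser parser = new AutoDetectParser();
        // With no parser implementations on the classpath this count is expected to be 0
        System.out.println("Registered types: "
                + parser.getSupportedTypes(new ParseContext()).size());

        // The three types reported in the console output
        String[] types = {
            "application/pdf", "application/msword", "application/vnd.ms-powerpoint"
        };
        for (String t : types) {
            System.out.println(t + " -> "
                    + (hasParserFor(parser, t) ? "parser available" : "NO parser"));
        }
    }
}
```

If every type reports NO parser, adding the Tika parsers jar and its dependencies (or the tika-app jar, which bundles them) to the build path would be the first thing to try.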



What is wrong with my code?
Thanks for your help.
Mass
