I tried the July 4th build. The warnings are gone. Thank you.

I used the following code to index a couple of small Excel files with Lucene. I don't know how effective the search is going to be, since it is still in the implementation stage. If there are any errors, please let me know.

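import java.io.IOException;
import java.io.InputStream;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.poi.poifs.filesystem.DocumentInputStream;
import org.apache.poi.poifs.filesystem.POIFSDocument;

// DocumentHandler and DocumentHandlerException are assumed to be
// defined elsewhere in the project; they are not shown here.
public class ExcelHandler implements DocumentHandler {

        private final String fileName;

        public ExcelHandler(String name) {
                fileName = name;
        }

        public Document getDocument(InputStream is) throws DocumentHandlerException {
                Document doc = new Document();
                try {
                        // Wrap the raw stream as a POIFS document and read its bytes.
                        POIFSDocument pdoc = new POIFSDocument(fileName, is);
                        DocumentInputStream docis = new DocumentInputStream(pdoc);
                        byte[] content = new byte[docis.available()];
                        docis.read(content);
                        docis.close();

                        // Append the decimal value of each byte to the text buffer.
                        StringBuffer textBuf = new StringBuffer();
                        for (int i = 0; i < content.length; i++) {
                                textBuf.append(Byte.toString(content[i]));
                        }
                        String text = textBuf.toString();
                        if (!text.equals("")) {
                                // Field.Index.NO stores the text without making it searchable.
                                doc.add(new Field("body", text, Field.Store.YES, Field.Index.NO));
                        }
                } catch (IOException io) {
                        throw new DocumentHandlerException("Cannot parse Excel Document", io);
                }
                return doc;
        }
}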
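
One caveat with the loop above: it appends the decimal value of every byte, so the stored string is a run of digits and not readable text. A rough alternative, in the spirit of the Unix "strings" tool, is to keep only runs of printable ASCII bytes. The sketch below is only an illustration: the class name PrintableText and the minRun parameter are made up for this example, and it assumes the cell strings are stored as single-byte characters, which may not hold for every file.

```java
public class PrintableText {

    /**
     * Returns every run of at least minRun printable ASCII bytes,
     * with the runs joined by single spaces.
     */
    public static String extract(byte[] content, int minRun) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        for (byte b : content) {
            if (b >= 0x20 && b < 0x7F) {
                // Printable ASCII: extend the current run.
                run.append((char) b);
            } else {
                // Anything else ends the current run.
                flush(run, out, minRun);
            }
        }
        flush(run, out, minRun);
        return out.toString();
    }

    // Keeps the run if it is long enough, then resets it.
    private static void flush(StringBuilder run, StringBuilder out, int minRun) {
        if (run.length() >= minRun) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(run);
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        byte[] sample = {0x09, 0x08, 'T', 'o', 't', 'a', 'l', 0x00, 0x01,
                         'Q', '1', ' ', 's', 'a', 'l', 'e', 's', (byte) 0xFF};
        System.out.println(extract(sample, 4)); // prints "Total Q1 sales"
    }
}
```

The byte array read from the DocumentInputStream could be passed to extract() in place of the byte-by-byte loop. A minimum run length of 4 is an arbitrary choice that drops most binary noise at the cost of losing very short cell values.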

Separately, in another file, I am indexing the file name, file path, and date as keywords. Hope it helps.

thanks,
suba suresh.



Nick Burch wrote:
On Tue, 27 Jun 2006, Suba Suresh wrote:

Thank you for all the pointers. They are a great help. I used today's
build. It worked fine for WordDocument. I have not tried the metadata yet.
For PowerPoint, I am getting the following from the PowerPoint extractor for
just one file. Am I doing anything wrong? I didn't change my code.


These errors should now have gone. Can you try a new svn checkout /
tomorrow's SVN build?



Also, since some of the Excel files were not in 97-2002 format, I used the
POIFSFilesystem, read the file as a byte stream, and stored it as a text
string. I hope that is fine.


If you have some code for getting some basic text out of Excel 95 files,
we'd be interested in hosting it. I'm sure that something that outputs
text that can be fed to Lucene would be useful for a lot of people, even
if that's all the Excel 95 support we have.

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/

