Hello, first of all: I do have a first name. It's "Johannes", and I prefer being called "Johannes" over "Leimbach". Thanks ;)
To your problem: can you provide more information about it? What's in the cells, and where do these warnings come from? As far as I know, HSSF is not able to read formulas or macros from Excel.

Bye,
Johannes

-----Original Message-----
From: Feris Thia [mailto:[EMAIL PROTECTED]
Sent: Monday, 31 July 2006 18:01
To: POI Users List
Subject: Re: Extract Text From Excel

Hello Suba, Michael and Leimbach,

Thanks for the responses... they helped me a lot.

Especially to Leimbach: I have used your wrapper and tested it with my application. It works great :) But I get some warnings (attached below)... Is it a limitation of HSSF that it cannot read some Excel formats?

[java] [WARNING] Unknown Ptg 14 (20) at cell (5,2)
[java] [WARNING] Unknown Ptg 14 (20) at cell (6,2)
[java] [WARNING] Unknown Ptg 14 (20) at cell (16,2)
[java] [WARNING] Unknown Ptg 14 (20) at cell (5,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (6,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (6,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (24,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (24,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (25,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (25,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (26,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (26,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (27,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (27,4)

And one more thing... so HSSF does not read the values of formulas?

Regards,
Feris

On 7/31/06, Leimbach, Johannes <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> last week I wrote a wrapper class to facilitate text extraction from Excel
> files; please see the source code below.
> Maybe this (or another example, I don't mind which) should be posted on the POI
> homepage - I see very little beginner's documentation there.
>
> Anyway, here's the class; it should be self-explanatory:
>
> package fulltext.common.processing.helpers.poi;
>
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.util.Iterator;
>
> import org.apache.poi.hssf.usermodel.HSSFCell;
> import org.apache.poi.hssf.usermodel.HSSFRow;
> import org.apache.poi.hssf.usermodel.HSSFSheet;
> import org.apache.poi.hssf.usermodel.HSSFWorkbook;
> import org.apache.poi.poifs.filesystem.POIFSFileSystem;
>
> /**
>  * Wraps around the POI stuff to read an Excel (XLS) file from disk
>  */
> public class ExcelFileWrapper
> {
>     private POIFSFileSystem _fileSystem;
>     private HSSFWorkbook _workbook;
>
>     /**
>      * Opens the stream and parses the workbook; the text itself
>      * is collected later by readContents().
>      * @throws IOException if the stream cannot be read as an XLS file
>      */
>     public ExcelFileWrapper(FileInputStream stream) throws IOException
>     {
>         if (stream == null)
>             throw new NullPointerException("in ExcelFileWrapper: ctor parameter 'stream' is null.");
>
>         _fileSystem = new POIFSFileSystem(stream);
>         _workbook = new HSSFWorkbook(_fileSystem);
>     }
>
>     /**
>      * Return the contents of all sheets as a string.
>      * Every textual cell's content is added here.
>      */
>     public String readContents()
>     {
>         // collects the text that is returned
>         StringBuilder builder = new StringBuilder();
>
>         // for each sheet
>         for (int numSheets = 0; numSheets < _workbook.getNumberOfSheets(); numSheets++)
>         {
>             HSSFSheet sheet = _workbook.getSheetAt(numSheets);
>
>             // Iterate over each row in the sheet
>             Iterator rows = sheet.rowIterator();
>             while (rows.hasNext())
>             {
>                 HSSFRow row = (HSSFRow) rows.next();
>
>                 // Iterate over each cell in the row and add the cell's content
>                 Iterator cells = row.cellIterator();
>                 while (cells.hasNext())
>                 {
>                     // get cell..
>                     HSSFCell cell = (HSSFCell) cells.next();
>                     // .. add to stringbuilder
>                     processCell(cell, builder);
>                 }
>             }
>         } // for numSheets ..
>
>         return builder.toString();
>     }
>
>     /**
>      * Add the cell's content to the stringbuilder (if appropriate
>      * content, i.e. text - no numbers)
>      */
>     private void processCell(HSSFCell cell, StringBuilder builder)
>     {
>         switch (cell.getCellType())
>         {
>             /*
>             case HSSFCell.CELL_TYPE_NUMERIC:
>                 System.out.println(cell.getNumericCellValue());
>                 break;
>             */
>             case HSSFCell.CELL_TYPE_STRING:
>                 builder.append(cell.getStringCellValue());
>                 builder.append(" ");
>                 break;
>
>             default:
>                 break;
>         }
>     }
>
> }
>
>
> - Johannes
>
>
> -----Original Message-----
> From: Michael J. Prichard [mailto:[EMAIL PROTECTED]
> Sent: Monday, 31 July 2006 15:36
> To: POI Users List
> Subject: Re: Extract Text From Excel
>
> Hey Feris,
>
> That [HSSF] is what I use as well, and it works pretty well.
>
> -Michael
>
> Suba Suresh wrote:
>
> > You can use the HSSF libraries for Excel text extraction. I used them
> > for Lucene indexing.
> >
> > suba suresh.
> >
> > Feris Thia wrote:
> >
> >> Hi All,
> >>
> >> I'm new to this user group. Is there any way to extract all the text
> >> from Excel documents? I want to perform indexing using POI + Lucene :)
> >>
> >> Thanks,
> >>
> >> Feris
> >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > Mailing List: http://jakarta.apache.org/site/mail2.html#poi
> > The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/
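On the formula question: a formula cell in an XLS file keeps both the formula text and the result Excel last calculated for it. Below is a minimal sketch, assuming a current Apache POI release (4.x or later, where int constants such as HSSFCell.CELL_TYPE_STRING were replaced by the CellType enum), that indexes that cached result instead of skipping formula cells. The class CellTextSketch and its methods are hypothetical names for illustration, not part of POI; if you want the formula text itself, Cell.getCellFormula() returns it.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

/**
 * Hypothetical helper (assumes Apache POI 4.x or later): collects the text of
 * every cell, using the cached result for formula cells.
 */
public class CellTextSketch {

    /** Returns the cell's text, or null for blank/error cells. */
    public static String cellText(Cell cell) {
        CellType type = cell.getCellType();
        if (type == CellType.FORMULA) {
            // Use the type of the result Excel last cached for this formula,
            // not the formula itself.
            type = cell.getCachedFormulaResultType();
        }
        switch (type) {
            case STRING:
                return cell.getStringCellValue();
            case NUMERIC:
                // Java's default rendering, e.g. 3.0 (not Excel's display format)
                return String.valueOf(cell.getNumericCellValue());
            case BOOLEAN:
                return String.valueOf(cell.getBooleanCellValue());
            default:
                return null; // BLANK, ERROR and anything else carry no text
        }
    }

    /** Walks all sheets, rows and cells and joins the cell texts with spaces. */
    public static String readContents(Workbook workbook) {
        StringBuilder builder = new StringBuilder();
        for (Sheet sheet : workbook) {
            for (Row row : sheet) {
                for (Cell cell : row) {
                    String text = cellText(cell);
                    if (text != null) {
                        builder.append(text).append(' ');
                    }
                }
            }
        }
        return builder.toString().trim();
    }
}
```

Numbers come out in Java's default form here; POI's DataFormatter can be used instead if you want the text as Excel displays it.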
