Hello,
last week I wrote a wrapper class to simplify text extraction from Excel
files; please see the source code below.
Maybe this (or some other example, I don't mind which) should be posted on the
POI homepage - I see very little beginner documentation there.
Anyway, here's the class; it should be self-explanatory:
package fulltext.common.processing.helpers.poi;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

/**
 * Wraps around the POI stuff to read an Excel (XLS) file from disk
 */
public class ExcelFileWrapper
{
    private POIFSFileSystem _fileSystem;
    private HSSFWorkbook _workbook;

    /**
     * Initialize the object: opens the workbook from the stream, but does
     * not extract any text yet. The caller remains responsible for closing
     * the stream.
     * @throws IOException if the stream cannot be read as an XLS file
     */
    public ExcelFileWrapper(FileInputStream stream) throws IOException
    {
        if (stream == null)
            throw new NullPointerException ("in ExcelFileWrapper: ctor parameter 'stream' is null.");
        //
        _fileSystem = new POIFSFileSystem(stream);
        _workbook = new HSSFWorkbook (_fileSystem);
    }

    /**
     * Return the contents of all sheets as string.
     * Every textual cell's content is added here.
     */
    public String readContents ()
    {
        // return this
        StringBuilder builder = new StringBuilder();
        // for each sheet
        for (int numSheets = 0; numSheets < _workbook.getNumberOfSheets(); numSheets++)
        {
            HSSFSheet sheet = _workbook.getSheetAt(numSheets);
            // Iterate over each row in the sheet
            Iterator rows = sheet.rowIterator();
            while( rows.hasNext() )
            {
                HSSFRow row = (HSSFRow) rows.next();
                // Iterate over each cell in the row and add the cell's content
                Iterator cells = row.cellIterator();
                while( cells.hasNext() )
                {
                    // get cell..
                    HSSFCell cell = (HSSFCell) cells.next();
                    // .. add to stringbuilder
                    processCell (cell, builder);
                }
            }
        } // for numSheets ..
        //
        return builder.toString();
    }

    /**
     * Add the cell's content to the stringbuilder (if appropriate content,
     * i.e. text - no numbers). Each value is followed by a single space.
     */
    private void processCell (HSSFCell cell, StringBuilder builder)
    {
        switch ( cell.getCellType() )
        {
            /*
            case HSSFCell.CELL_TYPE_NUMERIC:
                System.out.println( cell.getNumericCellValue() );
                break;
            */
            case HSSFCell.CELL_TYPE_STRING:
                builder.append (cell.getStringCellValue());
                builder.append (" ");
                break;
            default:
                // all other cell types (numeric, formula, boolean, blank, error) are skipped
                break;
        }
    }
}
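To make it clear what readContents() hands back without needing an .xls file, here is a small stand-in that applies the same rule to plain Java lists: walk sheets, then rows, then cells, keep only text, and put a space after each value. The class name CellFilterSketch and the nested-list model are only assumptions for illustration; nothing in it is POI API.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in only: models a workbook as sheets -> rows -> cells
// using plain lists, so the filtering rule can be tried without POI.
public class CellFilterSketch
{
    /** Mirrors the loop structure of readContents() in the wrapper. */
    public static String readContents (List<List<List<Object>>> sheets)
    {
        StringBuilder builder = new StringBuilder();
        for (List<List<Object>> sheet : sheets)
        {
            for (List<Object> row : sheet)
            {
                for (Object cell : row)
                {
                    processCell (cell, builder);
                }
            }
        }
        return builder.toString();
    }

    /** Same rule as the wrapper: text cells only, each followed by a space. */
    public static void processCell (Object cell, StringBuilder builder)
    {
        if (cell instanceof String)
        {
            builder.append ((String) cell);
            builder.append (" ");
        }
    }

    public static void main (String[] args)
    {
        List<List<Object>> sheet1 = Arrays.asList(
            Arrays.<Object>asList("Name", "Amount"),
            Arrays.<Object>asList("Widget", 12.5));
        List<List<Object>> sheet2 = Arrays.asList(
            Arrays.<Object>asList(Boolean.TRUE, "Total"));

        String text = readContents (Arrays.asList(sheet1, sheet2));
        // Brackets make the trailing space visible.
        System.out.println ("[" + text + "]"); // prints "[Name Amount Widget Total ]"
    }
}
```

Note that the result ends with a trailing space and that numbers and booleans are dropped. For Lucene indexing that is usually harmless, but you may want to trim() the string if you display it.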
- Johannes
-----Original Message-----
From: Michael J. Prichard [mailto:[EMAIL PROTECTED]
Sent: Monday, 31 July 2006 15:36
To: POI Users List
Subject: Re: Extract Text From Excel
Hey Feris,
That [HSSF] is what I use as well and it works pretty well.
-Michael
Suba Suresh wrote:
> You can use the hssf libraries for excel text extraction. I used it
> for lucene indexing.
>
> suba suresh.
>
> Feris Thia wrote:
>
>> Hi All,
>>
>> I'm new to this user group. Is there any way to extract all the text
>> from Excel documents? I want to perform indexing using POI + Lucene :)
>>
>> Thanks,
>>
>> Feris
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> Mailing List: http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/
>