Hello Suba, Michael and Leimbach,

Thanks for the responses, they help me a lot. Special thanks to Leimbach: I
have used your wrapper and tested it with my application. It works great :)

But I get some warnings (attached below). Is it a limitation of HSSF that it
cannot read some Excel formats?

[java] [WARNING] Unknown Ptg 14 (20) at cell (5,2)
[java] [WARNING] Unknown Ptg 14 (20) at cell (6,2)
[java] [WARNING] Unknown Ptg 14 (20) at cell (16,2)
[java] [WARNING] Unknown Ptg 14 (20) at cell (5,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (6,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (6,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (24,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (24,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (25,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (25,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (26,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (26,4)
[java] [WARNING] Unknown Ptg 14 (20) at cell (27,1)
[java] [WARNING] Unknown Ptg 14 (20) at cell (27,4)
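
In case it helps to narrow this down, here is a rough, stdlib-only sketch that
collects the distinct cells named in such warnings. The class and method names
(PtgWarningSummary, affectedCells) are made up for illustration and are not
part of POI, and the regex only assumes the line format shown in the log above:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: summarizes "Unknown Ptg" warnings.
public class PtgWarningSummary
{
        // Matches e.g. "[java] [WARNING] Unknown Ptg 14 (20) at cell (5,2)"
        // and captures the two numbers inside the final parentheses.
        private static final Pattern WARNING = Pattern.compile(
                "Unknown Ptg \\S+ \\(\\d+\\) at cell \\((\\d+),(\\d+)\\)");

        /**
         * Returns the distinct cell coordinates, formatted as "(a,b)",
         * in order of first appearance. Non-matching lines are ignored.
         */
        public static Set<String> affectedCells (List<String> logLines)
        {
                Set<String> cells = new LinkedHashSet<String>();
                for (String line : logLines)
                {
                        Matcher m = WARNING.matcher(line);
                        if (m.find())
                                cells.add("(" + m.group(1) + "," + m.group(2) + ")");
                }
                return cells;
        }

        public static void main (String[] args)
        {
                List<String> lines = Arrays.asList(
                        "[java] [WARNING] Unknown Ptg 14 (20) at cell (5,2)",
                        "[java] [WARNING] Unknown Ptg 14 (20) at cell (6,2)",
                        "[java] [WARNING] Unknown Ptg 14 (20) at cell (5,1)");
                System.out.println(affectedCells(lines));
        }
}
```

With the list of cells in hand, I can open the file in Excel and check what
those cells have in common.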

And one more thing: does HSSF not read the values of formulas?

Regards,

Feris

On 7/31/06, Leimbach, Johannes <[EMAIL PROTECTED]> wrote:

Hello,

Last week I wrote a wrapper class to facilitate text extraction from Excel
files; please see the source code below.
Maybe this (or another example, I don't mind which) should be posted on the
POI homepage - I see very little beginner's documentation there.

Anyway, here's the class; it should be self-explanatory:

package fulltext.common.processing.helpers.poi;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

/**
* Wraps around the POI stuff to read an Excel (XLS) file from disk
*/
public class ExcelFileWrapper
{
        private POIFSFileSystem _fileSystem;
        private HSSFWorkbook _workbook;

        /**
         * Initialize the object - does not read yet
         * @throws IOException
         */
        public ExcelFileWrapper(FileInputStream stream) throws IOException
        {
                if (stream == null)
                        throw new NullPointerException ("in ExcelFileWrapper: ctor parameter 'stream' is null.");

                _fileSystem = new POIFSFileSystem(stream);
                _workbook = new HSSFWorkbook (_fileSystem);
        }

        /**
         * Return the contents of all sheets as string.
         * Every textual cell's content is added here.
         */
        public String readContents ()
        {
                // return this
                StringBuilder builder = new StringBuilder();

                // for each sheet
                for (int numSheets = 0; numSheets < _workbook.getNumberOfSheets(); numSheets++)
                {
                        HSSFSheet sheet = _workbook.getSheetAt(numSheets);

                        // Iterate over each row in the sheet
                        Iterator rows = sheet.rowIterator();
                        while( rows.hasNext() )
                        {
                                HSSFRow row = (HSSFRow) rows.next();

                                // Iterate over each cell in the row and add the cell's content
                                Iterator cells = row.cellIterator();
                                while( cells.hasNext() )
                                {
                                        // get cell..
                                        HSSFCell cell = (HSSFCell) cells.next();
                                        // .. add to stringbuilder
                                        processCell (cell, builder);
                                }
                        }
                } // for numSheets ..

                //
                return builder.toString();
        }

        /**
         * Add the cell's content to the stringbuilder (if appropriate
         * content, i.e. text - no numbers)
         */
        private void processCell (HSSFCell cell, StringBuilder builder)
        {
                switch ( cell.getCellType() )
                {
                /*
                        case HSSFCell.CELL_TYPE_NUMERIC:
                                System.out.println( cell.getNumericCellValue() );
                                break;
                */
                        case HSSFCell.CELL_TYPE_STRING:
                                builder.append (cell.getStringCellValue());
                                builder.append (" ");
                                break;

                        default:
                                break;
                }
        }

}


- Johannes


-----Original Message-----
From: Michael J. Prichard [mailto:[EMAIL PROTECTED]
Sent: Monday, 31 July 2006 15:36
To: POI Users List
Subject: Re: Extract Text From Excel

Hey Feris,

That [HSSF] is what I use as well and it works pretty well.

-Michael

Suba Suresh wrote:

> You can use the HSSF libraries for Excel text extraction. I used them
> for Lucene indexing.
>
> suba suresh.
>
> Feris Thia wrote:
>
>> Hi All,
>>
>> I'm new to this user group. Is there any way to extract all the text
>> from Excel documents? I want to perform indexing using POI + Lucene :)
>>
>> Thanks,
>>
>> Feris
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
>

