Re: There has got to be a way

2007-05-10 Thread Nick Burch

On Tue, 1 May 2007, Levi Strope wrote:
All I need to do is read in a row of data, just like the spreadsheet 
that is attached, and parse the information reliably.  There are many 
blank cells here, but I need to account for them.


I've got some code that might help here. Instead of registering your 
HSSFListener directly, you create a MissingRecordAwareHSSFListener, 
register that, and tell it to call your HSSFListener.


As it goes through, it will fire off dummy records when it notices that 
your final has skipped a row or a column (or a few), and another dummy 
record after the last cell of a given row.



I'll be adding MissingRecordAwareHSSFListener and friends to svn after we 
do the 3.0 final release, and once I've finished writing some proper tests 
for it. I can send you a copy if it'd be of use though.


Nick

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/



RE: There has got to be a way

2007-05-10 Thread Levi Strope
That would be of GREAT use. 

BTW:  what do you think the ETA is for that?   

-Original Message-
From: Nick Burch [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 10, 2007 6:05 AM
To: POI Users List
Subject: Re: There has got to be a way

On Tue, 1 May 2007, Levi Strope wrote:
 All I need to do is read in a row of data, just like the spreadsheet 
 that is attached, and parse the information reliably.  There are many 
 blank cells here, but I need to account for them.

I've got some code that might help here. Instead of registering your
HSSFListener directly, you create a MissingRecordAwareHSSFListener,
register that, and tell it to call your HSSFListener.

As it goes through, it will fire off dummy records when it notices that
your final has skipped a row or a column (or a few), and another dummy
record after the last cell of a given row.


I'll be adding MissingRecordAwareHSSFListener and friends to svn after
we do the 3.0 final release, and once I've finished writing some proper
tests for it. I can send you a copy if it'd be of use though.

Nick

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/



RE: There has got to be a way

2007-05-10 Thread Nick Burch

On Thu, 10 May 2007, Levi Strope wrote:

That would be of GREAT use.


I'll send you an off-list mail with the files. Note that it has only been 
tested on a fairly simple file, so there might be issues on fancier ones.



BTW:  what do you think the ETA is for that?


For 3.0 Final? We're having the release vote now. It'll be a weeks time if 
the vote passes, longer if not.


Nick

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List: http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/



RE: There has got to be a way

2007-05-02 Thread David Henry
Levi,
 
I haven't dug through all your code and spreadsheet, but have you
considered what the definition of blank is?  In many similar
circumstances I have found that relying upon isBlank, for example, may
not be sufficient if you have a mix of true blank (i.e. null value)
cells and cells with zero length data (i.e. looks blank but, by
Excel's reckoning contains a value).  Sometimes something as simple as
tabbing to the cell may be sufficient to change a blank to a
zero-length value.  You may have already taken this into consideration,
but thought I'd mention it as a first check.
 
- David



From: Levi Strope [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 01, 2007 10:11 PM
To: POI Users List
Subject: There has got to be a way


Please look at the attached excel spreadsheet.  I would really
appreciate some input.
 
All I need to do is read in a row of data, just like the spreadsheet
that is attached, and parse the information reliably.  There are many
blank cells here, but I need to account for them.  
 
My business problem is this:  We are accepting messages where columns of
information are not required, but the column MUST be represented in the
file as we process it.  At minimum I'd like to be able to read in these
excel files and output them in a CSV format.  Essentially as I am
streaming the excel file out, I am appending comma's between the values
because our messaging system expects the values to be comma delimited.
Unfortunately we accpet XLS files so we cannot just ask them to save it
as a CSV.  This can happen in 1 of 2 ways, I can stream the file and
append the commas between the cells as I am sending it to our messaging
system, or I can stream it out to a file and build an actual CSV, and
then tell the messaging system to pick it up.
 
When I read in an excel file I am finding it very difficult to determine
when cells are blank, and account for them.  For some reason the
BlankRecordListener doesn't pickup all blank records.  I've even tried
applying formatting to all cells and running my code, it still does not
work.
 
I put code in the blank record listener code to tell me the column.
Here is what I get:
 
Blank record encounterred.  Column= 3
Blank record encounterred.  Column= 4
Blank record encounterred.  Column= 7
Blank record encounterred.  Column= 8
Blank record encounterred.  Column= 10
Blank record encounterred.  Column= 18
Blank record encounterred.  Column= 20
Blank record encounterred.  Column= 30
Blank record encounterred.  Column= 34

There is no conditional formatting in this code, all I'm doing is
listening for a blank record and doing a System.out. If you look at the
attached spreadsheet you can see that columns 21, 22, 24, 32, and 33 are
blank as well.  There are 14 blank records in this spreadsheet but the
blankrecord listener only reports 9. 
 
I need help.  I need someone to tell me what kind of record listener I
need to use.  How can I account for these cells?
 
Here is my code.  At present it skips the blankcells when writing it out
to a CSV.   
 
/*
 *
 * Created on April 5, 2007, 11:49 AM
 *
 *
 *
 * @author lstrope
 */
 
package poitest;
 
import java.io.*;
 
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hssf.eventusermodel.*;
import org.apache.poi.hssf.record.*;
 
public class PoiXLStest implements HSSFListener {
private SSTRecord sstrec;
int rowRecLen[] = new int[RowRecord.MAX_ROW_NUMBER];
PrintWriter CSV = null;
   // int rowNum = 0;
int colNum = 0;
 
public PoiXLStest()
{
this(c:\\ExceltoCSV.csv);
}

public PoiXLStest(String F)
{
try{
CSV = new PrintWriter(new BufferedWriter((new
FileWriter(F;
}
catch(FileNotFoundException E){}
catch(IOException E){}
}

public PoiXLStest(InputStream in)
{   
PoiXLStest noargs = new PoiXLStest();
HSSFRequest req = new HSSFRequest();
//req.addListener(noargs, SSTRecord.sid);
//req.addListener(noargs, LabelSSTRecord.sid);
//req.addListener(noargs, RowRecord.sid);
//req.addListener(noargs, NumberRecord.sid);
//req.addListener(noargs, BlankRecord.sid);
req.addListenerForAllRecords(noargs);
HSSFEventFactory factory = new HSSFEventFactory();
   
try{
factory.processEvents(req, in);
}
catch(IOException E){
System.out.println(Problem in constructor);
}
}

public void processRecord(Record record)  
{
short sidVal;
sidVal = record.getSid();
 
if(sidVal == RowRecord.sid){
RowRecord rowRec = (RowRecord) record;
if(rowRec.getRecordSize()  0){
rowRecLen[rowRec.getRowNumber()] = rowRec.getLastCol();
// Setting array to hold the row at it's physical position with its
Column length.
//System.out.println(rowRec.getLastCol

RE: There has got to be a way

2007-05-02 Thread Levi Strope
Forgive me for not clarifying earlier, this is using HSSF.

To answer your question Dave, Yes - I have taken that into
consideration.  I'm not saying my code is correct in looking for the
blank cells with the BlankRecord listener, but according to my limited
knowledge and the javadoc a blankrecord is any record without data that
contains styling information.  That's why I'm asking for help - I would
love it if the answer is just to use a different type of listener, but I
need someone to tell me what that would be.

So here is what I did last night:
1) I went through each and every blank cell in the row and applied a
default styling.  After running my code again it returned 1 less blank
record than it did before!  I would have expected it to return more.
2) This got me to thinking that I was completely backwards, so I went in
and removed formatting from the blank cells and ran my test again.  This
time it returned the same number of blank records, only it reported
these blank records on different columns!  

I've tried adding a MulBlank record listener - and that doesn't detect
any of my cells where there might be 3 blanks in a row.  I'm not sure
what the criteria is for this but according to the javadoc I believe it
should return at least 2 mulblank records with the excel file I
provided.  According the the javadoc it says that all MulBlank records
are converted into individual blankrecords.  I'm starting to believe
there may be something wrong with this conversion.  The reason why I
believe this is in the 2 tests that I noted above, the inconsistencies
in the columns that were reported as blanks occurred where there might
be more than 1 blank column side by side. It's almost as if HSSF sees
these columns as the same column and is reporting them as 1 blank record
when it should be more than one.

-Levi  

-Original Message-
From: David Henry [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 02, 2007 10:18 AM
To: POI Users List
Subject: RE: There has got to be a way

Levi,
 
I haven't dug through all your code and spreadsheet, but have you
considered what the definition of blank is?  In many similar
circumstances I have found that relying upon isBlank, for example, may
not be sufficient if you have a mix of true blank (i.e. null value)
cells and cells with zero length data (i.e. looks blank but, by
Excel's reckoning contains a value).  Sometimes something as simple as
tabbing to the cell may be sufficient to change a blank to a
zero-length value.  You may have already taken this into consideration,
but thought I'd mention it as a first check.
 
- David



From: Levi Strope [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 01, 2007 10:11 PM
To: POI Users List
Subject: There has got to be a way


Please look at the attached excel spreadsheet.  I would really
appreciate some input.
 
All I need to do is read in a row of data, just like the spreadsheet
that is attached, and parse the information reliably.  There are many
blank cells here, but I need to account for them.  
 
My business problem is this:  We are accepting messages where columns of
information are not required, but the column MUST be represented in the
file as we process it.  At minimum I'd like to be able to read in these
excel files and output them in a CSV format.  Essentially as I am
streaming the excel file out, I am appending comma's between the values
because our messaging system expects the values to be comma delimited.
Unfortunately we accpet XLS files so we cannot just ask them to save it
as a CSV.  This can happen in 1 of 2 ways, I can stream the file and
append the commas between the cells as I am sending it to our messaging
system, or I can stream it out to a file and build an actual CSV, and
then tell the messaging system to pick it up.
 
When I read in an excel file I am finding it very difficult to determine
when cells are blank, and account for them.  For some reason the
BlankRecordListener doesn't pickup all blank records.  I've even tried
applying formatting to all cells and running my code, it still does not
work.
 
I put code in the blank record listener code to tell me the column.
Here is what I get:
 
Blank record encounterred.  Column= 3
Blank record encounterred.  Column= 4
Blank record encounterred.  Column= 7
Blank record encounterred.  Column= 8
Blank record encounterred.  Column= 10
Blank record encounterred.  Column= 18
Blank record encounterred.  Column= 20
Blank record encounterred.  Column= 30
Blank record encounterred.  Column= 34

There is no conditional formatting in this code, all I'm doing is
listening for a blank record and doing a System.out. If you look at the
attached spreadsheet you can see that columns 21, 22, 24, 32, and 33 are
blank as well.  There are 14 blank records in this spreadsheet but the
blankrecord listener only reports 9. 
 
I need help.  I need someone to tell me what kind of record listener I
need to use.  How can I account for these cells?
 
Here is my

Re: There has got to be a way

2007-05-02 Thread Andrew C. Oliver
Another possibility.  Styling was applied via row or colinfo records 
instead

of by blank records.  Try using org.apache.poi.hssf.dev.BiffViewer on your
files.  you can also use it in concert with the unix Diff command  (try 
cygwin if you're on Windows) or your favorite alternative. 


Levi Strope wrote:

Forgive me for not clarifying earlier, this is using HSSF.

To answer your question Dave, Yes - I have taken that into
consideration.  I'm not saying my code is correct in looking for the
blank cells with the BlankRecord listener, but according to my limited
knowledge and the javadoc a blankrecord is any record without data that
contains styling information.  That's why I'm asking for help - I would
love it if the answer is just to use a different type of listener, but I
need someone to tell me what that would be.

So here is what I did last night:
1) I went through each and every blank cell in the row and applied a
default styling.  After running my code again it returned 1 less blank
record than it did before!  I would have expected it to return more.
2) This got me to thinking that I was completely backwards, so I went in
and removed formatting from the blank cells and ran my test again.  This
time it returned the same number of blank records, only it reported
these blank records on different columns!  


I've tried adding a MulBlank record listener - and that doesn't detect
any of my cells where there might be 3 blanks in a row.  I'm not sure
what the criteria is for this but according to the javadoc I believe it
should return at least 2 mulblank records with the excel file I
provided.  According the the javadoc it says that all MulBlank records
are converted into individual blankrecords.  I'm starting to believe
there may be something wrong with this conversion.  The reason why I
believe this is in the 2 tests that I noted above, the inconsistencies
in the columns that were reported as blanks occurred where there might
be more than 1 blank column side by side. It's almost as if HSSF sees
these columns as the same column and is reporting them as 1 blank record
when it should be more than one.

-Levi  


-Original Message-
From: David Henry [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 02, 2007 10:18 AM

To: POI Users List
Subject: RE: There has got to be a way

Levi,
 
I haven't dug through all your code and spreadsheet, but have you

considered what the definition of blank is?  In many similar
circumstances I have found that relying upon isBlank, for example, may
not be sufficient if you have a mix of true blank (i.e. null value)
cells and cells with zero length data (i.e. looks blank but, by
Excel's reckoning contains a value).  Sometimes something as simple as
tabbing to the cell may be sufficient to change a blank to a
zero-length value.  You may have already taken this into consideration,
but thought I'd mention it as a first check.
 
- David




From: Levi Strope [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 01, 2007 10:11 PM
To: POI Users List
Subject: There has got to be a way


Please look at the attached excel spreadsheet.  I would really
appreciate some input.
 
All I need to do is read in a row of data, just like the spreadsheet

that is attached, and parse the information reliably.  There are many
blank cells here, but I need to account for them.  
 
My business problem is this:  We are accepting messages where columns of

information are not required, but the column MUST be represented in the
file as we process it.  At minimum I'd like to be able to read in these
excel files and output them in a CSV format.  Essentially as I am
streaming the excel file out, I am appending comma's between the values
because our messaging system expects the values to be comma delimited.
Unfortunately we accpet XLS files so we cannot just ask them to save it
as a CSV.  This can happen in 1 of 2 ways, I can stream the file and
append the commas between the cells as I am sending it to our messaging
system, or I can stream it out to a file and build an actual CSV, and
then tell the messaging system to pick it up.
 
When I read in an excel file I am finding it very difficult to determine

when cells are blank, and account for them.  For some reason the
BlankRecordListener doesn't pickup all blank records.  I've even tried
applying formatting to all cells and running my code, it still does not
work.
 
I put code in the blank record listener code to tell me the column.

Here is what I get:
 
Blank record encounterred.  Column= 3

Blank record encounterred.  Column= 4
Blank record encounterred.  Column= 7
Blank record encounterred.  Column= 8
Blank record encounterred.  Column= 10
Blank record encounterred.  Column= 18
Blank record encounterred.  Column= 20
Blank record encounterred.  Column= 30
Blank record encounterred.  Column= 34

There is no conditional formatting in this code, all I'm doing is
listening for a blank record and doing a System.out. If you look

RE: There has got to be a way

2007-05-02 Thread Donahue, Michael
Levi -

In an effort to save space, Microsoft chose not to create cells for
blank cells, unless they have been touched in some fashion.  David Henry
mentioned this when he said, 'Sometimes something as simple as tabbing
to the cell may be sufficient to change a blank to a zero-length
value.'  Typically listeners will only fire if they encounter an
existing cell (blank or otherwise), but some of the cells you are
looking for don't even exist because they have never been touched, so
you will never see the listener fire for those cells.  In this case I
would consider setting a default value to all of the fields you want a
value for.  Then when the listener fires it will simply replace the
value with the correct value, but if the listener never fires for a cell
you will have a default value.

It also seams that if Excel recognizes a cell no longer contains data,
Excel may remove the cell the next time it saves your file if it meets
certain criteria, but that's just an observation.

Also, I don't think assigning a default styling to a cell that does not
exist yet has no effect.

I hope this helps.

- Mike

-Original Message-
From: Levi Strope [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 02, 2007 9:47 AM
To: POI Users List
Subject: RE: There has got to be a way

Forgive me for not clarifying earlier, this is using HSSF.

To answer your question Dave, Yes - I have taken that into
consideration.  I'm not saying my code is correct in looking for the
blank cells with the BlankRecord listener, but according to my limited
knowledge and the javadoc a blankrecord is any record without data that
contains styling information.  That's why I'm asking for help - I would
love it if the answer is just to use a different type of listener, but I
need someone to tell me what that would be.

So here is what I did last night:
1) I went through each and every blank cell in the row and applied a
default styling.  After running my code again it returned 1 less blank
record than it did before!  I would have expected it to return more.
2) This got me to thinking that I was completely backwards, so I went in
and removed formatting from the blank cells and ran my test again.  This
time it returned the same number of blank records, only it reported
these blank records on different columns!  

I've tried adding a MulBlank record listener - and that doesn't detect
any of my cells where there might be 3 blanks in a row.  I'm not sure
what the criteria is for this but according to the javadoc I believe it
should return at least 2 mulblank records with the excel file I
provided.  According the the javadoc it says that all MulBlank records
are converted into individual blankrecords.  I'm starting to believe
there may be something wrong with this conversion.  The reason why I
believe this is in the 2 tests that I noted above, the inconsistencies
in the columns that were reported as blanks occurred where there might
be more than 1 blank column side by side. It's almost as if HSSF sees
these columns as the same column and is reporting them as 1 blank record
when it should be more than one.

-Levi  

-Original Message-
From: David Henry [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 02, 2007 10:18 AM
To: POI Users List
Subject: RE: There has got to be a way

Levi,
 
I haven't dug through all your code and spreadsheet, but have you
considered what the definition of blank is?  In many similar
circumstances I have found that relying upon isBlank, for example, may
not be sufficient if you have a mix of true blank (i.e. null value)
cells and cells with zero length data (i.e. looks blank but, by
Excel's reckoning contains a value).  Sometimes something as simple as
tabbing to the cell may be sufficient to change a blank to a
zero-length value.  You may have already taken this into consideration,
but thought I'd mention it as a first check.
 
- David



From: Levi Strope [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 01, 2007 10:11 PM
To: POI Users List
Subject: There has got to be a way


Please look at the attached excel spreadsheet.  I would really
appreciate some input.
 
All I need to do is read in a row of data, just like the spreadsheet
that is attached, and parse the information reliably.  There are many
blank cells here, but I need to account for them.  
 
My business problem is this:  We are accepting messages where columns of
information are not required, but the column MUST be represented in the
file as we process it.  At minimum I'd like to be able to read in these
excel files and output them in a CSV format.  Essentially as I am
streaming the excel file out, I am appending comma's between the values
because our messaging system expects the values to be comma delimited.
Unfortunately we accpet XLS files so we cannot just ask them to save it
as a CSV.  This can happen in 1 of 2 ways, I can stream the file and
append the commas between the cells as I am sending it to our messaging
system, or I

There has got to be a way

2007-05-01 Thread Levi Strope
Please look at the attached excel spreadsheet.  I would really
appreciate some input.
 
All I need to do is read in a row of data, just like the spreadsheet
that is attached, and parse the information reliably.  There are many
blank cells here, but I need to account for them.  
 
My business problem is this:  We are accepting messages where columns of
information are not required, but the column MUST be represented in the
file as we process it.  At minimum I'd like to be able to read in these
excel files and output them in a CSV format.  Essentially as I am
streaming the excel file out, I am appending comma's between the values
because our messaging system expects the values to be comma delimited.
Unfortunately we accpet XLS files so we cannot just ask them to save it
as a CSV.  This can happen in 1 of 2 ways, I can stream the file and
append the commas between the cells as I am sending it to our messaging
system, or I can stream it out to a file and build an actual CSV, and
then tell the messaging system to pick it up.
 
When I read in an excel file I am finding it very difficult to determine
when cells are blank, and account for them.  For some reason the
BlankRecordListener doesn't pickup all blank records.  I've even tried
applying formatting to all cells and running my code, it still does not
work.
 
I put code in the blank record listener code to tell me the column.
Here is what I get:
 
Blank record encounterred.  Column= 3
Blank record encounterred.  Column= 4
Blank record encounterred.  Column= 7
Blank record encounterred.  Column= 8
Blank record encounterred.  Column= 10
Blank record encounterred.  Column= 18
Blank record encounterred.  Column= 20
Blank record encounterred.  Column= 30
Blank record encounterred.  Column= 34

There is no conditional formatting in this code, all I'm doing is
listening for a blank record and doing a System.out. If you look at the
attached spreadsheet you can see that columns 21, 22, 24, 32, and 33 are
blank as well.  There are 14 blank records in this spreadsheet but the
blankrecord listener only reports 9. 
 
I need help.  I need someone to tell me what kind of record listener I
need to use.  How can I account for these cells?
 
Here is my code.  At present it skips the blankcells when writing it out
to a CSV.   
 
/*
 *
 * Created on April 5, 2007, 11:49 AM
 *
 *
 *
 * @author lstrope
 */
 
package poitest;
 
import java.io.*;
 
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hssf.eventusermodel.*;
import org.apache.poi.hssf.record.*;
 
public class PoiXLStest implements HSSFListener {
private SSTRecord sstrec;
int rowRecLen[] = new int[RowRecord.MAX_ROW_NUMBER];
PrintWriter CSV = null;
   // int rowNum = 0;
int colNum = 0;
 
public PoiXLStest()
{
this(c:\\ExceltoCSV.csv);
}

public PoiXLStest(String F)
{
try{
CSV = new PrintWriter(new BufferedWriter((new
FileWriter(F;
}
catch(FileNotFoundException E){}
catch(IOException E){}
}

public PoiXLStest(InputStream in)
{   
PoiXLStest noargs = new PoiXLStest();
HSSFRequest req = new HSSFRequest();
//req.addListener(noargs, SSTRecord.sid);
//req.addListener(noargs, LabelSSTRecord.sid);
//req.addListener(noargs, RowRecord.sid);
//req.addListener(noargs, NumberRecord.sid);
//req.addListener(noargs, BlankRecord.sid);
req.addListenerForAllRecords(noargs);
HSSFEventFactory factory = new HSSFEventFactory();
   
try{
factory.processEvents(req, in);
}
catch(IOException E){
System.out.println(Problem in constructor);
}
}

public void processRecord(Record record)  
{
short sidVal;
sidVal = record.getSid();
 
if(sidVal == RowRecord.sid){
RowRecord rowRec = (RowRecord) record;
if(rowRec.getRecordSize()  0){
rowRecLen[rowRec.getRowNumber()] = rowRec.getLastCol();
// Setting array to hold the row at it's physical position with its
Column length.
//System.out.println(rowRec.getLastCol());
}

}
 
if(sidVal == SSTRecord.sid){
sstrec = (SSTRecord) record;
}
 
if(sidVal == LabelSSTRecord.sid){
LabelSSTRecord lrec = (LabelSSTRecord) record;
if(lrec.getColumn()  (rowRecLen[lrec.getRow()] - 1) 
!(lrec.getColumn()  colNum)){
//System.out.print(lrec.getColumn());
CSV.print(sstrec.getString(lrec.getSSTIndex()) +
,);
colNum++;
}
else if(lrec.getColumn()  colNum){
//System.out.print(lrec.getColumn());
CSV.print(\n +
sstrec.getString(lrec.getSSTIndex()) + ,);
colNum = 1;  
}
else{