Re: International (Russian) support: continue

Andrew C. Oliver Fri, 05 Jul 2002 12:55:50 -0700


> I looked at autodetection; as far as I can see it works properly
> (the part where the String is checked for symbols with code > 255),
> but the problem with Russian is that Russian encodings use codes > 127,
> so this autocheck does not detect them.
> In CP1251 the Russian letters have codes in the range 192 - 255; in CP866
> the range is not contiguous and starts at code 128.
>


Ouch.  I think this means that autodetection may in fact be useless as
well as inefficient.  Perhaps it's time to kill it.  Thoughts, Glen?
(We can't very well do autodetection at 127, since that would also catch
the ordinary 8-bit unicode characters in the 128-255 range.)
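
To make the problem concrete, here is a minimal sketch of a "> 255" style
check.  This is NOT the actual SSTRecord code; the class and method names
are made up for illustration, and it assumes the Russian text reached the
String as raw CP1251 bytes widened one byte per char (one plausible way to
end up with Russian characters whose codes are all below 256).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of POI.
final class AutodetectSketch {

    /** Returns true if any char in s has a code above the given threshold. */
    static boolean hasCharAbove(String s, int threshold) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > threshold) {
                return true;
            }
        }
        return false;
    }

    /** The Russian word "Privet" as CP1251 bytes: one byte per letter, all in 192..255. */
    static final byte[] CP1251_BYTES = {
        (byte) 0xCF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2, (byte) 0xE5, (byte) 0xF2
    };

    public static void main(String[] args) {
        // The same word as real Unicode (Cyrillic lives at U+0400 and up).
        String unicode = "\u041F\u0440\u0438\u0432\u0435\u0442";
        // Assumption: the CP1251 bytes were turned into chars one-to-one.
        String widened = new String(CP1251_BYTES, StandardCharsets.ISO_8859_1);

        System.out.println(hasCharAbove(unicode, 255));     // true: detected
        System.out.println(hasCharAbove(widened, 255));     // false: missed
        System.out.println(hasCharAbove(widened, 127));     // true: a 127 threshold sees it...
        System.out.println(hasCharAbove("caf\u00E9", 127)); // true: ...but also flags plain 8-bit text
    }
}
```

So a 255 threshold only helps when the String already holds real Unicode
code points, and a 127 threshold cannot tell CP1251 text from any other
8-bit text.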

> With cell encoding, it was my misreading of the manual and code. Sorry.
> I found that setting the encoding on the cells works pretty well:
>                 hssfCell.setEncoding( HSSFCell.ENCODING_UTF_16 );
>                 hssfCell.setCellValue( cellValue );
> And here it is!
> Thank you for the help. :)
> 

Great!  Well I forgot entirely about this.  For a quick history, IIRC I
wanted this API as above, Marc thought it should/could be automatic and
that it would be less prone to error if autodetected, so he incorporated
autodetection into SSTRecord.  I said that was fine, provided the old
API as I wanted it could still work even if it was kinda redundant.  He
gave me that one.  I forgot about it entirely and so it lay undocumented
until now.  (oops).

> 
> > Can you supply a patch which:
> >  1. allows you to add strings and define them as unicode
> >  2. maintains your ability to encode in non-unicode.
> 
> So this kind of patch is not needed - nothing to patch.
> 

Except I (or whoever beats me to it) need to document this better.  The
Javadoc is way inadequate.  This has replaced logging as the most
frequent question (next to maybe "when will feature x be done", to
which I always reply "when you do it" ;-) ).

> 
> > I believe so.  We don't yet support 16-bit unicode strings for sheet
> > names.  If someone supplies a patch for this
> > I will gladly apply it.
> 
> I tried to do it, but with no results yet. :( I got confused trying to
> understand how the strings are serialized and deserialized. Could you point
> me at the place (or chunk of code) where it is described and coded?
> 

Sure.  The sheet name is set in org.apache.poi.hssf.model.Workbook under
setSheetName, and then stored in org.apache.poi.hssf.record.BoundSheetRecord.

As for how strings are serialized and deserialized: HA!  There are
SEVERAL convoluted and peculiar ways that Excel does this.
(Consistency is NOT the Microsoft way.)  So I'll answer your question
only as it applies to BoundSheetRecord and leave the explanation of all
the other places for another day ;-).

If you look in org.apache.poi.hssf.record.BoundSheetRecord you'll see this
method:

(http://jakarta.apache.org/poi/javadocs/javasrc/org/apache/poi/hssf/record/BoundSheetRecord_java.html#BoundSheetRecord)

    public int serialize(int offset, byte [] data)
    {
        LittleEndian.putShort(data, 0 + offset, sid);
        LittleEndian.putShort(data, 2 + offset,
                              ( short ) (0x08 + getSheetnameLength()));
        LittleEndian.putInt(data, 4 + offset, getPositionOfBof());
        LittleEndian.putShort(data, 8 + offset, getOptionFlags());
        data[ 10 + offset ] = getSheetnameLength();
        data[ 11 + offset ] = getCompressedUnicodeFlag();

        // we assume compressed unicode (bein the dern americans we are ;-p)
        StringUtil.putCompressedUnicode(getSheetname(), data, 12 + offset);
        return getRecordSize();
    }

Notice the comment - har har.  Actually, at the time (before POI 1.0)
Marc and I made the design decision to address Unicode later because it
was so inconsistent throughout Excel, so that's what that means (blush).

According to page 291 of the Excel 97 Developer's Kit, at offset 10
(including the 4-byte header) there is a 2-byte string "length" integer
for the sheet name that follows at offset 12.  However, I read somewhere
that this is an error: as the code above shows, we write a 1-byte length
at offset 10 and a 1-byte flag at offset 11.  That flag is our
"field_4_compressed_unicode_flag", which should be set to 0 for 8-bit or
non-unicode (which is the default) or 1 for 16-bit unicode.  I assume
from there (but I'm not positive) that with the flag set to 1 the sheet
name could just be stored via the matching 16-bit StringUtil function.

I could be wrong though.   
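
Here is a rough, self-contained sketch of what I mean.  It is NOT the POI
code: BoundSheetSketch and its methods are hypothetical, it uses
java.nio.ByteBuffer instead of our LittleEndian/StringUtil helpers, and it
rests on the unverified assumptions above (flag 1 means two little-endian
bytes per character, and the record length field counts those bytes).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the real BoundSheetRecord.
final class BoundSheetSketch {

    static final short SID = 0x85;  // record id of BoundSheetRecord

    /** True if any character does not fit in 8 bits. */
    static boolean needs16Bit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 255) {
                return true;
            }
        }
        return false;
    }

    /** Writes header, BOF position, option flags, 1-byte length, 1-byte flag, then the name. */
    static byte[] serialize(int positionOfBof, short optionFlags, String sheetName) {
        if (sheetName.length() > 255) {
            throw new IllegalArgumentException("sheet name length must fit in one byte");
        }
        boolean wide = needs16Bit(sheetName);
        int nameBytes = wide ? sheetName.length() * 2 : sheetName.length();

        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + nameBytes).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(SID);                       // offset 0: record id
        buf.putShort((short) (0x08 + nameBytes)); // offset 2: body size (assumed to count bytes)
        buf.putInt(positionOfBof);               // offset 4
        buf.putShort(optionFlags);               // offset 8
        buf.put((byte) sheetName.length());      // offset 10: length in characters
        buf.put((byte) (wide ? 1 : 0));          // offset 11: 0 = 8-bit, 1 = 16-bit
        for (int i = 0; i < sheetName.length(); i++) {
            char c = sheetName.charAt(i);
            if (wide) {
                buf.putChar(c);                  // two bytes, low byte first
            } else {
                buf.put((byte) c);               // one byte per character
            }
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] latin = serialize(0, (short) 0, "Sheet1");
        byte[] russian = serialize(0, (short) 0, "\u041B\u0438\u0441\u0442");
        System.out.println(latin.length + " bytes, flag " + latin[11]);     // 18 bytes, flag 0
        System.out.println(russian.length + " bytes, flag " + russian[11]); // 20 bytes, flag 1
    }
}
```

Whether Excel actually accepts a sheet name written this way is exactly
the part that would need testing against a real file.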


> Sure, I found it while looking at the jUnit reports, here it is from the
> jUnit testcases (building info):
> Running org.apache.poi.hssf.record.TestSSTRecord
> Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 5,879 sec
> TEST org.apache.poi.hssf.record.TestSSTRecord FAILED
> 

Could you perhaps do a

./build.sh site

and then browse build/docs to find the JUnit test results pages?

You should be able to drill down into which test failed.  My problem is
that the tests are succeeding for me.  This could be a
globalization/localization issue (meaning that running on a system with
Russian language settings might hit an error that running on a Linux box
with English settings does not, due to language defaults etc.).  It would
help if we could narrow it down to which TestSSTRecord test failed (there
are 7 of them; I suspect it's the rich text one):
http://jakarta.apache.org/poi/tests/junit/org/apache/poi/hssf/record/TestSSTRecord.html
  

> 
> PS: while building I encountered this build error:
> in org.apache.poi.hssf.contrib.view.SViewerPanel
> line 87:     SVTableCellRenderer rnd = new SVTableCellRenderer(wb);
> but the class org.apache.poi.hssf.contrib.view.SVTableCellRenderer has only
> one constructor:
>                public SVTableCellRenderer(HSSFWorkbook wb, HSSFSheet st)
> {...}
> and no     public SVTableCellRenderer( HSSFWorkbook wb ) {...}
> so build failed. :(
> 

Ooops... CVS got me again.  I swear it said that commit succeeded.
Thanks.  It's fixed now. :-)


> 
-- 
http://www.superlinksoftware.com - software solutions for business
http://jakarta.apache.org/poi - Excel/Word/OLE 2 Compound Document in Java
http://krysalis.sourceforge.net/centipede - the best build/project structure
                    a guy/gal could have! - Make Ant simple on complex Projects!
The avalanche has already started. It is too late for the pebbles to vote.
-Ambassador Kosh

