On Mon, 2002-07-08 at 11:03, Сергей Козелло wrote:
> Hello!
> 
> While I was studying the String functions and Excel structure, Patrick has
> already done the patch. :)
> Thank you, Patrick, but I have something to say, it seems to me that there
> is a way to make it better.
> 

hehe...  Have at it, man.  You'll be the fourth to try. I think it's
better than it was.

> For reading it is good. I have tested it, and it reads Russian too. But the code
> is too obfuscated for me. :( It uses SSTDeserialiser and BinaryTree. Somehow I
> understood the offsets, but not on the first try.
> I tried to simplify the process, but failed, so after those attempts I have
> a question:

This doesn't have anything to do with the sheet name.  The sheet name is
stored in the BoundSheetRecord.  SSTRecord is for LabelSSTRecords.  The
code is complex because the record is stored in *that* complex a way.  If
you think you can take a crack at simplifying it I'd be grateful.  The
easier that code is, the fewer bugs we have, as most *serious* bugs for
HSSF have been there.  The $15k per processor alternative didn't even
attempt this record (they just write the old style Excel 95 Label
records).

> How does StringUtil.getFromUnicode(...) work?
> (I tried to simply use this function.) I wanted it to be that easy, but it
> was not. %)
> 

I believe it is.  The problem is that the BoundSheet record has the flag
in it already.  Maybe you'll find it doesn't match the BoundSheetRecord
at all; this wouldn't be that surprising (Excel likes storing strings in
new and convoluted ways).

> 
> What do you think about the idea of getting rid of encoding autodetection?
> I suggest this approach:
> 
> To the BoundSheetRecord change:
>     public void setSheetname( String sheetname )
>     {
>         field_5_sheetname = sheetname;
>     }
> 
> With the purpose to set all the features in the Workbook:
> 
> To the Workbook add
>     public void setSheetName(int sheetnum, String sheetname, short encoding ) {
>         checkSheets(sheetnum);
> 
>         BoundSheetRecord sheet = (BoundSheetRecord)boundsheets.get( sheetnum );
>         sheet.setSheetname(sheetname);
>         sheet.setSheetnameLength( (byte)sheetname.length() );
>         sheet.setCompressedUnicodeFlag( (byte)encoding );
>     }
> 
> And make it available to the user to do it manually, so add to the
> HSSFWorkbook:
>     public final static byte ENCODING_COMPRESSED_UNICODE = 0;
>     public final static byte ENCODING_UTF_16             = 1;
> 
>     public void setSheetName(int sheet, String name)
>     {
>         workbook.setSheetName( sheet, name, ENCODING_COMPRESSED_UNICODE );
>     }
>     public void setSheetName( int sheet, String name, short encoding )
>     {
>         if (sheet > (sheets.size() - 1))
>         {
>             throw new RuntimeException("Sheet out of bounds");
>         }
> 
>         switch ( encoding ) {
>         case ENCODING_COMPRESSED_UNICODE:
>         case ENCODING_UTF_16:
>             break;
> 
>         default:
>            throw new RuntimeException( "Unsupported encoding" );
>         }
>         workbook.setSheetName( sheet, name, encoding );
>     }
> 
> 
> User's example may be like this:
>         hssfWorkbook.setSheetName(0, "�������� ������",
> HSSFWorkbook.ENCODING_UTF_16 );
>     or
>         hssfWorkbook.setSheetName(0, "HSSF Test",
> HSSFWorkbook.ENCODING_COMPRESSED_UNICODE );
> 
> 

Get it working and submit a patch; I shall apply it.

Instructions for submitting patches are here:
http://jakarta.apache.org/poi/getinvolved/index.html

> 
> 
> The most unpleasant thing is that saving does not work on my side.
> On opening such a file, MS Excel says that the name is incorrect
> and fixes the error. :(
> 
> So simply calling StringUtil.putUncompressedUnicode(getSheetname(), data, 12 +
> offset); does not seem to work.
> I tried:
>     public int serialize(int offset, byte [] data)
>     {
>         LittleEndian.putShort(data, 0 + offset, sid);
>         LittleEndian.putShort( data, 2 + offset,
>                 (short)( 0x08 + getSheetnameLength() ) );
>         LittleEndian.putInt(data, 4 + offset, getPositionOfBof());
>         LittleEndian.putShort(data, 8 + offset, getOptionFlags());
>         /*
>         data[ 10 + offset ] = getSheetnameLength();
>         data[ 11 + offset ] = getCompressedUnicodeFlag();
>         */
>         UnicodeString name = new UnicodeString();
>         name.setOptionFlags( (byte)( field_4_compressed_unicode_flag &
> 0x01 ) );
>         name.setString( getSheetname() );
>         System.arraycopy( name.serialize(), 0, data, 10 + offset,
> name.getRecordSize() );
> 
>         return getRecordSize();
>     }
> But it is not working either. :(
> By the way, what is the 0x08 in the expression "0x08 + getSheetnameLength()"?

4 bytes for the header (sid + size) + 4 bytes for the fields.  That looks
like a miscalculation to me: it looks like it should be 0xA +
getSheetnameLength(), meaning 4 bytes for the header, 6 bytes for the
other fields, plus the length of the string.  That is, unless the flag is
part of the string (which may be the case).  I don't recall this record
very well, which probably means it's from November 2001.
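
For what it's worth, here is another way to read the 0x08, as a rough sketch and not the actual HSSF code.  It assumes the size field counts only the bytes after the 4-byte header, and it takes the field offsets from the serialize() quoted above (position at 4, option flags at 8, name length at 10, flag at 11, string at 12).  The class and method names are made up for illustration.  If this reading is right, a UTF-16 name needs two bytes per character in the size, which the quoted code does not account for.

```java
/** Sketch only: byte counts for a BOUNDSHEET-style record (names are hypothetical). */
final class BoundSheetSizeSketch {
    // sid (2 bytes) + size field (2 bytes)
    static final int HEADER_SIZE = 4;
    // BOF position (4) + option flags (2) + name length (1) + unicode flag (1)
    static final int FIXED_BODY_SIZE = 0x08;

    /** Bytes taken by the name: 1 per char when compressed, 2 per char as UTF-16. */
    static int nameBytes(String name, boolean utf16) {
        return utf16 ? name.length() * 2 : name.length();
    }

    /** Value for the size field, assuming it excludes the 4-byte header. */
    static int bodySize(String name, boolean utf16) {
        return FIXED_BODY_SIZE + nameBytes(name, utf16);
    }

    /** Total bytes written for the record, header included. */
    static int recordSize(String name, boolean utf16) {
        return HEADER_SIZE + bodySize(name, utf16);
    }

    public static void main(String[] args) {
        String ascii = "HSSF Test";
        // A made-up four-letter Cyrillic sheet name, written with escapes.
        String russian = "\u041B\u0438\u0441\u0442";
        System.out.println(bodySize(ascii, false) + " / " + recordSize(ascii, false));
        System.out.println(bodySize(russian, true) + " / " + recordSize(russian, true));
    }
}
```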

My recommendation:  

Look at http://sc.openoffice.org/excelfileformat.pdf, page 48.

Make the record match that as closely as possible.  Modify the toString to
include a hex dump of the record (see org.apache.poi.util.HexDump).  Run
org.apache.poi.hssf.dev.BiffViewer on MyFileWithRussianSheetName.xls.
Recreate the sheet with HSSF.  Run org.apache.poi.hssf.dev.BiffViewer on
MyRecreatedSheet.xls, then use the unix diff command on the two outputs.
Correct the differences.  Submit a patch.
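
The compare step can also be tried on raw record bytes.  The following is a minimal stand-in, assuming you already have the two serialized records as byte arrays; it is not BiffViewer or HexDump, and the class name, method names and sample bytes are all made up for illustration.

```java
/** Sketch only: find where two serialized records first differ (hypothetical helper). */
final class RecordDiffSketch {
    /** Returns the first differing offset, or -1 if the arrays are identical. */
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Same prefix: they differ only if one array is longer.
        return a.length == b.length ? -1 : n;
    }

    /** Formats bytes as space-separated two-digit hex values. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample bytes, not taken from a real workbook.
        byte[] fromExcel = {0x04, 0x01, 0x1B, 0x04, 0x38, 0x04};
        byte[] fromHssf  = {0x04, 0x00, 0x1B, 0x38};
        System.out.println("Excel: " + hex(fromExcel));
        System.out.println("HSSF:  " + hex(fromHssf));
        System.out.println("first difference at offset " + firstDifference(fromExcel, fromHssf));
    }
}
```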

> 
> 
> Also what is the 0x3 in the class UnicodeString?
>     public int serialize(int offset, byte [] data)
>     {
>         .....
>                 StringUtil.putCompressedUnicode(getString(), data, 0x3 +
>                                                 offset);
>             }
>             else
>             {
>                 StringUtil.putUncompressedUnicode(getString(), data,
>                                                   0x3 + offset);
>         .....
>     }
> 
> 
> Also, while reading the Excel specification I saw that the length of the
> Unicode String may be 1 or 2.
> How can I detect when it is 1 and when it is 2?
> 
> What is your opinion?
> 

The above serializes compressed or uncompressed Unicode via StringUtil
after a 3-byte header (the string starts at the record's offset in the
file plus 3 more bytes).  I'll bet that there is other code that writes
stuff into the header.
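
To make the 0x3 concrete, here is a rough sketch of that layout.  It assumes the 3-byte header is a 2-byte little-endian character count followed by a 1-byte option flag (0 = compressed, 1 = uncompressed), which is my reading of the format and not something checked against the real UnicodeString class.  The class and method names are made up.

```java
/** Sketch only: a string written after a 3-byte header (names are hypothetical). */
final class UnicodeStringSketch {
    static final int HEADER_SIZE = 0x3; // char count (2 bytes) + option flag (1 byte)

    static byte[] serialize(String s, boolean utf16) {
        int bytesPerChar = utf16 ? 2 : 1;
        byte[] data = new byte[HEADER_SIZE + s.length() * bytesPerChar];

        // Character count, little-endian (assumed to be 16 bits wide here).
        data[0] = (byte) (s.length() & 0xFF);
        data[1] = (byte) ((s.length() >>> 8) & 0xFF);
        // Option flag: low bit set means two bytes per character.
        data[2] = (byte) (utf16 ? 1 : 0);

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int pos = HEADER_SIZE + i * bytesPerChar;
            data[pos] = (byte) (c & 0xFF);          // low byte is always written
            if (utf16) {
                data[pos + 1] = (byte) (c >>> 8);   // high byte only when uncompressed
            }
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(serialize("AB", false)));
        System.out.println(java.util.Arrays.toString(serialize("\u042F", true)));
    }
}
```

Note that the compressed form silently drops the high byte, so a Russian name only survives with the uncompressed flag set.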

Glen, Marc?  I've not been spelunking in the String code in a good long
time.  (Can you believe that this project is like a whole year old or so
already???)  Is there anything you can add?

Sorry if I'm not much help... It's been a good while since I looked
there.

-Andy

> 
> Sincerely yours, Sergei Kozello.
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
> 
-- 
http://www.superlinksoftware.com - software solutions for business
http://jakarta.apache.org/poi - Excel/Word/OLE 2 Compound Document in
Java                            
http://krysalis.sourceforge.net/centipede - the best build/project
structure
                    a guy/gal could have! - Make Ant simple on complex Projects!
The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh



  • ... Сергей Козелло
    • Andrew C. Oliver
