Nice to meet you!

> > I looked at autodetection, as I see it works properly
> > (the part where the String is checked for including symbols with
> > code > 255), but the problem with Russian is that Russian encodings
> > use codes > 127, so this autocheck does not detect the encoding.
> > In CP1251 Russian characters have the code range (192 - 255), in CP866
> > the code range is not contiguous and starts with code 128.
> >
>
> ouch.  I think this means that autodetection may in fact be useless as
> well as inefficient.  Perhaps it's time to kill it.  Thoughts Glen?
> (Since we can't very well do autodetection with 127 or we'll cut off
> 8-bit unicode characters as well).
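To make the problem concrete, here is a minimal sketch of a "> 255" check. It is not POI's actual code: the class and the helper `needs16Bit` are hypothetical names, and the scenario assumes the Russian text arrived as CP1251 bytes that were decoded with a Latin-1 charset instead of the right one.

```java
import java.nio.charset.StandardCharsets;

public class AutodetectSketch {

    // Hypothetical helper (not a POI method): true when at least one
    // UTF-16 code unit of the string does not fit into a single byte.
    static boolean needs16Bit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 255) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A correctly decoded Russian word: Cyrillic letters sit at U+0400..U+04FF.
        String decoded = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // The same word as CP1251 bytes (all in 192..255), decoded as ISO-8859-1.
        // Assumption: this is how the text reached the check in the failing case.
        byte[] cp1251 = {(byte) 0xCF, (byte) 0xF0, (byte) 0xE8,
                         (byte) 0xE2, (byte) 0xE5, (byte) 0xF2};
        String misdecoded = new String(cp1251, StandardCharsets.ISO_8859_1);

        System.out.println(needs16Bit(decoded));     // Cyrillic code points are detected
        System.out.println(needs16Bit(misdecoded));  // every char is <= 255, nothing is detected
    }
}
```

Under that assumption the check never fires for Russian text, and lowering the threshold to 127 would also catch ordinary 8-bit characters, which is the trade-off mentioned in the quoted reply.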
It is a good idea, and it makes the point: the user can choose himself where to use Unicode.

> > With cells encoding it is my misreading of the manual and code. Sorry.
> > I have found that setting the encoding for the cells works pretty well:
> >     hssfCell.setEncoding( HSSFCell.ENCODING_UTF_16 );
> >     hssfCell.setCellValue( cellValue );
> > And here it is!
> > Thank you for the help. :)
> >
>
> Great!  Well I forgot entirely about this.  For a quick history, IIRC I
> wanted this API as above, Marc thought it should/could be automatic and
> that it would be less prone to error if autodetected, so he incorporated
> autodetection into SSTRecord.  I said that was fine, provided the old
> API as I wanted it could still work even if it was kinda redundant.  He
> gave me that one.  I forgot about it entirely and so it lay undocumented
> until now.  (oops).

> Except I (or whomever beats me to it) need to document this better.  The
> Javadoc is way inadequate.  This has replaced logging as the most
> frequent question.  (next to maybe "when will feature x be done" to
> which I always reply "when you do it" ;-) )

As an idea: put it in the second HOWTO example. I think that would be convenient.

> > > I believe so.  We don't yet support 16-bit unicode strings for sheet
> > > names.  If someone supplies a patch for this
> > > I will gladly apply it.
> >
> > I tried to do it, but with no results, yet. :( I was confused understanding
> > how the strings are serialized and deserialized. Could you tell or point me
> > at the place (or chunk of code) where it is described and coded.
> >
>
> Sure.  the sheet name is in org.apache.poi.hssf.model.Workbook under
> setSheetName, then org.apache.poi.hssf.record.BoundSheetRecord.
>
> As for how strings are serialized and deserialized!  HA there are
> SEVERAL convoluted and peculiar ways that Excel uses to do this.
> (Consistency is NOT the Microsoft way).
> So I'll answer your question
> only as it applies to BoundSheetRecord and leave the explanation for all
> the other places for another day ;-).

> If you look in org.apache.poi.hssf.record.BoundSheetRecord you see this
> function:
>
> (http://jakarta.apache.org/poi/javadocs/javasrc/org/apache/poi/hssf/record/BoundSheetRecord_java.html#BoundSheetRecord)
>
>     public int serialize(int offset, byte [] data)
>     {
>         LittleEndian.putShort(data, 0 + offset, sid);
>         LittleEndian.putShort(data, 2 + offset,
>                               ( short ) (0x08 + getSheetnameLength()));
>         LittleEndian.putInt(data, 4 + offset, getPositionOfBof());
>         LittleEndian.putShort(data, 8 + offset, getOptionFlags());
>         data[ 10 + offset ] = getSheetnameLength();
>         data[ 11 + offset ] = getCompressedUnicodeFlag();
>
>         // we assume compressed unicode (bein the dern americans we are ;-p)
>         StringUtil.putCompressedUnicode(getSheetname(), data, 12 + offset);
>         return getRecordSize();
>     }
>
> Notice the comment - har har.  Actually, at the time (before POI 1.0)
> Marc and I made the design decision to address Unicode later because it
> was so inconsistent throughout Excel, so that's what that means (blush).
>
> According to page 291 of the Excel 97 Developer's Kit, at offset 10
> (including the 4 byte header) there is a 2 byte string "length" integer
> for the following (offset 12) Sheet name.  However I read somewhere that
> this is an error.  Therefore we have the
> "field_4_compressed_unicode_flag" which should be set to 0 for 8-bit or
> non-unicode (which is the default) or 1 for 16-bit unicode.  I assume
> from there (but I'm not positive) that the sheet name could just be
> stored via the string util function.
>
> I could be wrong though.

Thank you, that makes it clear. I wrote down some of my thoughts in the previous letter. Now I have some more to tell: first of all, it took me quite a lot of code reading and study to advance toward an understanding.
%) Now it is more understandable, but I do not yet feel the interactions between the different Records, the workflow.

>         LittleEndian.putShort(data, 0 + offset, sid);
>         LittleEndian.putShort(data, 2 + offset,
>                               ( short ) (0x08 + getSheetnameLength()));
>         LittleEndian.putInt(data, 4 + offset, getPositionOfBof());
>         LittleEndian.putShort(data, 8 + offset, getOptionFlags());
>         data[ 10 + offset ] = getSheetnameLength();
>         data[ 11 + offset ] = getCompressedUnicodeFlag();
>
>         // we assume compressed unicode (bein the dern americans we are ;-p)
>         StringUtil.putCompressedUnicode(getSheetname(), data, 12 + offset );

And about consistency, for example: as far as I have understood, the first and second words are the internal ones (the record header). Am I right? (In the Excel specs the description starts from the BOF position.) (And what is 0x08? :)
Then the dword and the word are for the Bound Sheet. That is OK.
The last two bytes define the UnicodeString (BIFF8) or simple String (BIFF7) properties, and after them comes the string content.
I feel an inconsistency here, as the last two property bytes are described in the UnicodeString class as well as here (BoundSheetRecord). Quite confusing.

The next point that stopped me for a period of time is

    StringUtil.putCompressedUnicode(getSheetname(), data, 12 + offset);

in this class and

    StringUtil.putCompressedUnicode(unicodeString, data, 0x3 + offset);

in the UnicodeString class.
(0x3 - please, explain.)

Also I do not understand the workflow:

    try {
        String unicodeString = new String(getString().getBytes("Unicode"), "Unicode");
        if (getOptionFlags() == 0) {
            StringUtil.putCompressedUnicode(unicodeString, data, 0x3 + offset);
        } else {
            StringUtil.putUncompressedUnicode(unicodeString, data, 0x3 + offset);
        }
    } catch (Exception e) {
        if (getOptionFlags() == 0) {
            StringUtil.putCompressedUnicode(getString(), data, 0x3 + offset);
        } else {
            StringUtil.putUncompressedUnicode(getString(), data, 0x3 + offset);
        }
    }

What is the encoding round trip for? I thought that an 'A' encoded to Unicode and decoded from Unicode again is still the same 'A'. What is it for?

> > Sure, I found it while looking at the jUnit reports, here it is from the
> > jUnit testcases (building info):
> > Running org.apache.poi.hssf.record.TestSSTRecord
> > Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 5,879 sec
> > TEST org.apache.poi.hssf.record.TestSSTRecord FAILED
> >
>
> could you perhaps do a
>
> ./build.sh site
>
> and then browse at build/docs
>
> and find the junit test results pages?
>
> You should be able to drill down into which test failed.  My problem is
> that the tests are succeeding for me.  This could be a
> Global/localization issue.  (meaning running on a system with Russian
> language settings might have an error that running on Linux box with
> English settings does not due to language defaults/etc).  If we can
> narrow it down to which TestSSTRecord test failed (there are 7 of them,
> I suspect it's the rich text one:
> http://jakarta.apache.org/poi/tests/junit/org/apache/poi/hssf/record/TestSSTRecord.html)
>

I will attach the XML file with the test results in the direct letter. I will also ZIP it.
(It is testcase name="testProcessContinueRecord".)

As for my platform: it is MS Win2K Pro, JDK 1.4.0, Excel XP. :)

Bye!
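For reference, the BoundSheetRecord byte layout discussed above can be sketched without any POI classes. This is only an illustration under assumptions: the class and method names are hypothetical, the 16-bit branch is a guess at what Unicode sheet-name support might write (the thread says it is not implemented yet), and the record id 0x85 is the BOUNDSHEET id as I understand the BIFF8 format. On this reading, the 0x08 in the size field is simply the number of fixed body bytes that follow the 4-byte header: 4 (BOF position) + 2 (option flags) + 1 (name length) + 1 (unicode flag).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class BoundSheetLayoutSketch {

    static final short SID = 0x85; // BOUNDSHEET record id (assumed from the BIFF8 format)

    // Hypothetical stand-in for BoundSheetRecord.serialize(); not POI code.
    // The name length must fit into one byte (at most 255 characters).
    static byte[] serialize(int positionOfBof, short optionFlags,
                            String name, boolean sixteenBit) {
        int charBytes = sixteenBit ? 2 * name.length() : name.length();
        int bodyLength = 0x08 + charBytes; // fixed fields plus the name itself

        ByteBuffer buf = ByteBuffer.allocate(4 + bodyLength)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(SID);                     // offset 0: record id
        buf.putShort((short) bodyLength);      // offset 2: size of the body
        buf.putInt(positionOfBof);             // offset 4: stream position of the sheet's BOF
        buf.putShort(optionFlags);             // offset 8: option flags
        buf.put((byte) name.length());         // offset 10: name length in characters
        buf.put((byte) (sixteenBit ? 1 : 0));  // offset 11: 0 = 8-bit, 1 = 16-bit
        for (int i = 0; i < name.length(); i++) { // offset 12: the characters
            char c = name.charAt(i);
            if (sixteenBit) {
                buf.putChar(c);       // two bytes per character, low byte first
            } else {
                buf.put((byte) c);    // keeps only the low byte ("compressed unicode")
            }
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] ascii = serialize(0, (short) 0, "Sheet1", false);
        byte[] russian = serialize(0, (short) 0, "\u041B\u0438\u0441\u04421", true);
        System.out.println(ascii.length + " bytes, " + russian.length + " bytes");
    }
}
```

If the same reasoning holds for UnicodeString, the 0x3 there would be a 2-byte character count plus a 1-byte option flag in front of the characters, but that is an assumption and not something confirmed in this thread.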
