> I looked at autodetection, and as far as I can see it works properly
> (the part where the String is checked for symbols with code > 255),
> but the problem with Russian is that Russian encodings use codes > 127,
> so this autocheck does not detect them.
> In CP1251 Russian letters occupy the code range 192 - 255; in CP866 the
> code range is not contiguous and starts at code 128.
>
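To make the quoted point concrete, here is a minimal sketch, not the actual SSTRecord code. The class and method names are hypothetical, and it assumes the check is simply "does any char have a code above 255". It compares a Russian word held as real Unicode characters with the same word held as CP1251 byte values widened one-to-one into chars:

```java
import java.nio.charset.StandardCharsets;

public class AutodetectDemo {
    // Hypothetical stand-in for the check described above:
    // true if any char in the string has a code above 255.
    static boolean needsUnicode(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 255) {
                return true;
            }
        }
        return false;
    }

    // Widens each byte to one char with the same numeric value (0..255),
    // which is what decoding as ISO-8859-1 does.
    static String widen(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The Russian word "privet" as Unicode characters (U+041F, U+0440, ...).
        String unicode = "\u041F\u0440\u0438\u0432\u0435\u0442";
        // The same word as CP1251 byte values; all of them lie in 192..255.
        byte[] cp1251 = {(byte) 0xCF, (byte) 0xF0, (byte) 0xE8,
                         (byte) 0xE2, (byte) 0xE5, (byte) 0xF2};

        System.out.println(needsUnicode(unicode));       // chars are above 255
        System.out.println(needsUnicode(widen(cp1251))); // chars stay within 128..255
    }
}
```

If the text reaches the check in the second form, a "> 255" test never fires, which matches what is described above.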
Ouch. I think this means that autodetection may in fact be useless as well
as inefficient. Perhaps it's time to kill it. Thoughts, Glen? (We can't
very well do autodetection with a threshold of 127, or we'll cut off 8-bit
unicode characters as well.)

> With cells encoding it is my misreading of the manual and code. Sorry.
> I have found that setting the encoding for the cells works pretty well:
>     hssfCell.setEncoding( HSSFCell.ENCODING_UTF_16 );
>     hssfCell.setCellValue( cellValue );
> And here it is!
> Thank you for the help. :)
>

Great! Well, I forgot entirely about this. For a quick history: IIRC I
wanted this API as above. Marc thought it should/could be automatic and
that it would be less prone to error if autodetected, so he incorporated
autodetection into SSTRecord. I said that was fine, provided the old API as
I wanted it could still work even if it was kinda redundant. He gave me
that one. I forgot about it entirely, and so it lay undocumented until now.
(oops).

>
> > Can you supply a patch which:
> > 1. allows you to add strings and define them as unicode
> > 2. maintains your ability to encode in non-unicode.
>
> So this kind of patch is not needed - nothing to patch.
>

Except I (or whoever beats me to it) need to document this better. The
Javadoc is way inadequate. This has replaced logging as the most frequent
question (next to maybe "when will feature x be done", to which I always
reply "when you do it" ;-) ).

>
> > I believe so. We don't yet support 16-bit unicode strings for sheet
> > names. If someone supplies a patch for this
> > I will gladly apply it.
>
> I tried to do it, but with no results yet. :( I got confused trying to
> understand how the strings are serialized and deserialized. Could you
> tell me or point me at the place (or chunk of code) where it is
> described and coded?
>

Sure. The sheet name is set in org.apache.poi.hssf.model.Workbook under
setSheetName, and then in org.apache.poi.hssf.record.BoundSheetRecord. As
for how strings are serialized and deserialized!
HA, there are SEVERAL convoluted and peculiar ways that Excel uses to do
this. (Consistency is NOT the Microsoft way.) So I'll answer your question
only as it applies to BoundSheetRecord and leave the explanation for all
the other places for another day ;-).

If you look in org.apache.poi.hssf.record.BoundSheetRecord you see this
function:
(http://jakarta.apache.org/poi/javadocs/javasrc/org/apache/poi/hssf/record/BoundSheetRecord_java.html#BoundSheetRecord)

    public int serialize(int offset, byte [] data)
    {
        // 4-byte record header: record id, then size of the record body
        LittleEndian.putShort(data, 0 + offset, sid);
        LittleEndian.putShort(data, 2 + offset,
                              ( short ) (0x08 + getSheetnameLength()));
        LittleEndian.putInt(data, 4 + offset, getPositionOfBof());
        LittleEndian.putShort(data, 8 + offset, getOptionFlags());
        data[ 10 + offset ] = getSheetnameLength();
        data[ 11 + offset ] = getCompressedUnicodeFlag();

        // we assume compressed unicode (bein the dern americans we are ;-p)
        StringUtil.putCompressedUnicode(getSheetname(), data, 12 + offset);
        return getRecordSize();
    }

Notice the comment - har har. Actually, at the time (before POI 1.0) Marc
and I made the design decision to address Unicode later because it was so
inconsistent throughout Excel, so that's what that means (blush).

According to page 291 of the Excel 97 Developer's Kit, at offset 10
(including the 4 byte header) there is a 2 byte string "length" integer for
the sheet name that follows at offset 12. However, I read somewhere that
this is an error, which is why the code above writes a 1 byte length at
offset 10 and a 1 byte flag at offset 11. That flag is the
"field_4_compressed_unicode_flag", which should be set to 0 for 8-bit or
non-unicode (which is the default) or 1 for 16-bit unicode. I assume from
there (but I'm not positive) that the sheet name could then just be stored
via the appropriate StringUtil function. I could be wrong though.
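Here is a rough, standalone sketch of what that could look like. It is not the POI implementation and I have not checked it against Excel. BoundSheetSketch and its serialize method are made up for illustration. It assumes that a 16-bit name is written as UTF-16 little-endian, that the length byte counts characters rather than bytes, and that the size in the header has to grow with the byte length of the name:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class BoundSheetSketch {
    static final short SID = 0x85; // BOUNDSHEET record id

    // Hypothetical helper mirroring the layout of the serialize() method above.
    static byte[] serialize(int positionOfBof, short optionFlags,
                            String name, boolean is16Bit) {
        // 8-bit path assumes every char is <= 255; anything else becomes '?'.
        byte[] nameBytes = is16Bit
                ? name.getBytes(StandardCharsets.UTF_16LE)
                : name.getBytes(StandardCharsets.ISO_8859_1);

        ByteBuffer buf = ByteBuffer.allocate(12 + nameBytes.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(SID);                               // offset 0: record id
        buf.putShort((short) (0x08 + nameBytes.length)); // offset 2: body size
        buf.putInt(positionOfBof);                       // offset 4
        buf.putShort(optionFlags);                       // offset 8
        buf.put((byte) name.length());                   // offset 10: char count (assumption)
        buf.put((byte) (is16Bit ? 1 : 0));               // offset 11: unicode flag
        buf.put(nameBytes);                              // offset 12: sheet name
        return buf.array();
    }

    public static void main(String[] args) {
        // A four-letter Cyrillic sheet name written as 16-bit unicode.
        byte[] rec = serialize(0, (short) 0, "\u041B\u0438\u0441\u0442", true);
        System.out.println(rec.length);
    }
}
```

The main point is that the flag at offset 11 and the encoding of the bytes at offset 12 have to change together, and the record size along with them.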
> Sure, I found it while looking at the jUnit reports. Here it is from the
> jUnit testcases (building info):
> Running org.apache.poi.hssf.record.TestSSTRecord
> Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 5,879 sec
> TEST org.apache.poi.hssf.record.TestSSTRecord FAILED
>

Could you perhaps do a ./build.sh site and then browse build/docs and find
the junit test results pages? You should be able to drill down into which
test failed. My problem is that the tests are succeeding for me. This could
be a global/localization issue (meaning that running on a system with
Russian language settings might hit an error that running on a Linux box
with English settings does not, due to language defaults etc.). It would
help if we could narrow it down to which TestSSTRecord test failed. There
are 7 of them, and I suspect it's the rich text one:
http://jakarta.apache.org/poi/tests/junit/org/apache/poi/hssf/record/TestSSTRecord.html

>
> PS: while building I encountered this build error:
> in org.apache.poi.hssf.contrib.view.SViewerPanel
> line 87: SVTableCellRenderer rnd = new SVTableCellRenderer(wb);
> but the class org.apache.poi.hssf.contrib.view.SVTableCellRenderer has
> only one constructor:
> public SVTableCellRenderer(HSSFWorkbook wb, HSSFSheet st)
> {...}
> and no public SVTableCellRenderer( HSSFWorkbook wb ) {...}
> so the build failed. :(
>

Ooops... CVS got me again. I swear it said that commit succeeded. Thanks.
It's fixed now. :-)

--
http://www.superlinksoftware.com - software solutions for business
http://jakarta.apache.org/poi - Excel/Word/OLE 2 Compound Document in Java
http://krysalis.sourceforge.net/centipede - the best build/project
structure a guy/gal could have! - Make Ant simple on complex Projects!

The avalanche has already started. It is too late for the pebbles to vote.
-Ambassador Kosh
