On Mon, 2002-07-08 at 12:36, Sergei Kozello wrote:
> Nice to meet you!
>
Thanks. Right back at you.
Let's move this to the poi-dev list. We're probably blowing (confusing)
the minds of a great many people at this point.
> > > I looked at the autodetection, and as far as I can see it works properly
> > > (the part where the String is checked for characters with a code > 255).
> > > The problem with Russian is that Russian encodings use codes > 127, so
> > > this check does not detect them. In CP1251 the Russian characters occupy
> > > the code range 192 - 255; in CP866 the range is not contiguous and
> > > starts at code 128.
> > >
> >
> > Ouch. I think this means that autodetection may in fact be useless as
> > well as inefficient. Perhaps it's time to kill it. Thoughts, Glen?
> > (Since we can't very well do autodetection with 127 or we'll cut off
> > 8-bit unicode characters as well).
>
> It is a good idea, and it makes the point: the user can choose where to use
> Unicode himself.
>
Great. Submit a patch and I'll apply it.
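
For the record, here is a minimal sketch of why a "> 255" check misses
these characters. It is not the actual POI code: the class and method names
are hypothetical, and it assumes the CP1251 byte reached the String as if it
were Latin-1, so the char keeps its raw 8-bit value.

```java
import java.nio.charset.StandardCharsets;

public class UnicodeAutodetectSketch {

    // Hypothetical stand-in for the autodetection check discussed above:
    // report "needs 16-bit" only if some character has a code above 255.
    static boolean needs16Bit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 255) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 0xC0 is the CP1251 byte for the Cyrillic capital letter A (U+0410).
        byte[] cp1251Bytes = { (byte) 0xC0 };

        // Assumption: the byte was decoded as Latin-1, so the resulting
        // char has the value 192 rather than 0x0410.
        String misdecoded = new String(cp1251Bytes, StandardCharsets.ISO_8859_1);

        // The same letter once it has been decoded to its real code point.
        String decoded = "\u0410";

        System.out.println(needs16Bit(misdecoded)); // false: 192 is not > 255
        System.out.println(needs16Bit(decoded));    // true: 0x0410 is > 255
    }
}
```

Lowering the threshold to 127 would catch the first case, but it would also
flag every legitimate 8-bit character, which is the trade-off mentioned
above.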
>
> > Except I (or whoever beats me to it) need to document this better. The
> > Javadoc is way inadequate. This has replaced logging as the most
> > frequent question (next to maybe "when will feature x be done", to
> > which I always reply "when you do it" ;-) ).
>
> One idea is to put it in the second HOWTO example. I think that would be
> convenient.
>
Great. Submit a patch. I'll probably let Glen apply it.
>
> Thank you, that makes it clear. I wrote down some of my thoughts in the
> previous letter.
> Now I have some more to say: first of all, it took me quite a lot of code
> reading and study to make progress in understanding it. %)
> Now it is more understandable, but I don't yet have a feel for the
> interactions between the different Records, the workflow.
>
> > LittleEndian.putShort(data, 0 + offset, sid);
> > LittleEndian.putShort(data, 2 + offset,
> >                       ( short ) (0x08 + getSheetnameLength()));
> > LittleEndian.putInt(data, 4 + offset, getPositionOfBof());
> > LittleEndian.putShort(data, 8 + offset, getOptionFlags());
> > data[ 10 + offset ] = getSheetnameLength();
> > data[ 11 + offset ] = getCompressedUnicodeFlag();
> >
> > // we assume compressed unicode (bein the dern americans we are ;-p)
> > StringUtil.putCompressedUnicode(getSheetname(), data, 12 + offset);
>
org.apache.poi.hssf.usermodel = primarily a high-level wrapper (though
it contains some mild relationships between objects)
org.apache.poi.hssf.model = primarily the "grammar" for the file format.
org.apache.poi.hssf.record.* = the "words" for the file format.
Run org.apache.poi.hssf.dev.BiffViewer on a few live files and look at
the output in your favorite editor (mine is "vi"). This will give you a
lot more understanding.
Glen? Anything to add?
> And about consistency, for example:
> As far as I understand, the first and second words are the internal ones.
> Am I right? (According to the Excel specs, it starts from the BOF.) (And
> what is 0x08? :)
I didn't understand the above... ?
> Then the dword and the word are for the Bundle Sheet. That is OK.
I think I understand. Yes. (You're spelling out the bytes: at 0x0 is the
"sid", the Java short value for the bundlesheet record id.) At 0x2 is the
record size. The record size is equal to the string length + 8 (which seems
incorrect -- shouldn't it be 0xA?)
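
Counting the fields in the snippet quoted above may help settle this. What
follows is a minimal sketch, not the POI implementation: it uses
java.nio.ByteBuffer as a stand-in for the LittleEndian helpers, the class
and method names are hypothetical, and it assumes that the size field counts
only the bytes after the 4-byte sid/size header, as is usual for BIFF
records. Under that assumption the fixed part is 4 + 2 + 1 + 1 = 8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class BoundSheetLayoutSketch {

    static final short SID = 0x85;    // BOUNDSHEET record id
    static final int HEADER_SIZE = 4; // sid (2) + size (2), not counted in the size field
    // BOF position (4) + option flags (2) + name length (1) + unicode flag (1)
    static final int FIXED_BODY_SIZE = 4 + 2 + 1 + 1;

    // Hypothetical helper; only handles names that fit in 8-bit characters.
    static byte[] serialize(int positionOfBof, short optionFlags, String sheetName) {
        byte[] name = sheetName.getBytes(StandardCharsets.ISO_8859_1);
        ByteBuffer buf = ByteBuffer
                .allocate(HEADER_SIZE + FIXED_BODY_SIZE + name.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(SID);                                      // offset 0
        buf.putShort((short) (FIXED_BODY_SIZE + name.length));  // offset 2
        buf.putInt(positionOfBof);                              // offset 4
        buf.putShort(optionFlags);                              // offset 8
        buf.put((byte) name.length);                            // offset 10
        buf.put((byte) 0);                                      // offset 11: 0 = compressed
        buf.put(name);                                          // offset 12
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] rec = serialize(0, (short) 0, "Sheet1");
        int sizeField = (rec[2] & 0xFF) | ((rec[3] & 0xFF) << 8);
        System.out.println("record length = " + rec.length); // 4 + 8 + 6 = 18
        System.out.println("size field    = " + sizeField);  // 8 + 6 = 14
    }
}
```

The offsets in the comments match the ones in the quoted code (0, 2, 4, 8,
10, 11, 12), so 0x08 looks consistent as long as the header itself is
excluded from the size.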
> The last two bytes define the UnicodeString (BIFF8) or simple String (BIFF7)
> properties.
> After that comes the String content.
> I sense an inconsistency, as the last two property bytes are described in the
> UnicodeString class as well as here (BoundSheetRecord). Quite confusing.
>
I don't follow you. UnicodeString doesn't appear to be used here. I
don't remember exactly, but the inconsistency may or may not be ours.
But if you're hinting that the *header* in the UnicodeString record
(the string size + unicode flag) is not set correctly, that's
probably correct. I think you need to get rid of the "sheetnameLength"
field (at least from serialization) and get rid of the compressed
unicode flag, and use a UnicodeString instead. Is that your meaning?
If so, I think you're correct.
> The next point that stopped me for a while is
> StringUtil.putCompressedUnicode(getSheetname(), data, 12 + offset);
> in this class
> and
> StringUtil.putCompressedUnicode(unicodeString, data, 0x3 + offset);
> in the UnicodeString class. (0x3 - please explain.)
>
> Also I do not understand the workflow:
> try {
>     String unicodeString = new String(getString().getBytes("Unicode"), "Unicode");
>     if (getOptionFlags() == 0)
>     {
>         StringUtil.putCompressedUnicode(unicodeString, data, 0x3 + offset);
>     }
>     else
>     {
>         StringUtil.putUncompressedUnicode(unicodeString, data, 0x3 + offset);
>     }
> }
> catch (Exception e) {
>     if (getOptionFlags() == 0)
>     {
>         StringUtil.putCompressedUnicode(getString(), data, 0x3 + offset);
>     }
>     else
>     {
>         StringUtil.putUncompressedUnicode(getString(), data, 0x3 + offset);
>     }
> }
> What is the encoding for? I thought that an 'A' decoded from Unicode and
> encoded back to Unicode is still the same 'A'. What is it for?
>
I don't understand what the catch statement is for, but if the unicode
flag is 0, it writes compressed unicode (8-bit), skipping 3 bytes
(probably for a header which is likely written elsewhere). If it's not
zero, it writes "UncompressedUnicode" (16-bit), again skipping the 3-byte
header.
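
As a rough illustration of the two branches, here is a minimal sketch. The
methods are hypothetical stand-ins for StringUtil.putCompressedUnicode and
StringUtil.putUncompressedUnicode, not the real ones, and the 3-byte header
layout (2-byte character count plus 1-byte option flags) is an assumption.

```java
public class UnicodeWriteSketch {

    static final int HEADER_SIZE = 3; // assumed: char count (2) + option flags (1)

    // Stand-in for the "compressed" writer: one byte per character (low byte only).
    static void putCompressed(String s, byte[] data, int offset) {
        for (int i = 0; i < s.length(); i++) {
            data[offset + i] = (byte) s.charAt(i);
        }
    }

    // Stand-in for the "uncompressed" writer: two bytes per character, little-endian.
    static void putUncompressed(String s, byte[] data, int offset) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            data[offset + 2 * i] = (byte) (c & 0xFF);
            data[offset + 2 * i + 1] = (byte) (c >>> 8);
        }
    }

    // Writes the assumed header, then the characters starting at offset 0x3.
    static byte[] writeString(String s, byte optionFlags) {
        int bytesPerChar = (optionFlags == 0) ? 1 : 2;
        byte[] data = new byte[HEADER_SIZE + s.length() * bytesPerChar];
        data[0] = (byte) (s.length() & 0xFF);
        data[1] = (byte) ((s.length() >>> 8) & 0xFF);
        data[2] = optionFlags;
        if (optionFlags == 0) {
            putCompressed(s, data, HEADER_SIZE);
        } else {
            putUncompressed(s, data, HEADER_SIZE);
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(writeString("A", (byte) 0).length);      // 3 + 1 = 4
        System.out.println(writeString("\u0410", (byte) 1).length); // 3 + 2 = 5
    }
}
```

Note that the compressed branch silently drops the high byte, so it is only
safe for characters that fit in 8 bits.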
>
> > > Sure, I found it while looking at the jUnit reports, here it is from the
> > > jUnit testcases (building info):
> > > Running org.apache.poi.hssf.record.TestSSTRecord
> > > Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 5,879 sec
> > > TEST org.apache.poi.hssf.record.TestSSTRecord FAILED
> > >
> >
> > could you perhaps do a
> >
> > ./build.sh site
> >
> > and then browse at build/docs
> >
> > and find the junit test results pages?
> >
> > You should be able to drill down into which test failed. My problem is
> > that the tests are succeeding for me. This could be a
> > global/localization issue (meaning that running on a system with Russian
> > language settings might produce an error that running on a Linux box with
> > English settings does not, due to language defaults etc.). If we can
> > narrow it down to which TestSSTRecord test failed (there are 7 of them,
> > I suspect it's the rich text one:
> >
> > http://jakarta.apache.org/poi/tests/junit/org/apache/poi/hssf/record/TestSSTRecord.html)
> >
>
> I will attach the XML file with the test results in a direct e-mail. I
> will also ZIP it.
> (It is testcase name="testProcessContinueRecord".)
> As for my platform: it is MS Win2K Pro, JDK 1.4.0, Excel XP. :)
>
Cool. I'll take a look at it.
>
> Bye!
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
>
--
http://www.superlinksoftware.com - software solutions for business
http://jakarta.apache.org/poi - Excel/Word/OLE 2 Compound Document in Java
http://krysalis.sourceforge.net/centipede - the best build/project structure
                    a guy/gal could have! - Make Ant simple on complex Projects!
The avalanche has already started. It is too late for the pebbles to vote.
-Ambassador Kosh