Note: I accidentally tagged the main branch "performance" before creating performance-branch. So "performance" is just a tag on the code as it was before I applied my patches, and "performance-branch" is where the patches actually are. The "performance" tag is more or less irrelevant at this point (I just goofed).

On to the details. The results are roughly what I'd expect. Note that I did not have a sheet made up entirely of single-character cells in mind; I was thinking more about your average scenario. In this case the performance branch would cause an expansion.

Here is why. To keep the data structure simple, the LabelSST index stored in each string cell is expanded to a double. This lets us store it in the same array as the numeric cells. So for the 10 meg sheet the memory saved by reducing the object count still pays off, but on the 20 meg sheet the memory expansion of the LabelSST indices actually manages to surpass the savings in objects. Furthermore, I imagine tweaking the INITIAL_CAPACITY constants for such a large sheet would probably bring this back into line (I plan to make these configurable), as near the end the array copies required are probably quite large.
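
To make the trade-off concrete, here is a minimal sketch of the idea, not the actual branch code. The class and method names (CellValueStore, wideningOverheadBytes) are made up for illustration, and the boolean flag array is just one assumed way of telling string cells from numeric ones.

```java
// Hypothetical sketch: one double[] holds both numeric cell values and
// LabelSST indices, so no per-cell objects are needed.
final class CellValueStore {
    private final double[] values;    // numeric value OR SST index widened to double
    private final boolean[] isString; // true when values[i] holds an SST index

    CellValueStore(int cellCount) {
        values = new double[cellCount];
        isString = new boolean[cellCount];
    }

    void setNumber(int cell, double value) {
        values[cell] = value;
        isString[cell] = false;
    }

    void setSstIndex(int cell, int sstIndex) {
        values[cell] = sstIndex; // int -> double is exact for every int
        isString[cell] = true;
    }

    boolean isString(int cell) {
        return isString[cell];
    }

    double getNumber(int cell) {
        return values[cell];
    }

    int getSstIndex(int cell) {
        return (int) values[cell];
    }

    // Extra bytes spent by storing a 4-byte int index in an 8-byte double slot.
    static long wideningOverheadBytes(long stringCells) {
        return stringCells * (Double.BYTES - Integer.BYTES);
    }
}
```

For a workbook where every cell is a string, the widening costs 4 extra bytes per cell on top of whatever the arrays themselves over-allocate.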

So what is left to do? Well, we need to test/debug with complex sheets, make all the unit tests work, and make the INITIAL_CAPACITY constants configurable. Meaning I should be able to say "Okay, I'm about to open a 10k spreadsheet, so don't optimize for opening a big sheet" or "Okay, I'm about to open a 4mb spreadsheet, so optimize for about that" or "I'm one of those crazy people who generates 20mb spreadsheets, please optimize for that". This way the arrays and such are pre-sized for 20mb etc. and not made that big for small sheets.
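
Roughly what I mean by a configurable initial capacity, as a sketch only. GrowableDoubleArray and its doubling growth policy are assumptions for illustration, not the code in the branch; the copy counter is only there to show how many reallocations a given starting size causes.

```java
// Hypothetical sketch: a growable double array whose initial capacity is
// chosen by the caller instead of being a hard-coded constant.
final class GrowableDoubleArray {
    private double[] data;
    private int size;
    private int copies; // how many times the backing array was reallocated

    GrowableDoubleArray(int initialCapacity) {
        if (initialCapacity < 1) {
            throw new IllegalArgumentException("initialCapacity must be >= 1");
        }
        data = new double[initialCapacity];
    }

    void add(double value) {
        if (size == data.length) {
            // Each growth step copies the whole array; late copies are the big ones.
            data = java.util.Arrays.copyOf(data, data.length * 2);
            copies++;
        }
        data[size++] = value;
    }

    int size() {
        return size;
    }

    int capacity() {
        return data.length;
    }

    int copies() {
        return copies;
    }
}
```

Filling this with 655,200 values starting from a capacity of 1024 forces ten reallocations and ends with over a million slots, while starting at 655,200 needs no copy at all and wastes nothing. That is the reason to let the caller say up front how big the sheet is going to be.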

-Andy

Mikael Sitruk wrote:

Hi to all,
I've run a benchmark on two branches: 'performance-branch' and
'performance'. I put the results here in this email, but perhaps an Excel
file would be more appropriate.

The benchmark was performed on two types of excel files noted A & B

A: MS Excel file of 10,483 KB (1 sheet of 65520 rows by 10 columns,
each cell contains a single character)
B: MS Excel file of 20,956 KB (2 sheets of 65520 rows by 10 columns,
each cell contains a single character)

The benchmark was performed since I need to create very large workbooks
with an acceptable amount of memory. Following are the results.

Test #1 - branch: performance
-----------------------------
Memory size : 128MB
Excel : A & B
Run status : failed
run time (ms): N/A
new file size: N/A
Comment : N/A

Test #2 - branch: performance
-----------------------------
Memory size : 256MB
Excel : A
Run status : success
run time (ms): 55680
new file size: 10339 (KB)
Comment : The new file is slightly smaller than the original one,
but it is fully compatible with Excel, i.e. I opened it in Excel and
the data is OK. See (*)

Test #3 - branch: performance
-----------------------------
Memory size : 256MB
Excel : B
Run status : failure
run time (ms): N/A
new file size: N/A
Comment : N/A

Test #4 - branch: performance
-----------------------------
Memory size : 300MB
Excel : B
Run status : failure
run time (ms): N/A
new file size: N/A
Comment : N/A

Test #5 - branch: performance
-----------------------
Memory size : 400MB
Excel : B
Run status : Success
run time (ms): 114765
new file size: 20666 KB
Comment : N/A
------------------------------------------------------------------------

Test #1 - branch: performance-branch
------------------------------------
Memory size : 128MB
Excel : A & B
Run status : failed
run time (ms): N/A
new file size: N/A
Comment : N/A

Test #2 - branch: performance-branch
------------------------------------
Memory size : 256MB
Excel : A
Run status : success
run time (ms): 25687 (third run; the first run took 40668, the second
33000)
new file size: 10331 (KB)
Comment : The new file is slightly smaller than the original one,
but it is fully compatible with Excel, i.e. I opened it in Excel and
the data is OK.
Another quite interesting thing is that each run took less and less
time, but I presume this is due to the CPU architecture.

Test #3 - branch: performance-branch
-----------------------------
Memory size : 256MB
Excel : B
Run status : failure
run time (ms): N/A
new file size: N/A
Comment : N/A

Test #4 - branch: performance-branch
-----------------------------
Memory size : 300MB
Excel : B
Run status : failure
run time (ms): N/A
new file size: N/A
Comment : N/A

Test #5 - branch: performance-branch
-----------------------
Memory size : 400MB
Excel : B
Run status : failure
run time (ms): N/A
new file size: N/A
Comment : N/A

Test #6 - branch: performance-branch
-----------------------
Memory size : 450MB
Excel : B
Run status : success
run time (ms): 236991
new file size: 20,658 KB
Comment : Here again the file size is slightly smaller.

The most interesting thing is that, paradoxically, this branch
(performance-branch) took more memory than the 'performance' branch,
e.g. the test with 400MB failed in this branch and succeeded in the
'performance' branch.

==========================================================

(*) difference between original file and created file

Using Biff I have the following diffs.
Orig file (just after the COUNTRY record):
============================================
Offset 0x57e (1406)
recordid = 0x1c1, size =8
[UNKNOWN RECORD:1c1]
.id = 1c1
[/UNKNWON RECORD]

-----UNKNOWN----------------------------------
00000000 C1 01 00 00 54 8D 01 00 ....T...

-----UNKNOWN----------------------------------
============================================
Offset 0x58a (1418)
recordid = 0xfc, size =48
[SST]
.numstrings = 9ff60
.uniquestrings = a
.string_0 = A
.string_1 = a
.string_2 = b
.string_3 = B
.string_4 = c
.string_5 = d
.string_6 = e
.string_7 = f
.string_8 = g
.string_9 = h
[/SST]


The new file has instead:
============================================
Offset 0x57e (1406)
recordid = 0xfc, size =48
[SST]
.numstrings = 9ff60
.uniquestrings = a
.string_0 = A
.string_1 = a
.string_2 = b
.string_3 = B
.string_4 = c
.string_5 = d
.string_6 = e
.string_7 = f
.string_8 = g
.string_9 = h
[/SST]

============================================

So there is a difference: the unknown record 0x1c1 of the original file
is missing from the new file, and of course after this record the
offsets are not the same.
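
The numbers in the two dumps can be cross-checked with a little arithmetic. The sketch below is only an illustration: the class and method names are made up, and the one outside fact it relies on is that every BIFF record starts with a 4-byte header (2-byte record id plus 2-byte size).

```java
// Hypothetical helper for checking the figures in the dumps above.
final class BiffDumpCheck {
    // Total number of cells in a workbook of identical sheets.
    static int cellCount(int sheets, int rows, int columns) {
        return sheets * rows * columns;
    }

    // On-disk length of a BIFF record: 4-byte header plus payload.
    static int recordLength(int payloadSize) {
        return 4 + payloadSize;
    }
}
```

For file A, cellCount(1, 65520, 10) gives 655200, which is 0x9ff60, the numstrings value in the SST record. The SST record moves from 0x58a to 0x57e, a shift of 12 bytes, which is exactly recordLength(8), the length of the 0x1c1 record with its 8-byte payload.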

Mikael.S


