Hi all,

Based on Pavel's input, I did some quick trials and thought I'd let you know
the results.
I'm running on an old Pentium @ 3.2GHz with 3GB of memory and Win-7, so don't
expect super-fast times!

1. I have one DB with 5411 units and 1517 measurements with columns
       unit_id, m_1, .. m_1517
    No indices and not 100% populated: the .db size is 48'834kB, whereas for
    full population I'd expect 5411*1517*8 = 65'668kB.

    To dump the entire DB into a .csv file with sqlite3.exe takes ~35s.
    To dump a random column takes 0.42 .. 1.02s with a mean of 0.65s.
    To dump 5 randomly selected columns takes 0.65 .. 1.16s with a mean of 0.88s.
    To dump 100 randomly selected columns takes 2.34 .. 3.11s with a mean of 2.54s.
    (I always dump all units, and the times include the time to start sqlite3,
     so they could be improved. The single-column runs show the most dispersion,
     and it is not clear which columns cost the most. Each series above includes
     100 random selections.)
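
    For reference, a dump like the ones above can be scripted for the sqlite3
    shell roughly as follows (file, table and column names here are only
    placeholders, not the ones I actually used):

        -- dump_cols.sql, run as:  sqlite3.exe wide.db < dump_cols.sql
        .mode csv
        .output dump.csv
        -- e.g. three randomly picked columns, always for all units:
        SELECT unit_id, m_17, m_512, m_1003 FROM wide;
        .output stdout
        .quit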

2. I quickly created a DB for 5500 units and 1500 measurements in the format
       unit_id, meas_id, meas_val
    This database is fully populated with random values for meas_val and is
    191MByte in size.
    Dumping the whole DB costs 77s, but this may be an unfair measurement,
    since I simply run
       select * from table;
    rather than restoring the rectangular format of units x measurements.
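
    For clarity, the layout of case 2 is essentially the following (a sketch
    with placeholder names; the population step with random values is not
    shown):

        CREATE TABLE meas (
            unit_id  INTEGER,   -- 1 .. 5500
            meas_id  INTEGER,   -- 1 .. 1500
            meas_val REAL
        );

        -- the 77s full dump is then simply:
        SELECT * FROM meas;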

    Of course, dumping one meas_id for all unit_id is now expensive, so I need
    an index on meas_id to run queries like
        select * from table where meas_id=n;
    That increases the .db size to 293MByte. Likely, I'd need another index on
    unit_id as well, so the DB size would grow by another 100MByte, which I
    clearly wouldn't like!
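
    The index in question is just the obvious one (placeholder table name as
    above):

        CREATE INDEX idx_meas_id ON meas(meas_id);     -- .db grows to 293MByte
        -- and, for fast per-unit access, presumably also:
        -- CREATE INDEX idx_unit_id ON meas(unit_id);  -- roughly another 100MByte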

    With the index, dumping one meas_id with the 'where meas_id=n' query above
    costs 0.61s (I didn't do multiple runs here...), but if I now want to
    select multiple meas_id with queries like
        select * from table where meas_id IN (n1, n2, ..)
    it costs 1.62s for 5 meas_id and 27.2s (!!) for 100 meas_id, so I'm
    penalized not only in size but also in time compared to case 1 above.
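
    For reference, the timed multi-id queries had exactly that IN (...) shape.
    Restoring the rectangular units x measurements format from such a result
    would additionally need a pivot, e.g. something like the sketch below
    (placeholder names again; I did not time the pivot part):

        -- timed shape (5 resp. 100 ids in the IN list):
        SELECT * FROM meas WHERE meas_id IN (17, 512, 1003);

        -- untimed pivot back to one row per unit:
        SELECT unit_id,
               MAX(CASE WHEN meas_id =   17 THEN meas_val END) AS m_17,
               MAX(CASE WHEN meas_id =  512 THEN meas_val END) AS m_512,
               MAX(CASE WHEN meas_id = 1003 THEN meas_val END) AS m_1003
        FROM meas
        WHERE meas_id IN (17, 512, 1003)
        GROUP BY unit_id;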

So, I can't see the advantage of normalization here. (Note that I do not
plan to join or index
on measurement columns!)

So, back to my first question... what do I pay when I increase the compile-time
column limit of SQLite from 2000 to 32k columns? To be honest, I'm still
considering staying within the 2000-column limit with a blocked table layout
similar to the following:
     unit_id, block_id, meas_1, .... meas_2000
so that measurements 2001 ... 4000 would come on a new row. (Yes, that's
2002 cols, so I need to take care... :-) )
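
    The blocked layout would look roughly like this (a sketch; the 2000
    measurement columns are abbreviated, and the primary key is optional since
    case 1 above used no indices at all). Measurement k of a unit would then
    live in block_id = (k-1)/2000 and column meas_((k-1)%2000 + 1):

        CREATE TABLE meas_block (
            unit_id   INTEGER,
            block_id  INTEGER,   -- 0 for measurements 1..2000, 1 for 2001..4000, ...
            meas_1    REAL,
            meas_2    REAL,
            -- ... meas_3 .. meas_1999 ...
            meas_2000 REAL,
            PRIMARY KEY (unit_id, block_id)
        );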

Thanks for your help anyway - and let me know if you have further input or
remarks, either regarding the above measurements (did I miss something
important?) or regarding the case of >2000 measurements.

Greetings - Stefan
