This is from your last e-mail:

> So, I can't see the advantage of normalization here. (Note that I do not
> plan to join or index
> on measurement columns!)

This is from your first e-mail:

> Queries on this big table will be rather straightforward: either on
> the table alone (SELECT * FROM table_large) or one join on one field
> to one of the smaller tables.

As you can probably see, these statements contradict each other.
If all the queries you need on your database are
- get all measurements for some unit_id;
- get some measurements for all unit_ids;
then of course you won't gain anything from normalization, and your
queries will never involve any O(N^2) algorithms (because you have no
joins on measurements). So I'd say go ahead and raise the limit to
32K. Apart from some ugly-looking column names and a schema that makes
database-theory purists uncomfortable, you won't run into any trouble
with that.
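
For reference, a minimal sketch of those two access patterns against
your wide table (table_large and the m_* columns are the names from
your first e-mail; the concrete ids are made up):

    -- get all measurements for some unit_id
    SELECT * FROM table_large WHERE unit_id = 42;

    -- get some measurements for all unit_ids
    SELECT unit_id, m_17, m_512 FROM table_large;

Both are plain single-table scans or lookups - no joins anywhere. The
limit itself is raised by compiling the amalgamation with
-DSQLITE_MAX_COLUMN=32767 (32767 is the hard upper bound).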


Pavel

On Sun, Jan 17, 2010 at 6:08 PM, Stefan_E <se_m...@hotmail.com> wrote:
>
> Hi all,
>
> based on Pavel's input, I did some quick trials and thought I'd let
> you know the results. I'm running on an old pentiu...@3.2ghz with
> 3 GB of memory and Win-7, so don't expect super-fast times!
>
> 1. I have one DB with 5411 units and 1517 measurements, with columns
>       unit_id, m_1, .. m_1517
>    No indices and not 100% populated: the .db size is 48'834 kB,
>    whereas for full population I'd expect 5411 * 1517 * 8 = 65'668 kB.
>
>    To dump the entire DB into a .csv file with sqlite3.exe takes ~35 s.
>    To dump a random column takes 0.42 .. 1.02 s with a mean of 0.65 s.
>    To dump 5 randomly selected columns takes 0.65 .. 1.16 s with a
>    mean of 0.88 s.
>    To dump 100 randomly selected columns takes 2.34 .. 3.11 s with a
>    mean of 2.54 s.
>    (I always dump all units, and the times include the time to start
>    sqlite3 - so they could be improved. The run on one column shows
>    the most dispersion, and it is not clear which columns cost most.
>    Each series above includes 100 random selections.)
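>
>    (A sketch of the kind of queries meant above - the column numbers
>    here are made up:
>
>        -- one random column
>        SELECT unit_id, m_42 FROM table_large;
>        -- five randomly selected columns
>        SELECT unit_id, m_17, m_256, m_512, m_1024, m_1400 FROM table_large;
>
>    run through sqlite3.exe with CSV output redirected to a file.)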
>
> 2. I quickly created a DB for 5500 units and 1500 measurements in the
>    format
>       unit_id, meas_id, meas_val
>    This database is fully populated with random values for meas_val
>    and is 191 MByte in size. Dumping the whole DB costs 77 s, but this
>    may be an unfair measurement, since I simply run
>       select * from table;
>    rather than restoring the rectangular format of units x
>    measurements.
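>
>    (A sketch of this layout, with hypothetical names - one row per
>    (unit, measurement) pair:
>
>        CREATE TABLE meas (
>          unit_id  INTEGER,   -- 1 .. 5500
>          meas_id  INTEGER,   -- 1 .. 1500
>          meas_val REAL
>        );
>
>    That is 5500 * 1500 = 8'250'000 rows, so 191 MByte works out to
>    roughly 24 bytes per row: per-row overhead plus the two repeated
>    ids roughly triple the 8 bytes of actual payload.)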
>
>    Of course, dumping one meas_id for all unit_ids is now expensive,
>    so I need an index on meas_id to run queries like
>       select * from table where meas_id = n;
>    That increases the .db size to 293 MByte. Likely I'd need another
>    index on unit_id as well... so the db size would grow by another
>    100 MByte, which I clearly wouldn't like!
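>
>    (The statements in question, with hypothetical index names:
>
>        CREATE INDEX meas_by_id   ON meas(meas_id);
>        -- and the one I'd rather avoid:
>        CREATE INDEX meas_by_unit ON meas(unit_id);
>    )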
>
>    With the index, dumping one meas_id with the query above costs
>    0.61 s (I didn't do multiple runs here...), but if I now want to
>    select multiple meas_ids with queries like
>       select * from table where meas_id IN (n1, n2, ..);
>    it costs 1.62 s for 5 meas_ids and 27.2 s (!!) for 100 meas_ids,
>    so I'm penalized not only in size but also in time compared to
>    case 1 above.
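>
>    (Concretely, with made-up ids:
>
>        SELECT * FROM meas WHERE meas_id IN (3, 141, 592, 653, 1292);
>
>    The growth from 1.62 s for 5 ids to 27.2 s for 100 ids is roughly
>    linear in the number of ids, which fits one index scan plus
>    scattered row fetches per meas_id.)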
>
> So, I can't see the advantage of normalization here. (Note that I do not
> plan to join or index
> on measurement columns!)
>
> So, back to my first question... what do I pay when I increase the
> compile-time limit of SQLite from 2000 to 32k columns? To be honest,
> I'm still considering staying within the 2000-column limit, with a
> blocked table layout similar to the following:
>     unit_id, block_id, meas_1, .... meas_2000
> so that measurements 2001 ... 4000 would come on a new row. (Yeah,
> that's 2002 columns, so I need to take care... :-) )
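>
> (One way to take care, sketched with made-up names and the column
> list truncated to keep this readable:
>
>     CREATE TABLE blocked (
>       unit_id  INTEGER,
>       block_id INTEGER,
>       -- meas_1 .. meas_1998: 1998 payload columns keep the total at
>       -- exactly 2000; only the first two are spelled out here
>       meas_1   REAL,
>       meas_2   REAL
>     );
>
> With 1998 measurement columns per block the row stays exactly at the
> 2000-column limit, and measurement k maps to block_id (k-1)/1998 and
> column meas_((k-1) % 1998 + 1).)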
>
> Thanks for your help anyway - and let me know if you have further
> input or remarks, either regarding the above measurements (did I miss
> some important stuff?) or for the case of >2000 measurements.
>
> Greetings - Stefan
>