Re: [sqlite] Limitation on Column count
Hi Pavel,

yes - you are quite right, I wasn't precise here, sorry about that! What I meant was a join on unit_id, which will of course have an index if one is needed.

Btw., in the same context: I noticed that sparse data (with lots of NULL values), which would otherwise bloat the database, doesn't seem to be a problem in SQLite thanks to its compact record format. Nice!

Thanks for your help!

Kind regards
Stefan
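For context on why the sparse data stays small: in SQLite's record format, a NULL column costs only a single serial-type byte in the row header and no payload bytes, so a mostly-NULL wide row takes very little space. A minimal illustration (table and column names are made up):

    -- Illustrative wide table; in the real schema columns run up to m_1517.
    CREATE TABLE wide_meas (
        unit_id INTEGER PRIMARY KEY,
        m_1 REAL, m_2 REAL, m_3 REAL
    );

    -- Only m_2 is set; m_1 and m_3 are NULL and cost one header byte each.
    INSERT INTO wide_meas (unit_id, m_2) VALUES (42, 3.14);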
Re: [sqlite] Limitation on Column count
This is from your last e-mail:

> So, I can't see the advantage of normalization here. (Note that I do not
> plan to join or index on measurement columns!)

This is from your first e-mail:

> Queries on this big table will be rather straight-forward: either on the
> table alone (SELECT * FROM table_large) or one join on one field to one of
> the smaller tables.

As you probably can see, these two statements contradict each other. If all the queries you need on your database are

- get all measurements for some unit_id;
- get some measurements for all unit_ids;

then of course you won't gain anything from normalization, and your queries will never hit any O(N^2) algorithms (because you have no joins on measurement columns). So I'd say go ahead and raise the limit to 32K. Apart from some ugly-looking column names and a schema that will make database-theory purists uncomfortable, you won't get into any trouble with that.

Pavel

On Sun, Jan 17, 2010 at 6:08 PM, Stefan_E wrote:
> [...]
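For concreteness, the two access patterns Pavel lists, written against the wide table (the specific measurement column names are illustrative):

    -- All measurements for one unit:
    SELECT * FROM table_large WHERE unit_id = 42;

    -- A few measurements for all units:
    SELECT unit_id, m_17, m_203 FROM table_large;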
Re: [sqlite] Limitation on Column count
Hi all,

based on Pavel's input, I did some quick trials and thought I'd let you know the results. I'm running on an old Pentium @ 3.2 GHz with 3 GB memory and Win-7, so don't expect super-fast times!

1. I have one DB with 5411 units and 1517 measurements, with columns

    unit_id, m_1, ... m_1517

No indices and not 100% populated: the .db size is 48'834 kB, whereas for full population I'd expect 5411*1517*8 = 65'668 kB.

- To dump the entire DB into a .csv file with sqlite3.exe takes ~35 s.
- To dump a random column takes 0.42 .. 1.02 s, with a mean of 0.65 s.
- To dump 5 randomly selected columns takes 0.65 .. 1.16 s, with a mean of 0.88 s.
- To dump 100 randomly selected columns takes 2.34 .. 3.11 s, with a mean of 2.54 s.

(I always dump all units, and the times include the time to start sqlite3 - so they could be improved. The one-column runs show the most dispersion, and it is not clear which columns cost most. Each series above includes 100 random selections.)

2. I quickly created a DB for 5500 units and 1500 measurements in the format

    unit_id, meas_id, meas_val

This database is fully populated with random values for meas_val and is 191 MByte in size. Dumping the whole DB costs 77 s, but this may be an unfair measurement since I simply ran

    select * from table;

rather than restoring the rectangular format of units x measurements.

Of course, dumping one meas_id for all unit_ids is now expensive, so I need an index on meas_id to run queries like

    select * from table where meas_id = n;

That increases the .db size to 293 MByte. Likely I'd need another index on unit_id as well, so the DB would grow by another 100 MByte, which I clearly wouldn't like!

With the index, dumping one meas_id with the query above costs 0.61 s (I didn't do multiple runs here), but if I now want to select multiple meas_ids with queries like

    select * from table where meas_id IN (n1, n2, ...);

it costs 1.62 s for 5 meas_ids and 27.2 s (!!) for 100 meas_ids - so I'm penalized not only in size but also in time compared to case 1 above.

So, I can't see the advantage of normalization here. (Note that I do not plan to join or index on measurement columns!)

So, back to my first question: what do I pay when I increase the compile-time limit of SQLite from 2000 to 32k columns? To be honest, I'm still considering staying within the 2000-column limit with a blocked table layout like

    unit_id, block_id, meas_1, ... meas_2000

so that measurements 2001 ... 4000 would go on a new row. (Yes, that's 2002 columns, so I need to take care... :-) )

Thanks for your help anyway - and let me know if you have further input or remarks, either regarding the measurements above (did I miss some important stuff?) or for the case of >2000 measurements.

Greetings - Stefan
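A minimal sketch of the blocked layout from the last paragraph (all names are illustrative; only the first two measurement columns per block are written out):

    -- One physical row per (unit, block); block 0 holds measurements
    -- 1..2000, block 1 holds measurements 2001..4000, and so on.
    CREATE TABLE meas_block (
        unit_id  INTEGER NOT NULL,
        block_id INTEGER NOT NULL,
        m_1 REAL, m_2 REAL,          -- ... through m_2000 in the real schema
        PRIMARY KEY (unit_id, block_id)
    );

    -- Reassembling the first two blocks of one unit with a self-join:
    SELECT b0.m_1, b0.m_2, b1.m_1 AS m_2001, b1.m_2 AS m_2002
    FROM   meas_block b0
    JOIN   meas_block b1 ON b1.unit_id = b0.unit_id AND b1.block_id = 1
    WHERE  b0.unit_id = 42 AND b0.block_id = 0;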
Re: [sqlite] Limitation on Column count
> When doing so: Any idea of what would be the worst case column select
> strategy on the huge table to compare with?

Take some query from your typical usage where only the last column of the huge table is used, and all other columns are either not used at all or only one or two of them appear in other conditions (assuming you don't have an index on that last column). I believe that would be the worst case for usage of the huge table.

Pavel

On Tue, Jan 12, 2010 at 4:44 PM, Stefan_E wrote:
> [...]
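As a concrete (hypothetical) example of the shape Pavel describes, against the wide table from earlier in the thread: to produce the last column, SQLite has to decode each candidate row's record header past all the preceding columns, and without an index the filter forces a full table scan.

    -- Worst case for the wide layout: only the last column is selected,
    -- the filter touches a couple of early columns, m_1517 is unindexed.
    SELECT m_1517
    FROM   table_large
    WHERE  m_3 > 0.5 AND m_7 < 1.0;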
Re: [sqlite] Limitation on Column count
Hi Pavel,

thanks a lot; I will reconsider this option and run some test cases to compare. But that will be tomorrow, I guess...

When doing so: any idea of what the worst-case column-select strategy on the huge table would be, to compare against?

Stefan

Pavel Ivanov-2 wrote:
> [...]
Re: [sqlite] Limitation on Column count
One may be able to make "measurement type" a column, thus eliminating the need for a separate column per type. Some of the speed may be recoverable with indexing.

regards,
Adam

On Tue, Jan 12, 2010 at 1:21 PM, Pavel Ivanov wrote:
> [...]
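A sketch of the schema change Adam describes, with an index to recover per-type scan speed (all names are illustrative):

    -- One row per (unit, measurement type) instead of one column per type.
    CREATE TABLE measurement (
        unit_id  INTEGER NOT NULL,
        meas_id  INTEGER NOT NULL,   -- the "measurement type" column
        meas_val REAL,
        PRIMARY KEY (unit_id, meas_id)
    );

    -- Recover speed on per-type queries with an index led by meas_id:
    CREATE INDEX meas_by_type ON measurement (meas_id);

    SELECT unit_id, meas_val FROM measurement WHERE meas_id = 17;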
Re: [sqlite] Limitation on Column count
> so normalization would lead to a doubling
> of the storage space (add a measurement_id to each measurement).

My strong belief is that when you try this normalization, you'll see that the doubling of storage is a good enough trade-off for the speed you'll gain. I don't think the speed of queries on a table with 100+ columns would be any good (unless, of course, you *always* select all columns and *never* try to select only a few of them).

> Second, the most common use case is to view the table in the currently
> foreseen format - so, I'd pay both in space and time...

Most probably you view your table from your application, which can denormalize the data very quickly. Even if you view your table from the sqlite3 command-line tool, you can still write a denormalizer, even as a bash script; I believe it will work fast enough, and it will be better than creating such a huge table.

Pavel

On Tue, Jan 12, 2010 at 1:09 PM, Stefan_E wrote:
> [...]
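The denormalization Pavel describes can even be done in plain SQL: assuming a normalized table measurement(unit_id, meas_id, meas_val), one conditional aggregate per measurement type rebuilds the rectangular form (three types shown; names are illustrative):

    -- Pivot the tall table back into the rectangular units x measurements
    -- view; MAX() ignores the NULLs produced by non-matching CASE branches.
    SELECT unit_id,
           MAX(CASE WHEN meas_id = 1 THEN meas_val END) AS m_1,
           MAX(CASE WHEN meas_id = 2 THEN meas_val END) AS m_2,
           MAX(CASE WHEN meas_id = 3 THEN meas_val END) AS m_3
    FROM   measurement
    GROUP  BY unit_id;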
Re: [sqlite] Limitation on Column count
Hi Adam,

thanks for your suggestion. Unfortunately, it doesn't help in my case. Essentially, we are talking about a time series (rows) of n different measurements (columns) - so normalization would lead to a doubling of the storage space (a measurement_id added to each measurement).

Second, the most common use case is to view the table in the currently foreseen format - so I'd pay both in space and in time...

Anyway, thanks for the suggestion!

Regards, Stefan

Adam DeVita wrote:
> [...]
Re: [sqlite] Limitation on Column count
Good day,

In general I try to work within the limits of any database engine that I am using. Often, the limits are there for good reasons (such as speed problems). I would suggest seeing if there is a way to normalize the big tables such that infrequently used columns are split into tables that aren't joined in often. (The principle I'm using is borrowed from hardware architecture: "Make the common case fast, and ensure the uncommon case is correct.")

It may or may not be sensible given your data, but there may be an opportunity to reduce the number of columns by making an encoded column to aggregate, such as lots of mutually exclusive binary flag fields.

regards,
Adam

On Mon, Jan 11, 2010 at 3:46 PM, Stefan_E wrote:
>
> Hello all,
>
> I'm considering using SQLite for a new application. The schema will contain
> a bunch of small tables with few columns (~10) plus one large table with
> many columns and 1000...1 rows.
>
> 'Many columns' typically fits into the default 2000-column limit, but can
> exceed it at times (that is, on some of the foreseen databases). It will
> never exceed the theoretical / compile-time selectable limit of 32k
> columns.
> Queries on this big table will be rather straightforward: either on the
> table alone (SELECT * FROM table_large) or one join on one field to one of
> the smaller tables.
>
> The Limits page (http://www.sqlite.org/limits.html) warns: "There are places
> in the SQLite code generator that use algorithms that are O(N²) where N is
> the number of columns." This is kind of discouraging for increasing the max.
> column count at compile time, but it is not very specific about when this
> happens...
>
> I now have two design options:
> - increase the max. column count at compile time (possibly setting
> SQLITE_LIMIT_COLUMN on databases where I don't need more than (say) 1000
> columns) and accept the quoted performance degradation;
> - alternatively, handle cases with more than 2000 columns in the client,
> splitting the storage up into two (or more) tables.
>
> Any advice, experience - or more specifics on the "O(N²)" remark - is highly
> welcome!
>
> Thanks for your help - Stefan

--
VerifEye Technologies Inc.
905-948-0015 x245
7100 Warden Ave, Unit 3
Markham ON, L3R 8B5
Canada
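Adam's encoded-column idea, sketched for a set of mutually exclusive flags folded into one status code (all names and codes are illustrative):

    -- One status code replaces many mutually exclusive flag columns
    -- (is_pass, is_fail, is_retest, ...).
    CREATE TABLE unit_status (
        unit_id INTEGER PRIMARY KEY,
        status  INTEGER NOT NULL    -- 0 = pass, 1 = fail, 2 = retest, ...
    );

    SELECT unit_id FROM unit_status WHERE status = 1;  -- all failed units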