Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table
If you need to store columns that each have only a small set of possible values, you may be better off looking into FastBit, which has column-oriented storage.

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Mitar
Sent: Thursday, 17 October 2019 15:46
To: SQLite mailing list
Subject: Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table

Hi!

Thanks for this input. So you are saying that calling sqlite3_column 100k times per row is slow, but retrieving 100k rows to construct one "original" row will be faster? I am not sure I understand why reading and decoding cells across multiple columns is so much slower than reading and decoding cells across multiple rows.

Mitar

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table
If you need to perform queries over the 100k rows, then normalization is the only practical way, and reconstructing a row will be similarly slow.

But you have stated that your use case is "retrieving complete rows". In this case, SQLite does not need to know the 100k details of the row. Keep your data in whatever format it comes in, which is another requirement you have expressed. All you have to extract from the 100k details are the columns that you need to identify the row(s) you want back.

    CREATE TABLE gene_data (sample_id INTEGER PRIMARY KEY, name CHAR, raw_data BLOB);

Anything else you need to do with the row data goes into your application.

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Mitar
Sent: Thursday, 17 October 2019 15:46
To: SQLite mailing list
Subject: Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table

Hi!

Thanks for this input. So you are saying that calling sqlite3_column 100k times per row is slow, but retrieving 100k rows to construct one "original" row will be faster? I am not sure I understand why reading and decoding cells across multiple columns is so much slower than reading and decoding cells across multiple rows.

Mitar
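A minimal sketch of the layout suggested above, assuming Python's standard sqlite3 module and an application-defined encoding for raw_data. The packing format (little-endian IEEE 754 doubles via struct) and the helper names pack_row and unpack_row are illustrative assumptions; the thread does not specify any particular encoding.

```python
import math
import sqlite3
import struct


def pack_row(values):
    """Encode a list of floats as little-endian 8-byte doubles (assumed format)."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_row(blob):
    """Decode a blob produced by pack_row back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene_data (sample_id INTEGER PRIMARY KEY, name CHAR, raw_data BLOB)"
)

# One wide "original" row: 100,000 values stored as a single BLOB field.
values = [float(i) for i in range(100_000)]
values[1] = math.nan  # special values survive because the bytes are opaque to SQLite
values[2] = math.inf
conn.execute(
    "INSERT INTO gene_data (sample_id, name, raw_data) VALUES (?, ?, ?)",
    (1, "sample-1", pack_row(values)),
)

# SQLite only looks at the identifying columns; the application decodes the rest.
(blob,) = conn.execute(
    "SELECT raw_data FROM gene_data WHERE name = ?", ("sample-1",)
).fetchone()
restored = unpack_row(blob)
```

With this layout SQLite returns one field per row regardless of how many values the application packs into it; the cost is that SQLite cannot filter on anything inside raw_data.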
Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table
Hi!

Thanks for this input. So you are saying that calling sqlite3_column 100k times per row is slow, but retrieving 100k rows to construct one "original" row will be faster? I am not sure I understand why reading and decoding cells across multiple columns is so much slower than reading and decoding cells across multiple rows.

Mitar

On Thu, Oct 17, 2019 at 3:38 PM Hick Gunter wrote:
>
> I have the impression that you still do not grasp the folly of a 100k column schema.
>
> See the example below, which only has 6 fields. As you can see, each field requires a Column opcode and arguments (about 10 bytes) and a "register" to hold the value (48 bytes), which for 100k columns uses about 5.5 megabytes to retrieve a row from the database. It will also involve SQLite decoding 100k field values and your application calling the sqlite3_column interface 100k times for each and every row, which yields an expected performance of about 2 rows per second. Can you afford to use that much memory and time?
>
> asql> create temp table genes (id integer primary key, name char, f1 char, f2 char, f3 char, f4 char);
> asql> .explain
> asql> explain select * from genes;
> addr  opcode       p1  p2  p3  p4                p5  comment
> ----  -----------  --  --  --  ----------------  --  -------
> 0     Init         0   13  0                     00  Start at 13
> 1     OpenRead     0   2   1   6                 00  root=2 iDb=1; genes
> 2     Explain      2   0   0   SCAN TABLE genes  00
> 3     Rewind       0   12  0                     00
> 4     Rowid        0   1   0                     00  r[1]=rowid
> 5     Column       0   1   2                     00  r[2]=genes.name
> 6     Column       0   2   3                     00  r[3]=genes.f1
> 7     Column       0   3   4                     00  r[4]=genes.f2
> 8     Column       0   4   5                     00  r[5]=genes.f3
> 9     Column       0   5   6                     00  r[6]=genes.f4
> 10    ResultRow    1   6   0                     00  output=r[1..6]
> 11    Next         0   4   0                     01
> 12    Halt         0   0   0                     00
> 13    Transaction  1   0   1   0                 01  usesStmtJournal=0
> 14    Goto         0   1   0                     00

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
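For comparison, the normalized layout discussed in this exchange could look like the sketch below: one table row per (sample, field) pair, with the original wide row rebuilt by the application. It uses Python's standard sqlite3 module; the table and column names (gene_values, field_idx, value) are made up for illustration, and the field count is kept small so the example runs quickly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gene_values (
        sample_id INTEGER NOT NULL,
        field_idx INTEGER NOT NULL,
        value,
        PRIMARY KEY (sample_id, field_idx)
    )
    """
)

n_fields = 1_000  # kept small here; the thread talks about 100k
conn.executemany(
    "INSERT INTO gene_values VALUES (?, ?, ?)",
    ((1, i, f"v{i}") for i in range(n_fields)),
)

# Rebuilding one "original" row means fetching n_fields narrow rows.
original_row = [
    value
    for (value,) in conn.execute(
        "SELECT value FROM gene_values WHERE sample_id = ? ORDER BY field_idx",
        (1,),
    )
]

# Individual fields can be queried directly, which a single opaque BLOB does not allow.
(hit,) = conn.execute(
    "SELECT value FROM gene_values WHERE sample_id = 1 AND field_idx = 42"
).fetchone()
```

Each statement here touches only three columns, so the prepared statement stays small no matter how many fields a sample has; the work moves into the number of rows stepped through instead.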
Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table
I have the impression that you still do not grasp the folly of a 100k column schema.

See the example below, which only has 6 fields. As you can see, each field requires a Column opcode and arguments (about 10 bytes) and a "register" to hold the value (48 bytes), which for 100k columns uses about 5.5 megabytes to retrieve a row from the database. It will also involve SQLite decoding 100k field values and your application calling the sqlite3_column interface 100k times for each and every row, which yields an expected performance of about 2 rows per second. Can you afford to use that much memory and time?

    asql> create temp table genes (id integer primary key, name char, f1 char, f2 char, f3 char, f4 char);
    asql> .explain
    asql> explain select * from genes;
    addr  opcode       p1  p2  p3  p4                p5  comment
    ----  -----------  --  --  --  ----------------  --  -------
    0     Init         0   13  0                     00  Start at 13
    1     OpenRead     0   2   1   6                 00  root=2 iDb=1; genes
    2     Explain      2   0   0   SCAN TABLE genes  00
    3     Rewind       0   12  0                     00
    4     Rowid        0   1   0                     00  r[1]=rowid
    5     Column       0   1   2                     00  r[2]=genes.name
    6     Column       0   2   3                     00  r[3]=genes.f1
    7     Column       0   3   4                     00  r[4]=genes.f2
    8     Column       0   4   5                     00  r[5]=genes.f3
    9     Column       0   5   6                     00  r[6]=genes.f4
    10    ResultRow    1   6   0                     00  output=r[1..6]
    11    Next         0   4   0                     01
    12    Halt         0   0   0                     00
    13    Transaction  1   0   1   0                 01  usesStmtJournal=0
    14    Goto         0   1   0                     00

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Mitar
Sent: Thursday, 17 October 2019 15:11
To: SQLite mailing list
Subject: [EXTERNAL] Re: [sqlite] Limit on number of columns in SQLite table

Hi!

On Thu, Oct 17, 2019 at 3:04 PM Eric Grange wrote:
> my suggestion would be to store them as JSON in a blob, and use the
> JSON functions of SQLite to extract the data

JSON has some crazy limitations; for example, by the standard it does not support the full floating-point spec, so NaN and infinity cannot be represented. So JSON is really not a great format when you want to preserve as much of the input as possible (like integers, floats, text, and binary). SQLite seems to be spot on in this regard.

But yes, if there were some other format, standard to SQLite and supported for embedding, that approach would be useful. Like composite value types.

Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

___
Gunter Hick | Software Engineer | Scientific Games International GmbH | Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
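The per-column cost described above can be checked roughly with the sketch below. It uses Python's standard sqlite3 module to count the Column opcodes that EXPLAIN reports for the six-field example, then repeats the back-of-the-envelope estimate for 100k columns. The byte figures (10 per opcode, 48 per register) are the approximate values quoted in the message, not measurements, and the exact opcode listing can differ between SQLite versions.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TEMP TABLE genes "
    "(id INTEGER PRIMARY KEY, name CHAR, f1 CHAR, f2 CHAR, f3 CHAR, f4 CHAR)"
)

# Each EXPLAIN row is (addr, opcode, p1, p2, p3, p4, p5, comment).
opcodes = Counter(row[1] for row in conn.execute("EXPLAIN SELECT * FROM genes"))
column_ops = opcodes["Column"]  # the rowid alias "id" is read by Rowid instead

BYTES_PER_OPCODE = 10    # approximate figure from the message
BYTES_PER_REGISTER = 48  # approximate figure from the message
n_columns = 100_000
estimate_bytes = n_columns * (BYTES_PER_OPCODE + BYTES_PER_REGISTER)
estimate_mib = estimate_bytes / 2**20

print(f"Column opcodes for the 6-field table: {column_ops}")
print(f"Estimated overhead for {n_columns} columns: {estimate_mib:.1f} MiB")
```

The estimate is simply 100,000 x 58 bytes = 5.8 million bytes, which is where the "about 5.5 megabytes" figure comes from when expressed in units of 2^20 bytes.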
Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table
Hi!

In that case we would have to define a standard BLOB storage format, slightly defeating the idea of using SQLite to define such a standard, future-proof format. :-)

Mitar

On Thu, Oct 17, 2019 at 11:19 AM Hick Gunter wrote:
>
> Since your data is at least mostly opaque in the sense that SQLite is not expected to interpret the contents, why not split your data into "stuff you want to query in SQLite" and "stuff you want to just store"? The former means individual columns, whereas the latter could be stored in a single BLOB field, which only your application knows how to extract data from.
>
> This allows SQLite to efficiently process the fields it needs to know about, and return BLOB data efficiently as one single field instead of having to pick it apart into 100k bits.

--
http://mitar.tnode.com/
https://twitter.com/mitar_m
Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table
Since your data is at least mostly opaque in the sense that SQLite is not expected to interpret the contents, why not split your data into "stuff you want to query in SQLite" and "stuff you want to just store"? The former means individual columns, whereas the latter could be stored in a single BLOB field, which only your application knows how to extract data from.

This allows SQLite to efficiently process the fields it needs to know about, and return BLOB data efficiently as one single field instead of having to pick it apart into 100k bits.

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Mitar
Sent: Thursday, 17 October 2019 10:56
To: Richard Hipp
Cc: SQLite mailing list
Subject: [EXTERNAL] Re: [sqlite] Limit on number of columns in SQLite table

Hi!

I can see how this is a reasonable limit when SQLite is used for the querying power it provides. In our case we are really focusing on it as a standard long-term storage format. In the "Appropriate Uses For SQLite" document [1] you have a section called "File archive and/or data container", and this is why we started considering SQLite as a dataset archive format. We would not like to store files directly, but the contents of those files (like the contents of a CSV file), while trying not to modify them more than necessary. So we got especially interested in the statement "SQLite is a good solution for any situation that requires bundling diverse content into a self-contained and self-describing package for shipment across a network."

I can understand how supporting a large number of columns might be inappropriate when you want to run complicated SQL queries on the data. But we just want to store data and then extract all rows to do some data processing; the most complicated query would be to extract just a subsample of rows, not really any JOIN queries or anything like that. It looks like, except for the artificial limit in SQLite, which exists because more columns are not useful in the general case, there is no other reason why it could not be supported.

So why not increase the limit to 2 billion, and have it limited at runtime to 2000 by default? Then, using a PRAGMA, one could increase it up to 2 billion if needed. PRAGMA already can decrease the limit, so we can keep the existing 2000 limit, but to support it without having to recompile, people could increase it all the way to 2 billion. Is there any significant performance downside to this?

[1] https://www.sqlite.org/whentouse.html

Mitar

On Wed, Oct 16, 2019 at 8:21 PM Richard Hipp wrote:
>
> SQLite could, in theory, be enhanced (with just a few minor tweaks) to
> support up to 2 billion columns. But having a relation with a large
> number of columns seems like a very bad idea stylistically. That's
> not how relational databases are intended to be used. Normally when a
> table acquires more than a couple dozen columns, that is a good
> indication that you need to normalize and/or refactor your schema. Schema
> designers almost unanimously follow that design principle. And so
> SQLite is optimized for the overwhelmingly common case of a small
> number of columns per table.
>
> Hence, adding the ability to have a table with a huge number of
> columns is not something that I am interested in supporting in SQLite
> at this time.
>
> --
> D. Richard Hipp
> d...@sqlite.org

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

___
Gunter Hick | Software Engineer | Scientific Games International GmbH
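The column ceiling discussed in this message can be observed from application code. The sketch below, using Python's standard sqlite3 module, creates one narrow table and then tries to create a table with more columns than any SQLite build accepts (the documented maximum for the compile-time SQLITE_MAX_COLUMN setting is 32767, with a default of 2000). The helper name try_create and the table names are hypothetical.

```python
import sqlite3


def try_create(conn, n_columns):
    """Return None on success, or the error message if SQLite rejects the table."""
    cols = ", ".join(f"c{i}" for i in range(n_columns))
    try:
        conn.execute(f"CREATE TABLE wide_{n_columns} ({cols})")
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


conn = sqlite3.connect(":memory:")
small = try_create(conn, 100)        # well under the default limit of 2000
too_wide = try_create(conn, 40_000)  # above the 32767 ceiling of any build

print("100 columns:", "ok" if small is None else small)
print("40000 columns:", "ok" if too_wide is None else too_wide)
```

In the C API the run-time control is sqlite3_limit() with SQLITE_LIMIT_COLUMN, which can lower the limit for a connection but cannot raise it above the compile-time maximum.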