Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
""Hi!

On Thu, Oct 17, 2019 at 5:38 PM Jens Alfke  wrote:
> Why should SQLite make changes, which would introduce performance problems if 
> used, just to save your particular application the trouble of concatenating 
> some vectors into single columns, when it uses SQLite for an edge use-case 
> that’s pretty far removed from its main purpose?

Then maybe the section "File archive and/or data container" in
"Appropriate Uses For SQLite" should explain that this is not the
purpose of SQLite anymore, because "SQLite is a good solution for any
situation that requires bundling diverse content into a self-contained
and self-describing package for shipment across a network." seems to
hold only when "diverse" means a table with fewer than 2000 columns.
And a table described through generic key/value columns can hardly be
called self-describing.

I am being deliberately ironic, because I am not sure that talking
about the "main purpose" is really constructive here, given that there
is a list of many "non-main" but still suggested use cases for SQLite.
Not to mention the "Data analysis" use case, where, again, someone used
to doing analysis on datasets with many columns would now have to
change their algorithms to adapt to the limited number of columns. It
does not seem that putting vectors into single columns would really
enable many "Data analysis" options inside SQLite. I am even surprised
that it says "Many bioinformatics researchers use SQLite in this way."
With a limit of 2000 columns this is a very strange claim. I would love
to see a reference and learn how they do that. I might learn something
new.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

Oh, or we could just split the CSV into separate lines, and then store
one line per SQLite row, in a single column. Then we would not have to
use JSON or anything similar.

That would work for CSV files. For other types of inputs we might be
able to find some other similar approach.

So generally the main use case would be sub-sampling rows.
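A minimal sketch of this line-per-row idea, using Python's bundled sqlite3 module (the CSV content here is invented for illustration):

```python
import csv
import sqlite3

# Each CSV record becomes one SQLite row with a single TEXT column,
# so no table ever needs 100k columns.
csv_text = "gene_a,gene_b,gene_c\n1.5,2.0,0.1\n0.3,4.2,7.7\n"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lines (line TEXT)")
con.executemany(
    "INSERT INTO lines (line) VALUES (?)",
    [(line,) for line in csv_text.splitlines()],
)

# Sub-sampling rows is then a plain rowid query; parsing a line back
# into fields happens in the application, not in SQLite.
(sampled,) = con.execute("SELECT line FROM lines WHERE rowid = 2").fetchone()
fields = next(csv.reader([sampled]))
```

The trade-off is that SQLite cannot index or query individual fields, but for the stated use case (archival plus row sub-sampling) that may be acceptable.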


Mitar

On Thu, Oct 17, 2019 at 4:11 PM Donald Griggs  wrote:
>
> So if character-separated values (CSV-ish) were originally your preferred
> import format, would using that format for the blobs work for you?
>
> E.g., Suppose you need to index the first two fields only, and so can use a
> blob column for the bulk of the record.  If the records were supplied as:
>  MyFirstField~MySecondField~thousands|of|data|items|...
> and you imported these records into your 3-column table, defining tilde (~)
> as the separator, you could retain a simple format.
>
> If ever needed, you can easily export the table with the indexed columns
> treated like the bulk of the data
> from the sqlite3 command line utility:
> .mode list
> .separator |
> SELECT * FROM MyTable;
>
>MyFirstField|MySecondField|thousands|of|data|items|...
>
>
>
>
>
>
>
> On Thu, Oct 17, 2019 at 9:10 AM Mitar  wrote:
>
> > Hi!
> >
> > On Thu, Oct 17, 2019 at 3:04 PM Eric Grange  wrote:
> > > my suggestion would be to store them as JSON in a blob, and use the JSON
> > > functions of SQLite to extract the data
> >
> > JSON has some crazy limitations: by the standard it does not support the
> > full floating point spec, so NaN and infinity cannot be represented
> > there. So JSON is really not a great format when you want to preserve as
> > much of the input as possible (integers, floats, text, and binary).
> > SQLite seems to be spot on in this regard.
> >
> > But yes, if there were some other standard, SQLite-supported format to
> > embed, that approach would be useful. Like composite value types.
> >
> >
> > Mitar
> >
> >





Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

Thanks for this input. So you are saying that calling sqlite3_column
100k times per row is slow, but retrieving 100k rows to reconstruct one
"original" row will be faster? I am not sure I understand why reading
and decoding cells spread over multiple columns is so much slower than
reading and decoding cells spread over multiple rows?
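For context, a minimal sketch of the long-format layout being compared against: one narrow row per (sample, gene, value) instead of one column per gene. Sample IDs, gene names, and values are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE expr (sample_id INTEGER, gene TEXT, value REAL)")
con.executemany(
    "INSERT INTO expr VALUES (?, ?, ?)",
    [(1, "BRCA1", 0.5), (1, "TP53", 1.2), (2, "BRCA1", 0.7), (2, "TP53", 0.9)],
)

# Reconstructing one "original" wide row means fetching many narrow rows,
# but each fetched row carries only a few fields, so no single statement
# ever needs 100k result columns or 100k sqlite3_column calls per step.
wide_row = dict(
    con.execute("SELECT gene, value FROM expr WHERE sample_id = ?", (1,))
)
```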

Mitar

On Thu, Oct 17, 2019 at 3:38 PM Hick Gunter  wrote:
>
> I have the impression that you still do not grasp the folly of a 100k column
> schema.
>
> See the example below, which only has 6 fields. As you can see, each field
> requires a Column opcode and arguments (about 10 bytes) and a "register" to
> hold the value (48 bytes), which for 100k columns uses about 5.5 megabytes
> just to retrieve a row from the database. It will also involve SQLite
> decoding 100k field values and your application calling the sqlite3_column
> interface 100k times for each and every row, which yields an expected
> performance of about 2 rows per second. Can you afford to use that much
> memory and time?
>
> asql> create temp table genes (id integer primary key, name char, f1 char,
> f2 char, f3 char, f4 char);
> asql> .explain
> asql> explain select * from genes;
> addr  opcode       p1    p2    p3    p4                p5  comment
> ----  -----------  ----  ----  ----  ----------------  --  -------------------
> 0     Init         0     13    0                       00  Start at 13
> 1     OpenRead     0     2     1     6                 00  root=2 iDb=1; genes
> 2     Explain      2     0     0     SCAN TABLE genes  00
> 3     Rewind       0     12    0                       00
> 4       Rowid      0     1     0                       00  r[1]=rowid
> 5       Column     0     1     2                       00  r[2]=genes.name
> 6       Column     0     2     3                       00  r[3]=genes.f1
> 7       Column     0     3     4                       00  r[4]=genes.f2
> 8       Column     0     4     5                       00  r[5]=genes.f3
> 9       Column     0     5     6                       00  r[6]=genes.f4
> 10    ResultRow    1     6     0                       00  output=r[1..6]
> 11    Next         0     4     0                       01
> 12    Halt         0     0     0                       00
> 13    Transaction  1     0     1     0                 01  usesStmtJournal=0
> 14    Goto         0     1     0                       00
>
> -Ursprüngliche Nachricht-
> Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
> Auftrag von Mitar
> Gesendet: Donnerstag, 17. Oktober 2019 15:11
> An: SQLite mailing list 
> Betreff: [EXTERNAL] Re: [sqlite] Limit on number of columns in SQLite table
>
> Hi!
>
> On Thu, Oct 17, 2019 at 3:04 PM Eric Grange  wrote:
> > my suggestion would be to store them as JSON in a blob, and use the
> > JSON functions of SQLite to extract the data
>
> JSON has some crazy limitations: by the standard it does not support the
> full floating point spec, so NaN and infinity cannot be represented there.
> So JSON is really not a great format when you want to preserve as much of
> the input as possible (integers, floats, text, and binary). SQLite seems
> to be spot on in this regard.
>
> But yes, if there were some other standard, SQLite-supported format to
> embed, that approach would be useful. Like composite value types.
>
>
> Mitar
>
>
>
> ___
>  Gunter Hick | Software Engineer | Scientific Games International GmbH | 
> Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) 
> +43 1 80100 - 0
>
> May be privileged. May be confidential. Please delete if not the addressee.





Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

On Thu, Oct 17, 2019 at 3:04 PM Eric Grange  wrote:
> my suggestion would be to store them as JSON in a blob, and use the JSON
> functions of SQLite to extract the data

JSON has some crazy limitations: by the standard it does not support
the full floating point spec, so NaN and infinity cannot be represented
there. So JSON is really not a great format when you want to preserve
as much of the input as possible (integers, floats, text, and binary).
SQLite seems to be spot on in this regard.

But yes, if there were some other standard, SQLite-supported format to
embed, that approach would be useful. Like composite value types.
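The NaN limitation is easy to demonstrate with Python's json module (a sketch; standard JSON here means RFC 8259, which has no NaN or Infinity literals):

```python
import json

# By default Python's json module papers over the gap by emitting the
# non-standard tokens "NaN"/"Infinity", which strict JSON parsers reject.
nonstandard = json.dumps(float("nan"))

# With allow_nan=False the encoder enforces the standard and refuses
# to serialize non-finite floats at all.
try:
    json.dumps(float("inf"), allow_nan=False)
    strict_accepts_inf = True
except ValueError:
    strict_accepts_inf = False
```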


Mitar



Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

This is getting a bit off topic.

On Thu, Oct 17, 2019 at 12:07 PM Simon Slavin  wrote:
> 1) Almost no piece of software can handle a grid 2 billion cells wide.  Excel 
> maxes out at 16,384 columns.  Matlab can store and retrieve a cell of data 
> directly from a file, but it has a max array size of 1.  R maxes out at 
> 2147483647, which is more than 2 billion.  But R has to hold all the data 
> from a matrix in memory at once and it can't assign enough memory to one 
> object to hold that many cells.

Of course, 2 billion is a lot. But 100k is something many ML libraries
support: Pandas, ndarray, R. Nothing magical about that.

> 2) Object names are not data.  They're descriptions in your favourite human 
> language.  They're not meant to have weird sequences of characters in.

Not sure what this relates to.

> 3) Lots of CSV import filters ignore a column header row, or can only create 
> fieldnames with certain limits (max length, no punctuation characters, etc.). 
>  So you should expect to lose fieldnames if you try to import your data into 
> some new piece of software.

Does SQLite have limitations on what can be a column name? If not,
then I would not worry about what some CSV importers do. We would use a
good one to convert to SQLite.
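For what it is worth, SQLite itself accepts nearly arbitrary strings as column names when they are quoted (double quotes, with embedded quotes doubled). A quick sketch, with an invented header name:

```python
import sqlite3

# A header name with spaces, punctuation, and embedded double quotes.
weird = 'gene expression (log2, "normalized")'
quoted = '"' + weird.replace('"', '""') + '"'  # standard SQL quoting

con = sqlite3.connect(":memory:")
con.execute(f"CREATE TABLE t ({quoted} REAL)")
con.execute("INSERT INTO t VALUES (1.5)")

# PRAGMA table_info returns (cid, name, type, ...); the original header
# name comes back intact from the table metadata.
(colname,) = [row[1] for row in con.execute("PRAGMA table_info(t)")]
```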

> (4) SQLite stores all the data for a row is together, in a sequence.  If you 
> ask for the data in the 3756th column of a row, SQLite has to read and parse 
> the data for the first 3755 columns of that row, just to read a single value 
> from storage.  As you can imagine, this is slow and involves a lot of I/O.  
> And while it happens the row up to that point must all be held in memory.  
> Consequently, nobody who uses SQLite for its intended purpose actually does 
> this.  I dread to think how slow random access over 2 billion columns would 
> be in SQLite.

I wrote earlier that for us the use case where we read whole rows is
the most common one.

> Your gene expressions are data.  They are not the names of table entities.  
> They should be stored in a table as other posts suggested.

Maybe. But often this data is represented as a row of expressions with
a column for each gene. Because this is what is being distributed, we
are looking for ways to store it in a stable format which will be
supported for the next 50 years, without modifying the original data
too much. I do hear the suggestions to do such a transformation, but
that is less ideal for our use case.


Mitar



Re: [sqlite] [EXTERNAL] Re: Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

In that case we would have to define a standard BLOB storage format,
which slightly defeats the idea of using SQLite to define such a
standard, future-proof format. :-)


Mitar

On Thu, Oct 17, 2019 at 11:19 AM Hick Gunter  wrote:
>
> Since your data is at least mostly opaque, in the sense that SQLite is not
> expected to interpret the contents, why not split your data into "stuff
> you want to query in SQLite" and "stuff you want to just store"? The
> former means individual columns, whereas the latter could be stored in a
> single BLOB field, which only your application knows how to extract data
> from.
>
> This allows SQLite to efficiently process the fields it needs to know
> about, and to return BLOB data efficiently as one single field instead of
> having to pick it apart into 100k bits.
>
> -Ursprüngliche Nachricht-
> Von: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] Im 
> Auftrag von Mitar
> Gesendet: Donnerstag, 17. Oktober 2019 10:56
> An: Richard Hipp 
> Cc: SQLite mailing list 
> Betreff: [EXTERNAL] Re: [sqlite] Limit on number of columns in SQLite table
>
> Hi!
>
> I can see how this is a reasonable limit when SQLite is used for the
> querying power it provides. In our case we are really focusing on it as a
> standard long-term storage format. The "Appropriate Uses For SQLite"
> document [1] has a section called "File archive and/or data container",
> and this is why we started considering SQLite as a dataset archive
> format. We would not like to store files directly, but the contents of
> those files (like the contents of a CSV), while modifying them no more
> than necessary. So we got interested especially in the statement "SQLite
> is a good solution for any situation that requires bundling diverse
> content into a self-contained and self-describing package for shipment
> across a network." I can understand how supporting a large number of
> columns might be inappropriate when you want to run complicated SQL
> queries on the data, but we just want to store data and then extract all
> rows for processing, or, as the most complicated query, extract a
> subsample of rows, not run JOINs or anything like that. So it looks like,
> apart from the artificial limit in SQLite (imposed because huge column
> counts are not useful in the general case), there is no other reason why
> this could not be supported.
>
> So why not increase the limit to 2 billion, but have it limited at
> runtime to 2000 by default? Using a PRAGMA, one could then increase it,
> if needed, up to 2 billion. PRAGMA can already decrease the limit, so we
> could keep the existing default of 2000 while letting people increase it
> all the way to 2 billion without having to recompile. Is there any
> significant performance downside to this?
>
> [1] https://www.sqlite.org/whentouse.html
>
>
> Mitar
>
> On Wed, Oct 16, 2019 at 8:21 PM Richard Hipp  wrote:
> >
> > SQLite could, in theory, be enhanced (with just a few minor tweaks) to
> > support up to 2 billion columns.  But having a relation with a large
> > number of columns seems like a very bad idea stylistically.  That's
> > not how relational databases are intended to be used.  Normally when a
> > table acquires more than a couple dozen columns, that is a good
> > indication that you need to normalize and/or refactor your schema. Schema
> > designers almost unanimously follow that design principle.  And so
> > SQLite is optimized for the overwhelmingly common case of a small
> > number of columns per table.
> >
> > Hence, adding the ability to have a table with a huge number of
> > columns is not something that I am interested in supporting in SQLite
> > at this time.
> >
> > --
> > D. Richard Hipp
> > d...@sqlite.org
>
>
>
>
>





Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

I can see how this is a reasonable limit when SQLite is used for the
querying power it provides. In our case we are really focusing on it as
a standard long-term storage format. The "Appropriate Uses For SQLite"
document [1] has a section called "File archive and/or data container",
and this is why we started considering SQLite as a dataset archive
format. We would not like to store files directly, but the contents of
those files (like the contents of a CSV), while modifying them no more
than necessary. So we got interested especially in the statement
"SQLite is a good solution for any situation that requires bundling
diverse content into a self-contained and self-describing package for
shipment across a network." I can understand how supporting a large
number of columns might be inappropriate when you want to run
complicated SQL queries on the data, but we just want to store data and
then extract all rows for processing, or, as the most complicated
query, extract a subsample of rows, not run JOINs or anything like
that. So it looks like, apart from the artificial limit in SQLite
(imposed because huge column counts are not useful in the general
case), there is no other reason why this could not be supported.

So why not increase the limit to 2 billion, but have it limited at
runtime to 2000 by default? Using a PRAGMA, one could then increase it,
if needed, up to 2 billion. PRAGMA can already decrease the limit, so
we could keep the existing default of 2000 while letting people
increase it all the way to 2 billion without having to recompile. Is
there any significant performance downside to this?

[1] https://www.sqlite.org/whentouse.html
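For reference, the current behavior can be sketched like this (the column counts are illustrative; a stock SQLite build compiles with SQLITE_MAX_COLUMN = 2000, and sqlite3_limit() can only lower the runtime limit, never raise it past the compiled ceiling, which is exactly why a recompile comes up):

```python
import sqlite3


def try_create(con: sqlite3.Connection, n: int) -> bool:
    """Attempt to create a table with n REAL columns."""
    cols = ", ".join(f"c{i} REAL" for i in range(n))
    try:
        con.execute(f"CREATE TABLE t{n} ({cols})")
        return True
    except sqlite3.OperationalError:  # "too many columns on t..."
        return False


con = sqlite3.connect(":memory:")
ok_2000 = try_create(con, 2000)  # at the default compile-time limit
ok_5000 = try_create(con, 5000)  # beyond it: rejected at prepare time
```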


Mitar

On Wed, Oct 16, 2019 at 8:21 PM Richard Hipp  wrote:
>
> SQLite could, in theory, be enhanced (with just a few minor tweaks) to
> support up to 2 billion columns.  But having a relation with a large
> number of columns seems like a very bad idea stylistically.  That's
> not how relational databases are intended to be used.  Normally when a
> table acquires more than a couple dozen columns, that is a good
> indication that you need to normalize and/or refactor your schema. Schema
> designers almost unanimously follow that design principle.  And so
> SQLite is optimized for the overwhelmingly common case of a small
> number of columns per table.
>
> Hence, adding the ability to have a table with a huge number of
> columns is not something that I am interested in supporting in SQLite
> at this time.
>
> --
> D. Richard Hipp
> d...@sqlite.org





Re: [sqlite] Limit on number of columns in SQLite table

2019-10-16 Thread Mitar
Hi!

On Wed, Oct 16, 2019 at 3:29 PM Richard Hipp  wrote:
> Are you trying to store a big matrix with approx 100k columns?  A
> better way to do that in a relational database (*any* relational
> database, not just SQLite) is to store one entry per matrix elements:

Sure, this is useful for a sparse matrix. But in our case we would like
to preserve as much as possible of the original dataset structure,
because those types of transformations then require users to
restructure the data back. For example, a Pandas DataFrame can easily
be read out of SQLite, but if the data is converted in the way you
described, then the user has to take one more step to move things
around.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html

I would hope that, if this is something which is generally a good idea,
databases would do such mapping internally, while exposing the regular
structure back to the user.
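One way a database can expose such a mapping today is a view over the long format built with conditional aggregation, though it immediately runs into the very column limit under discussion. A sketch with invented sample IDs and gene names:

```python
import sqlite3

# Data stored long: one row per (sample, gene, value).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE expr (sample_id INTEGER, gene TEXT, value REAL)")
con.executemany(
    "INSERT INTO expr VALUES (?, ?, ?)",
    [(1, "BRCA1", 0.5), (1, "TP53", 1.2), (2, "BRCA1", 0.7), (2, "TP53", 0.9)],
)

# A view exposes the regular wide structure back to the user. Note: one
# CASE expression per gene, so for 100k genes this view would itself
# need 100k result columns and hit the same limit.
con.execute("""
    CREATE VIEW expr_wide AS
    SELECT sample_id,
           MAX(CASE WHEN gene = 'BRCA1' THEN value END) AS BRCA1,
           MAX(CASE WHEN gene = 'TP53'  THEN value END) AS TP53
    FROM expr
    GROUP BY sample_id
""")
rows = list(con.execute("SELECT * FROM expr_wide ORDER BY sample_id"))
```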


Mitar



Re: [sqlite] [EXTERNAL] Limit on number of columns in SQLite table

2019-10-16 Thread Mitar
Hi!

On Wed, Oct 16, 2019 at 3:16 PM Hick Gunter  wrote:
> 100k distinct column names? Or is that 1 repeats of 10 attributes?

100k distinct names. Like each column a different gene expression.


Mitar



[sqlite] Limit on number of columns in SQLite table

2019-10-16 Thread Mitar
Hi!

We are considering using SQLite as an ML dataset archival format for
datasets in OpenML (https://www.openml.org/). When investigating it,
we noticed that it has a very low limit on the number of columns. Quite
a few of the datasets we are dealing with have 100k or so columns. Are
there any fundamental reasons why this limit is so low (even if we
raise it at compile time, it can be at most 32k columns), while other
limits are comfortably large? Any plans to extend this limit in the
future?


Mitar
