Re: [sqlite] Limit on number of columns in SQLite table

2019-11-06 Thread Digital Dog
On Wed, Nov 6, 2019 at 7:22 PM Jens Alfke  wrote:

>
>
> > On Nov 6, 2019, at 9:25 AM, Digital Dog  wrote:
> >
> > If there are millions or billions of rows in the data set, I consider
> > it big data, and the only reasonable format for storing it is a
> > column-store format.
>
> There are many types of stores for "big data". My employer, Couchbase, has
> customers who would disagree with you; they use our document database to
> store millions or billions of rows.
>

I was talking about this specific data set, with very well-defined columns.
However, I'm not sure whether Parquet would be able to store this number of
columns, or query them efficiently.


Re: [sqlite] Limit on number of columns in SQLite table

2019-11-06 Thread Jens Alfke


> On Nov 6, 2019, at 9:25 AM, Digital Dog  wrote:
> 
> If there are millions or billions of rows in the data set, I consider it
> big data, and the only reasonable format for storing it is a column-store
> format.

There are many types of stores for "big data". My employer, Couchbase, has 
customers who would disagree with you; they use our document database to store 
millions or billions of rows.

It depends on the type of data. For highly structured data with many sparse
columns you may be right. For less structured data, a key-value or document
store is better. I'm sure there are other types of big-data storage I'm
unaware of.

In the original poster's case, it didn't seem like the data set really had 
zillions of columns, since [IIRC] they didn't need to be queried separately. 
You could put something like that in a key-value store with each value an 
encoded C array, for example.

—Jens


Re: [sqlite] Limit on number of columns in SQLite table

2019-11-06 Thread Digital Dog
Hi!

I'm late to the party, but I want to contribute. I did not read all the
messages in the thread, but those I did read did not contain a question
about the number of ROWS. We know how many columns you desire, but how many
rows are there?

No matter how much I like SQLite, I would not store this kind of data in it,
neither short term nor long term. If there are millions or billions of rows
in the data set, I consider it big data, and the only reasonable format for
storing it is a column-store format. It gives up to 100:1 compression
(depending on the data, of course) and incredible querying speeds (if it
doesn't have to read some columns, it just doesn't). If it can skip row
groups altogether based on metadata, it will. No row-storage engine will
give you comparable efficiency. And especially because you do not need
transactional semantics, you can consider faster engines tailored to this
use case (without WAL, locking or MVCC). The ones I can think of now are
Apache Parquet, Cassandra and MonetDB, not to mention the excellent
implementation and support in Microsoft SQL Server. You will have to do
your own research on this subject.


On Thu, Oct 17, 2019 at 7:38 PM Mitar  wrote:

> ""Hi!
>
> On Thu, Oct 17, 2019 at 5:38 PM Jens Alfke  wrote:
> > Why should SQLite make changes that would introduce performance problems
> > if used, just to save your particular application the trouble of
> > concatenating some vectors into single columns, when it uses SQLite for
> > an edge use case that's pretty far removed from its main purpose?
>
> Then maybe the section "File archive and/or data container" in
> "Appropriate Uses For SQLite" should explain that this is not the purpose
> of SQLite anymore. Because "SQLite is a good solution for any situation
> that requires bundling diverse content into a self-contained and
> self-describing package for shipment across a network." seems to work only
> when "diverse" means a table with fewer than 2000 columns. Somehow,
> describing a table with key/value columns can hardly be called
> self-describing.
>
> I am being ironic on purpose, because I am not sure that talking about the
> "main purpose" is really a constructive conversation here when there is a
> list of many "non-main" but still suggested use cases for SQLite. Not to
> mention the "Data analysis" use case, where, again, if I am used to doing
> analysis on datasets with many columns, I would now have to change the
> algorithms I use for my analysis to adapt to a limited number of columns.
> It does not seem that putting vectors into single columns would really
> enable many "Data analysis" options inside SQLite. I am even surprised
> that it says "Many bioinformatics researchers use SQLite in this way."
> With a limit of 2000 columns this is a very strange claim. I would love
> to see a reference here and see how they do that. I might learn
> something new.
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
""Hi!

On Thu, Oct 17, 2019 at 5:38 PM Jens Alfke  wrote:
> Why should SQLite make changes that would introduce performance problems
> if used, just to save your particular application the trouble of
> concatenating some vectors into single columns, when it uses SQLite for an
> edge use case that's pretty far removed from its main purpose?

Then maybe the section "File archive and/or data container" in
"Appropriate Uses For SQLite" should explain that this is not the purpose
of SQLite anymore. Because "SQLite is a good solution for any situation
that requires bundling diverse content into a self-contained and
self-describing package for shipment across a network." seems to work only
when "diverse" means a table with fewer than 2000 columns. Somehow,
describing a table with key/value columns can hardly be called
self-describing.

I am being ironic on purpose, because I am not sure that talking about the
"main purpose" is really a constructive conversation here when there is a
list of many "non-main" but still suggested use cases for SQLite. Not to
mention the "Data analysis" use case, where, again, if I am used to doing
analysis on datasets with many columns, I would now have to change the
algorithms I use for my analysis to adapt to a limited number of columns.
It does not seem that putting vectors into single columns would really
enable many "Data analysis" options inside SQLite. I am even surprised
that it says "Many bioinformatics researchers use SQLite in this way."
With a limit of 2000 columns this is a very strange claim. I would love
to see a reference here and see how they do that. I might learn
something new.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Jens Alfke



> On Oct 17, 2019, at 1:56 AM, Mitar  wrote:
> 
> So why not increase the limit to 2 billion

Why should SQLite make changes that would introduce performance problems if
used, just to save your particular application the trouble of concatenating
some vectors into single columns, when it uses SQLite for an edge use case
that's pretty far removed from its main purpose?

It may seem like a harmless change to you, but I'm sure it would result in
increased support work. Someone will report extremely slow query
performance, and after several back-and-forth emails will mention that, oh
yeah, this query does access the five-millionth column in a table...

—Jens


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

Oh, or we could just split the CSV into separate lines, and then store one
line per SQLite row, in a single column. Then we would not have to use JSON
or anything similar.

That would work for CSV files. For other types of inputs we might be able
to find some similar approach.

So generally, the main use case would be sub-sampling rows.
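
As a minimal sketch of that idea (the table and column names here are
invented for illustration, not anything agreed on in this thread), the whole
dataset could be kept as raw CSV lines and sub-sampled with plain SQL:

  CREATE TABLE dataset_lines (
    line_number INTEGER PRIMARY KEY,  -- preserves the original row order
    line        TEXT NOT NULL         -- one unparsed CSV line per row
  );

  -- draw a random subsample of 1000 rows
  SELECT line FROM dataset_lines ORDER BY random() LIMIT 1000;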


Mitar

On Thu, Oct 17, 2019 at 4:11 PM Donald Griggs  wrote:
>
> So if character-separated values (CSV-ish) were originally your preferred
> import format, would using that format for the blobs work for you?
>
> E.g., Suppose you need to index the first two fields only, and so can use a
> blob column for the bulk of the record.  If the records were supplied as:
>  MyFirstField~MySecondField~thousands|of|data|items|...
> and you imported these records into your 3-column table, defining tilde (~)
> as the separator, you could retain a simple format.
>
> If ever needed, you can easily export the table with the indexed columns
> treated like the bulk of the data from the sqlite3 command line utility:
> .mode list
> .separator |
> SELECT * FROM MyTable;
>
>MyFirstField|MySecondField|thousands|of|data|items|...
>
> On Thu, Oct 17, 2019 at 9:10 AM Mitar  wrote:
>
> > Hi!
> >
> > On Thu, Oct 17, 2019 at 3:04 PM Eric Grange  wrote:
> > > my suggestion would be to store them as JSON in a blob, and use the JSON
> > > functions of SQLite to extract the data
> >
> > JSON has some crazy limitations; for example, by the standard it does
> > not support the full floating-point spec, so NaN and infinity cannot be
> > represented. So JSON is really not a great format when you want to
> > preserve as much of the input as possible (like integers, floats, text,
> > and binary). SQLite seems to be spot on in this regard.
> >
> > But yes, if there were some other standard, SQLite-supported format to
> > embed, that approach would be useful. Something like composite value
> > types.
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Donald Griggs
So if character-separated values (CSV-ish) were originally your preferred
import format, would using that format for the blobs work for you?

E.g., Suppose you need to index the first two fields only, and so can use a
blob column for the bulk of the record.  If the records were supplied as:
 MyFirstField~MySecondField~thousands|of|data|items|...
and you imported these records into your 3-column table, defining tilde (~)
as the separator, you could retain a simple format.

If ever needed, you can easily export the table with the indexed columns
treated like the bulk of the data from the sqlite3 command line utility:
.mode list
.separator |
SELECT * FROM MyTable;

   MyFirstField|MySecondField|thousands|of|data|items|...
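
For completeness, the import side would be the mirror image (the file name
here is hypothetical, and this sketch assumes the 3-column table already
exists):

.mode list
.separator ~
.import records.txt MyTable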

On Thu, Oct 17, 2019 at 9:10 AM Mitar  wrote:

> Hi!
>
> On Thu, Oct 17, 2019 at 3:04 PM Eric Grange  wrote:
> > my suggestion would be to store them as JSON in a blob, and use the JSON
> > functions of SQLite to extract the data
>
> JSON has some crazy limitations like by standard it does not support
> full floating point spec, so NaN and infinity cannot be represented
> there. So JSON is really no a great format when you want to preserve
> as much of the input as possible (like, integers, floats, text, and
> binary). SQLite seems to be spot on in this regard.
>
> But yes, if there would be some other standard to SQLite and supported
> format to embed, that approach would be useful. Like composite value
> types.
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

On Thu, Oct 17, 2019 at 3:04 PM Eric Grange  wrote:
> my suggestion would be to store them as JSON in a blob, and use the JSON
> functions of SQLite to extract the data

JSON has some crazy limitations; for example, by the standard it does not
support the full floating-point spec, so NaN and infinity cannot be
represented. So JSON is really not a great format when you want to preserve
as much of the input as possible (like integers, floats, text, and binary).
SQLite seems to be spot on in this regard.

But yes, if there were some other standard, SQLite-supported format to
embed, that approach would be useful. Something like composite value types.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Eric Grange
> I wrote earlier that for us the use case where we are reading whole rows
> is the most common one.
> [...]
> we are looking for ways to store this in a stable format which will be
> supported for the next 50 years, without modifying the original data too
> much.

If you do not need access to individual columns too much, but still want to
be able to operate on them when needed, my suggestion would be to store them
as JSON in a blob, and use the JSON functions of SQLite to extract the data
when needed (https://www.sqlite.org/json1.html).

This would mean you have both the stability of a text format for long-term
storage (JSON) and the ability to run queries, index and manipulate data
with SQL.

I am using such an approach successfully, and when some fields in the JSON
are flagged for indexing or direct access (f.i. for summaries /
descriptions), I just extract them to a relational table which can then be
indexed and used directly as usual (you can use triggers to automate that
extraction). You may also be able to use indexes on expressions if you only
need indexing (https://www.sqlite.org/expridx.html).
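
A minimal sketch of that setup (the table, column, and field names are
invented for illustration; only json_extract and ordinary SQL features are
assumed):

CREATE TABLE datasets (
  id  INTEGER PRIMARY KEY,
  doc TEXT                 -- the full record stored as JSON text
);

-- index one JSON field directly with an index on expression
CREATE INDEX datasets_by_name ON datasets(json_extract(doc, '$.name'));

-- or mirror flagged fields into a relational table via a trigger
CREATE TABLE summaries (id INTEGER PRIMARY KEY, name TEXT);
CREATE TRIGGER datasets_ai AFTER INSERT ON datasets BEGIN
  INSERT INTO summaries VALUES (new.id, json_extract(new.doc, '$.name'));
END;

-- extract a single value only when needed
SELECT json_extract(doc, '$.values[3]') FROM datasets WHERE id = 42;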

Eric

Le jeu. 17 oct. 2019 à 14:50, Mitar  a écrit :

> Hi!
>
> This is getting a bit off topic.
>
> On Thu, Oct 17, 2019 at 12:07 PM Simon Slavin 
> wrote:
> > 1) Almost no piece of software can handle a grid 2 billion cells wide.
> Excel maxes out at 16,384 columns.  Matlab can store and retrieve a cell of
> data directly from a file, but it has a max array size of 1.  R maxes
> out at 2147483647, which is more than 2 billion.  But R has to hold all the
> data from a matrix in memory at once and it can't assign enough memory to
> one object to hold that many cells.
>
> Of course, 2 billion is a lot. But 100k is something many ML libraries
> support: Pandas, ndarray, R. Nothing too magical about that.
>
> > 2) Object names are not data.  They're descriptions in your favourite
> human language.  They're not meant to have weird sequences of characters in.
>
> Not sure what this relates to.
>
> > 3) Lots of CSV import filters ignore a column header row, or can only
> create fieldnames with certain limits (max length, no punctuation
> characters, etc.).  So you should expect to lose fieldnames if you try to
> import your data into some new piece of software.
>
> Does SQLite have limitations on what can be a column name? If not, then I
> would not worry about what some CSV importers do. We would use a good one
> to convert to SQLite.
>
> (4) SQLite stores all the data for a row together, in a sequence.  If
> you ask for the data in the 3756th column of a row, SQLite has to read and
> parse the data for the first 3755 columns of that row, just to read a
> single value from storage.  As you can imagine, this is slow and involves a
> lot of I/O.  And while it happens the row up to that point must all be held
> in memory.  Consequently, nobody who uses SQLite for its intended purpose
> actually does this.  I dread to think how slow random access over 2 billion
> columns would be in SQLite.
>
> I wrote earlier that for us the use case where we are reading whole rows
> is the most common one.
>
> > Your gene expressions are data.  They are not the names of table
> entities.  They should be stored in a table as other posts suggested.
>
> Maybe. But often this data is represented as a row of expressions with a
> column for each gene. Because this is what is being distributed, we are
> looking for ways to store this in a stable format which will be supported
> for the next 50 years, without modifying the original data too much. I do
> hear suggestions to do such a transformation, but that is less ideal for
> our use case.
>
>
> Mitar
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

This is getting a bit off topic.

On Thu, Oct 17, 2019 at 12:07 PM Simon Slavin  wrote:
> 1) Almost no piece of software can handle a grid 2 billion cells wide.  Excel 
> maxes out at 16,384 columns.  Matlab can store and retrieve a cell of data 
> directly from a file, but it has a max array size of 1.  R maxes out at 
> 2147483647, which is more than 2 billion.  But R has to hold all the data 
> from a matrix in memory at once and it can't assign enough memory to one 
> object to hold that many cells.

Of course, 2 billion is a lot. But 100k is something many ML libraries
support: Pandas, ndarray, R. Nothing too magical about that.

> 2) Object names are not data.  They're descriptions in your favourite human 
> language.  They're not meant to have weird sequences of characters in.

Not sure what this relates to.

> 3) Lots of CSV import filters ignore a column header row, or can only create 
> fieldnames with certain limits (max length, no punctuation characters, etc.). 
>  So you should expect to lose fieldnames if you try to import your data into 
> some new piece of software.

Does SQLite have limitations on what can be a column name? If not, then I
would not worry about what some CSV importers do. We would use a good one
to convert to SQLite.

> (4) SQLite stores all the data for a row together, in a sequence.  If you
> ask for the data in the 3756th column of a row, SQLite has to read and parse 
> the data for the first 3755 columns of that row, just to read a single value 
> from storage.  As you can imagine, this is slow and involves a lot of I/O.  
> And while it happens the row up to that point must all be held in memory.  
> Consequently, nobody who uses SQLite for its intended purpose actually does 
> this.  I dread to think how slow random access over 2 billion columns would 
> be in SQLite.

I wrote earlier that for us the use case where we are reading whole rows
is the most common one.

> Your gene expressions are data.  They are not the names of table entities.  
> They should be stored in a table as other posts suggested.

Maybe. But often this data is represented as a row of expressions with a
column for each gene. Because this is what is being distributed, we are
looking for ways to store this in a stable format which will be supported
for the next 50 years, without modifying the original data too much. I do
hear suggestions to do such a transformation, but that is less ideal for
our use case.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Simon Slavin
On 17 Oct 2019, at 9:56am, Mitar  wrote:

> I can understand how supporting a large number of columns might be
> inappropriate when you want to run complicated SQL queries on the data.
> But if we just store data and then extract all rows to do some data
> processing (or, as the most complicated query, extract just a subsample
> of rows, without any JOIN queries or anything like that), it looks like
> there is no reason why it could not be supported, other than the
> artificial limit in SQLite, which exists because such tables are not
> useful in the general case.
>
> So why not increase the limit to 2 billion

1) Almost no piece of software can handle a grid 2 billion cells wide.  Excel 
maxes out at 16,384 columns.  Matlab can store and retrieve a cell of data 
directly from a file, but it has a max array size of 1.  R maxes out at 
2147483647, which is more than 2 billion.  But R has to hold all the data from 
a matrix in memory at once and it can't assign enough memory to one object to 
hold that many cells.

2) Object names are not data.  They're descriptions in your favourite human 
language.  They're not meant to have weird sequences of characters in.

3) Lots of CSV import filters ignore a column header row, or can only create 
fieldnames with certain limits (max length, no punctuation characters, etc.).  
So you should expect to lose fieldnames if you try to import your data into 
some new piece of software.

And here's the dealbreaker:

(4) SQLite stores all the data for a row together, in a sequence.  If you
ask for the data in the 3756th column of a row, SQLite has to read and parse
the data for the first 3755 columns of that row, just to read a single value
from storage.  As you can imagine, this is slow and involves a lot of I/O.
And while it happens, the row up to that point must all be held in memory.
Consequently, nobody who uses SQLite for its intended purpose actually does
this.  I dread to think how slow random access over 2 billion columns would
be in SQLite.

So reread the text from your message that I quoted at the top.  No.  Nobody
would ever use SQLite to store a table 2 billion columns wide, even if they
didn't intend to do anything complicated with it.  Because it would take a
ridiculous amount of time and memory just to insert data into a table that
wide.

Your gene expressions are data.  They are not the names of table entities.  
They should be stored in a table as other posts suggested.


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-17 Thread Mitar
Hi!

I can see how this is a reasonable limit when SQLite is used for the
querying power it provides. In our case, we are really focusing on it as a
standard long-term storage format. The "Appropriate Uses For SQLite"
document [1] has a section called "File archive and/or data container", and
this is why we started considering SQLite as a dataset archive format. We
would not like to store files directly, but rather the contents of those
files (like the contents of a CSV), while modifying them no more than
necessary. So we got especially interested in the statement "SQLite is a
good solution for any situation that requires bundling diverse content into
a self-contained and self-describing package for shipment across a
network." I can understand how supporting a large number of columns might be
inappropriate when you want to run complicated SQL queries on the data.
But if we just store data and then extract all rows to do some data
processing (or, as the most complicated query, extract just a subsample
of rows, without any JOIN queries or anything like that), it looks like
there is no reason why it could not be supported, other than the
artificial limit in SQLite, which exists because such tables are not
useful in the general case.

So why not increase the limit to 2 billion, but keep it limited to 2000 at
runtime by default? Then, using a PRAGMA, one could increase it when
needed, up to 2 billion. A PRAGMA can already decrease the limit, so we
could keep the existing 2000 default, but people could raise it all the way
to 2 billion without having to recompile. Is there any significant
performance downside to this?

[1] https://www.sqlite.org/whentouse.html


Mitar

On Wed, Oct 16, 2019 at 8:21 PM Richard Hipp  wrote:
>
> SQLite could, in theory, be enhanced (with just a few minor tweaks) to
> support up to 2 billion columns.  But having a relation with a large
> number of columns seems like a very bad idea stylistically.  That's
> not how relational databases are intended to be used.  Normally when a
> table acquires more than a couple dozen columns, that is a good
> indication that you need to normalize and/or refactor your schema. Schema
> designers almost unanimously follow that design principle.  And so
> SQLite is optimized for the overwhelmingly common case of a small
> number of columns per table.
>
> Hence, adding the ability to have a table with a huge number of
> columns is not something that I am interested in supporting in SQLite
> at this time.
>
> --
> D. Richard Hipp
> d...@sqlite.org



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-16 Thread Jens Alfke

> On Oct 16, 2019, at 6:08 AM, Mitar  wrote:
> 
> Quite a few
> of the datasets we are dealing with have 100k or so columns.

There was a thread about this a few months ago. You shouldn't store every
number of a huge vector in a separate column. You don't need to query every
individual item, do you? Just store the vector as one column — JSON works
well, or a blob containing a C array.
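
As a minimal sketch of the JSON variant (the table and column names here
are invented for illustration; only the json1 functions are assumed):

CREATE TABLE samples (
  id       INTEGER PRIMARY KEY,
  features TEXT          -- the whole 100k-element vector as a JSON array
);

INSERT INTO samples(features) VALUES (json('[0.12, 3.4, 5.6]'));

-- pull out a single element on the rare occasion it is needed
SELECT json_extract(features, '$[2]') FROM samples WHERE id = 1;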

—Jens


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-16 Thread Richard Hipp
SQLite could, in theory, be enhanced (with just a few minor tweaks) to
support up to 2 billion columns.  But having a relation with a large
number of columns seems like a very bad idea stylistically.  That's
not how relational databases are intended to be used.  Normally when a
table acquires more than a couple dozen columns, that is a good
indication that you need to normalize and/or refactor your schema. Schema
designers almost unanimously follow that design principle.  And so
SQLite is optimized for the overwhelmingly common case of a small
number of columns per table.

Hence, adding the ability to have a table with a huge number of
columns is not something that I am interested in supporting in SQLite
at this time.

-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-16 Thread Mitar
Hi!

On Wed, Oct 16, 2019 at 3:29 PM Richard Hipp  wrote:
> Are you trying to store a big matrix with approx 100k columns?  A
> better way to do that in a relational database (*any* relational
> database, not just SQLite) is to store one entry per matrix elements:

Sure, this is useful for a sparse matrix. But in our case we would like to
preserve as much of the original dataset structure as possible, and those
kinds of transformations also require users to restructure the data back.
For example, a Pandas DataFrame can be easily read out of SQLite, but if
the data is converted in the way you described, then the user has to take
one more step to move things around.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html

I would hope that if this is something which is generally a good idea,
databases would do such a mapping internally, while exposing the regular
structure back to the user.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: [sqlite] Limit on number of columns in SQLite table

2019-10-16 Thread Richard Hipp
On 10/16/19, Mitar  wrote:
> Hi!
>
> We are considering using SQLite as an ML dataset archival format for
> datasets in OpenML (https://www.openml.org/). When investigating it,
> we noticed that it has a very low limit on the number of columns. Quite
> a few of the datasets we are dealing with have 100k or so columns. Are
> there any fundamental reasons why this limit is so low (even if we
> extend it when compiling, it can be at most 32k columns), while other
> limits are comfortably large? Are there any plans to extend this limit
> in the future?

Are you trying to store a big matrix with approx 100k columns?  A
better way to do that in a relational database (*any* relational
database, not just SQLite) is to store one entry per matrix element:

CREATE TABLE matrix(row_number INT, column_number INT, value ANY);

Only three columns are required in such a relation, regardless of the
number of columns in the matrix.

If performance and storage efficiency are a high priority and if the
number of rows and columns are limited to 2 billion each, then you can
combine the row and column number into a single integer key:

cell_number := row_number*2147483648 + column_number;

Then make your table just this:

CREATE TABLE matrix(cell_number INTEGER PRIMARY KEY, value ANY);
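
As a quick illustration of how that second schema would be used (the
numbers are made up for the example):

-- store element (row 7, column 100000) of the matrix
INSERT INTO matrix(cell_number, value)
VALUES (7*2147483648 + 100000, 3.14);

-- read back all stored elements of matrix row 7, in column order
SELECT cell_number - 7*2147483648 AS column_number, value
  FROM matrix
 WHERE cell_number BETWEEN 7*2147483648
                       AND 7*2147483648 + 2147483647;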

-- 
D. Richard Hipp
d...@sqlite.org


[sqlite] Limit on number of columns in SQLite table

2019-10-16 Thread Mitar
Hi!

We are considering using SQLite as an ML dataset archival format for
datasets in OpenML (https://www.openml.org/). When investigating it,
we noticed that it has a very low limit on the number of columns. Quite
a few of the datasets we are dealing with have 100k or so columns. Are
there any fundamental reasons why this limit is so low (even if we
extend it when compiling, it can be at most 32k columns), while other
limits are comfortably large? Are there any plans to extend this limit
in the future?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m