> I wrote earlier that for us use case where we are reading whole rows is the most common one. > [...] > we are looking for ways to store this in a stable format which will be supported for next 50 years, without modifying to original data too much.
If you do not need access to individual columns too much, but want to still be able to operate on them when needed, my suggestion would be to store them as JSON in a blob, and use the JSON functions of SQLite to extract the data when needed (https://www.sqlite.org/json1.html) This would mean you have both the stability of a text format for long-term storage (JSON) while still having the ability to run queries, index and manipulate data with SQL. I am using such an approach successfully, and when some fields in the JSON are flagged for indexing or direct access (f.i. for summaries / descriptions), I just extract them to a relational table which can then be indexed and used directly as usual (you can use triggers to automate that extraction). You may also be able to use indexes on expression if you only want indexing (https://www.sqlite.org/expridx.html) Eric Le jeu. 17 oct. 2019 à 14:50, Mitar <mmi...@gmail.com> a écrit : > Hi! > > This is getting a bit off topic. > > On Thu, Oct 17, 2019 at 12:07 PM Simon Slavin <slav...@bigfraud.org> > wrote: > > 1) Almost no piece of software can handle a grid 2 billion cells wide. > Excel maxes out at 16,384 columns. Matlab can store and retrieve a cell of > data directly from a file, but it has a max array size of 10000. R maxes > out at 2147483647, which is more than 2 billion. But R has to hold all the > data from a matrix in memory at once and it can't assign enough memory to > one object to hold that many cells. > > Of course, 2 billion is a lot. But 100k is something many ML libraries > support. Pandas, ndarray, R. Not something to magical about that. > > > 2) Object names are not data. They're descriptions in your favourite > human language. They're not meant to have weird sequences of characters in. > > Not sure what this relates to. > > > 3) Lots of CSV import filters ignore a column header row, or can only > create fieldnames with certain limits (max length, no punctuation > characters, etc.). So you should expect to lose fieldnames if you try to > import your data into some new piece of software. > > Does SQLite have limitations on what can be a column name? If not, > then I would not worry what some CSV importers do. We would use a good > one to convert to SQLLite. > > > (4) SQLite stores all the data for a row is together, in a sequence. If > you ask for the data in the 3756th column of a row, SQLite has to read and > parse the data for the first 3755 columns of that row, just to read a > single value from storage. As you can imagine, this is slow and involves a > lot of I/O. And while it happens the row up to that point must all be held > in memory. Consequently, nobody who uses SQLite for its intended purpose > actually does this. I dread to think how slow random access over 2 billion > columns would be in SQLite. > > I wrote earlier that for us use case where we are reading whole rows > is the most common one. > > > Your gene expressions are data. They are not the names of table > entities. They should be stored in a table as other posts suggested. > > Maybe. But often this data is represented as a row of expressions with > columns for each gene. Because this is what is being distributed, we > are looking for ways to store this in a stable format which will be > supported for next 50 years, without modifying to original data too > much. I do hear suggestions to do such transformation, but that is > less ideal for our use case. > > > Mitar > > -- > http://mitar.tnode.com/ > https://twitter.com/mitar_m > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users