Hi! This is getting a bit off topic.
On Thu, Oct 17, 2019 at 12:07 PM Simon Slavin <slav...@bigfraud.org> wrote:

> 1) Almost no piece of software can handle a grid 2 billion cells wide. Excel
> maxes out at 16,384 columns. Matlab can store and retrieve a cell of data
> directly from a file, but it has a max array size of 10000. R maxes out at
> 2147483647, which is more than 2 billion. But R has to hold all the data
> from a matrix in memory at once and it can't assign enough memory to one
> object to hold that many cells.

Of course, 2 billion is a lot. But 100k columns is something many ML libraries support: pandas, ndarray, R. Nothing too magical about that.

> 2) Object names are not data. They're descriptions in your favourite human
> language. They're not meant to have weird sequences of characters in.

Not sure what this relates to.

> 3) Lots of CSV import filters ignore a column header row, or can only create
> fieldnames with certain limits (max length, no punctuation characters, etc.).
> So you should expect to lose fieldnames if you try to import your data into
> some new piece of software.

Does SQLite have limitations on what can be a column name? If not, then I would not worry about what some CSV importers do. We would use a good one to convert to SQLite.

> (4) SQLite stores all the data for a row together, in a sequence. If you
> ask for the data in the 3756th column of a row, SQLite has to read and parse
> the data for the first 3755 columns of that row, just to read a single value
> from storage. As you can imagine, this is slow and involves a lot of I/O.
> And while it happens the row up to that point must all be held in memory.
> Consequently, nobody who uses SQLite for its intended purpose actually does
> this. I dread to think how slow random access over 2 billion columns would
> be in SQLite.

I wrote earlier that for our use case reading whole rows is the most common operation.

> Your gene expressions are data. They are not the names of table entities.
> They should be stored in a table as other posts suggested.

Maybe. But often this data is represented as a row of expressions with a column for each gene. Because that is what is being distributed, we are looking for a way to store it in a stable format that will be supported for the next 50 years, without modifying the original data too much. I do hear the suggestions to do such a transformation, but that is less ideal for our use case.


Mitar

--
http://mitar.tnode.com/
https://twitter.com/mitar_m

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
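For reference, the long-format transformation suggested in the thread (one row per sample/gene pair instead of one column per gene) can be sketched with Python's stdlib sqlite3 module. The table and column names below are hypothetical, and the tiny wide-format dict stands in for real expression data; it is a sketch of the idea, not anyone's actual pipeline:

```python
import sqlite3

# Hypothetical wide-format input: one row per sample, one column per gene.
wide = {
    "sample1": {"BRCA1": 1.2, "TP53": 0.4},
    "sample2": {"BRCA1": 0.9, "TP53": 1.7},
}

conn = sqlite3.connect(":memory:")

# Long format: gene names become data values, so SQLite's column-count
# and column-name limits no longer apply to the genes.
conn.execute("""
    CREATE TABLE expression (
        sample_id TEXT NOT NULL,
        gene      TEXT NOT NULL,
        value     REAL,
        PRIMARY KEY (sample_id, gene)
    )
""")
conn.executemany(
    "INSERT INTO expression (sample_id, gene, value) VALUES (?, ?, ?)",
    [(s, g, v) for s, genes in wide.items() for g, v in genes.items()],
)

# Reading back a whole "row" (all expressions for one sample) is one query,
# so the common whole-row access pattern survives the transformation.
row = dict(conn.execute(
    "SELECT gene, value FROM expression WHERE sample_id = ?",
    ("sample1",),
).fetchall())
assert row == {"BRCA1": 1.2, "TP53": 0.4}
```

Whether this counts as "modifying the original data too much" is the open question; the transformation is lossless and mechanically reversible, but the on-disk layout no longer mirrors the distributed matrix.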