To import a csv file with headers into an existing table, you can use .import '| tail -n +2 yourfile.csv' yourtable
to skip the header line. On Sat, Feb 29, 2020, 4:30 AM Christof Böckler <c...@jamesie.de> wrote: > Hi, > > I want to share some thoughts and make some suggestions about the SQLite > 3 command line interface (CLI) tool, especially its behaviour when > importing CSV files. > > CSV files are probably even more common for data exchange than SQLite > database files. I consider it to be good practice to include a header > line with column names in every CSV file. Metadata should go with the > data. This prevents me from mixing up two columns that contain numbers > with similiar value distributions. > > Let’s look at an example. A file named data.csv contains three lines: > > id,val1,val2 > A,27,8 > B,3,12 > > Now … > sqlite3 > sqlite> .import data.csv tab > > works and looks good at first, but there is a problem with numerical data. > > sqlite> SELECT * FROM tab ORDER BY val2; > B,3,12 > A,27,8 > > This is because on import all three columns were created with affinity > (not to say data type) TEXT (see .schema). As a consequence all numbers > were imported as strings. > '12' < '8' is lexicographically OK, but not so in a mathematical sense. > Having the CSV file in mind I clearly expect to see 8 on the first line > of the above result. > > How to work around this? Just define the table in advance with > appropriate data types (affinity INTEGER): > > sqlite> CREATE TABLE tab (id TEXT, val1 INT, val2 INT); > > But now the above .import command will not work as expected, because it > will result in three rows in our table tab. The first row contains the > header line. > > Two different workarounds come to my mind: > a) sqlite> DELETE FROM tab WHERE rowid = 1; -- Dirty hack! > b) sqlite> .import data.csv temp_tab > sqlite> INSERT INTO tab SELECT * FROM temp_tab; > sqlite> DROP TABLE temp_tab; -- Two tedious extra lines > > Both approaches are not very appealing to me. To make CSV files with a > header line first class citizens, I suggest this instead: > sqlite> .import -h data.csv tab > should ignore the first line of data.csv. This import should fail if > table tab does not already exist. > This is both shorter and more elegant than both workarounds. > > > Now on to a second issue. Let’s assume you have sucessfully imported a > file containing these four lines: > id,val1,val2 > A,27,8 > B,3,12 > C,,1 > into the table tab mentioned above, resulting in three rows. Notice the > missing value in column val1 on the last line. This missing value is > imported as an empty string '' regardlesse of the affinity of column val1. > > That leads to (mathematically) unexpected results from aggregate functions: > sqlite> SELECT SUM(val1) FROM tab; > 30 -- Well, treating '' like 0 is OK in this case > sqlite> SELECT COUNT(val1) FROM tab; > 3 -- but not so here; only two values/numbers were given in data.csv > sqlite> SELECT AVG(val1) FROM tab; > 10 -- the average of 3 and 27 is 15 > sqlite> SELECT MAX(val1) FROM tab; > '' -- not to be expected when looking at data.csv > > OK, I hear you say, what about this workaround: > sqlite> UPDATE tab SET val1 = NULL WHERE val1 = ''; > This makes the surprises above go away, but it is again tedious to do for > all columns containing only numbers. > > My suggestion is: If someone goes the extra mile and defines a table in > advance in order to have an appropriate numerical affinity (INTEGER, REAL > or NUMERIC) for a column, then it is OK to encode a missing value as NULL > instead of ''. It seems right though to keep the current behaviour for > columns with affinity TEXT (the default) or BLOB. > > To sum things up: > 1. There should be no penalty for using header lines in CSV files. Thus a > new flag -h for .import is much appreciated. > 2. Missing values in columns with numerical affinity should show up as > NULL values. > > Thanks for reading, I look forward to your opinions about these issues. > > Greetings > Christof > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users