To import a csv file with headers into an existing table, you can use

.import '| tail -n +2 yourfile.csv' yourtable

to skip the header line.

On Sat, Feb 29, 2020, 4:30 AM Christof Böckler <c...@jamesie.de> wrote:

> Hi,
>
> I want to share some thoughts and make some suggestions about the SQLite
> 3 command line interface (CLI) tool, especially its behaviour when
> importing CSV files.
>
> CSV files are probably even more common for data exchange than SQLite
> database files. I consider it to be good practice to include a header
> line with column names in every CSV file. Metadata should go with the
> data. This prevents me from mixing up two columns that contain numbers
> with similiar value distributions.
>
> Let’s look at an example. A file named data.csv contains three lines:
>
> id,val1,val2
> A,27,8
> B,3,12
>
> Now …
> sqlite3
> sqlite> .import data.csv tab
>
> works and looks good at first, but there is a problem with numerical data.
>
> sqlite> SELECT * FROM tab ORDER BY val2;
> B,3,12
> A,27,8
>
> This is because on import all three columns were created with affinity
> (not to say data type) TEXT (see .schema). As a consequence all numbers
> were imported as strings.
> '12' < '8' is lexicographically OK, but not so in a mathematical sense.
> Having the CSV file in mind I clearly expect to see 8 on the first line
> of the above result.
>
> How to work around this? Just define the table in advance with
> appropriate data types (affinity INTEGER):
>
> sqlite> CREATE TABLE tab (id TEXT, val1 INT, val2 INT);
>
> But now the above .import command will not work as expected, because it
> will result in three rows in our table tab. The first row contains the
> header line.
>
> Two different workarounds come to my mind:
> a) sqlite> DELETE FROM tab WHERE rowid = 1;  -- Dirty hack!
> b) sqlite> .import data.csv temp_tab
>     sqlite> INSERT INTO tab SELECT * FROM temp_tab;
>     sqlite> DROP TABLE temp_tab;  -- Two tedious extra lines
>
> Both approaches are not very appealing to me. To make CSV files with a
> header line first class citizens, I suggest this instead:
> sqlite> .import -h data.csv tab
> should ignore the first line of data.csv. This import should fail if
> table tab does not already exist.
> This is both shorter and more elegant than both workarounds.
>
>
> Now on to a second issue. Let’s assume you have sucessfully imported a
> file containing these four lines:
> id,val1,val2
> A,27,8
> B,3,12
> C,,1
> into the table tab mentioned above, resulting in three rows. Notice the
> missing value in column val1 on the last line. This missing value is
> imported as an empty string '' regardlesse of the affinity of column val1.
>
> That leads to (mathematically) unexpected results from aggregate functions:
> sqlite> SELECT SUM(val1) FROM tab;
> 30    -- Well, treating '' like 0 is OK in this case
> sqlite> SELECT COUNT(val1) FROM tab;
> 3     -- but not so here; only two values/numbers were given in data.csv
> sqlite> SELECT AVG(val1) FROM tab;
> 10    -- the average of 3 and 27 is 15
> sqlite> SELECT MAX(val1) FROM tab;
> ''    -- not to be expected when looking at data.csv
>
> OK, I hear you say, what about this workaround:
> sqlite> UPDATE tab SET val1 = NULL WHERE val1 = '';
> This makes the surprises above go away, but it is again tedious to do for
> all columns containing only numbers.
>
> My suggestion is: If someone goes the extra mile and defines a table in
> advance in order to have an appropriate numerical affinity (INTEGER, REAL
> or NUMERIC) for a column, then it is OK to encode a missing value as NULL
> instead of ''. It seems right though to keep the current behaviour for
> columns with affinity TEXT (the default) or BLOB.
>
> To sum things up:
> 1. There should be no penalty for using header lines in CSV files. Thus a
> new flag -h for .import is much appreciated.
> 2. Missing values in columns with numerical affinity should show up as
> NULL values.
>
> Thanks for reading, I look forward to your opinions about these issues.
>
> Greetings
> Christof
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to