Re: [sqlite] CSV import using CLI (header, NULL)

Warren Young Sat, 29 Feb 2020 08:30:46 -0800

On Feb 27, 2020, at 11:51 PM, Christof Böckler <c...@jamesie.de> wrote:
> 
> 1. There should be no penalty for using header lines in CSV files. Thus a new 
> flag -h for .import is much appreciated.


More than that, SQLite should be able to *actively use* the header when present.

For instance, given:

foo,bar,qux
1,2,3

then there should be a way to import only the first and third columns, perhaps 
with a command like so:

.import data.csv tablename foo,qux

I’ve needed such a thing multiple times.

One case that comes to mind is that tablename has only two columns, and you’re 
purposefully subsetting the CSV data in SQLite.  This is common when importing 
DBs from outside sources, where the other DB has more columns than your local 
SQLite DB.  It’s annoying to need to preprocess the data to strip the “extra” 
columns out.  It’s even more annoying to subset it in SQLite by matching the 
outside DB’s structure in SQLite and then do the common workaround to a lack of 
ALTER TABLE DROP COLUMN.

Another case I’ve run into before is that the table layout of the CSV is the 
same as in SQLite, but you want one or more of the columns to get their default 
value for some reason, not take them from the CSV.  You can see this with 
time-varying data when the CSV is output from SQLite at time t0 and reimported 
at t1 after some of the data has gone stale, so that letting the stale columns 
revert to their defaults is better than importing obsolete data.

To make the second example concrete, imagine an HVAC monitoring system’s DB: 
the names of the stations and the wear data are still useful, but the last 
known temperature shouldn’t be imported because the old data could be entirely 
inaccurate, and fresh data should be coming along soon after the DB table is 
restored from the backup.  Meanwhile, report “I don’t know” for the temperature 
measurements.

> 2. Missing values in columns with numerical affinity should show up as NULL 
> values.

I don’t think one hard-and-fast rule works for everyone here, if for no other 
reason than that there must be people depending on the current behavior.

Instead, let it be a mode:

.blank default
.blank blank
.blank zero
.blank null

This proposed setting tells the CSV importer how to handle blanks: use the DB 
column’s default value, leave them be, make them zeroes, or make them SQL NULLs.

The default value for this setting would be “blank” for backwards compatibility.

Parallel .zero and .null settings could be added to take care of similar cases. 
 The defaults would obviously be “zero” and “null”, respectively.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] CSV import using CLI (header, NULL)

Reply via email to