Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

2020-02-05 Thread Keith Medcalf
On Tuesday, 4 February, 2020 22:31, Keith Medcalf  wrote:

The vsv.c (source) file line counting is now verified and I have added a skip= 
parameter.  Adding skip was far easier than variable separators ...

Same location, file updated:  http://www.dessus.com/files/vsv.c
The complete collection:  http://www.dessus.com/files/sqlite3extensions.zip

Using tbartilde.csv containing:
a|b|c|d~1|2|3|4~2|"3|5"|4|5~3|4|5|6~4|5|6|7

SQLite version 3.32.0 2020-02-05 05:21:57
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, 
fsep='|', rsep='~');
sqlite> .mode col
sqlite> .head on
sqlite> select * from t;
a   b   c   d
--  --  --  --
1   2   3   4
2   3|5 4   5
3   4   5   6
4   5   6   7

sqlite> drop table t;
sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, 
fsep='|', rsep='~', skip=3);
sqlite> select * from t;
a   b   c   d
--  --  --  --
4   5   6   7

sqlite> drop table t;
sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, 
fsep='|', rsep='~', skip=4);
sqlite> select * from t;
sqlite> drop table t;
sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, 
fsep='|', rsep='~', skip=5);
Error: premature end of file during skip
sqlite>



>On Tuesday, 4 February, 2020 05:19, Robert M. Münch 
> wrote:
>
>>On 3 Dec 2019, at 16:10, Jannick wrote:
>
>>> would it be possible to add to the csv extension the following
>>> parameter options (with syntax along the lines of):
>
>>> - sep=';': field separator character (different from default ',')
>>> - skip=N: skip the first N lines
>
>>> Both options would be very helpful for information of fixed format
>>> downloaded from information providers.
>
>>Hi, those would be very useful enhancements.
>
>I did some of the changes and you can get the resulting extension called
>VSV (Variably Separated Values) from:
>
>http://www.dessus.com/files/vsv.c
>
>I added the facility to specify the field and record separator
>characters.  I did not add skip, though I may look at adding that too,
>but one can simply use the OFFSET in SQL to ignore some rows at the
>beginning of the file.  I made some other changes also which makes this
>non-compliant with the RFC.
>
>Data between the "field separator" markers can consist of any arbitrary
>string of bytes that DOES NOT include the field or record separator
>bytes.
>Data between the double-quotes can consist of any arbitrary string of
>bytes except that double-quotes must be escaped by doubling them.
>
>The added parameters are fsep=SEPERATOR and rsep=SEPERATOR for the field
>and record seperators respectively.
>
>SEPERATOR is a single quoted string that may be in the following formats:
>
>'x'where x is any arbitrary byte and will be used as the separator
>character.
>'\x'   for standard escape codes (tab = \t, vtab = \v, Formfeed = \f,
>Newline = \n).
>'\xhh' where hh is the hexidecimal code for the byte to use.
>
>defaults if not specified are fsep=',' and rsep='\n'
>
>so to read the following file:
>
>a|b|c|d~1|2|3|4~2|"3|5"|4|5~3|4|5|6~4|5|6|7
>
>you can use the following commands:
>
>SQLite version 3.32.0 2020-02-05 02:43:27
>Enter ".help" for usage hints.
>Connected to a transient in-memory database.
>Use ".open FILENAME" to reopen on a persistent database.
>sqlite> create virtual table tbartilde using
>vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~');
>sqlite> .mode col
>sqlite> .head on
>sqlite> select * from tbartilde;
>a   b   c   d
>--  --  --  --
>1   2   3   4
>2   3|5 4   5
>3   4   5   6
>4   5   6   7
>
>Might still have some line counting errors and haven't figured out how to
>implement skip yet ...

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume. 



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

2020-02-04 Thread Keith Medcalf

On Tuesday, 4 February, 2020 05:19, Robert M. Münch 
 wrote:

>On 3 Dec 2019, at 16:10, Jannick wrote:

>> would it be possible to add to the csv extension the following
>> parameter options (with syntax along the lines of):

>> - sep=';': field separator character (different from default ',')
>> - skip=N: skip the first N lines

>> Both options would be very helpful for information of fixed format
>> downloaded from information providers.

>Hi, those would be very useful enhancements.

I did some of the changes and you can get the resulting extension called VSV 
(Variably Separated Values) from:

http://www.dessus.com/files/vsv.c

I added the facility to specify the field and record separator characters.  I 
did not add skip, though I may look at adding that too, but one can simply use 
the OFFSET in SQL to ignore some rows at the beginning of the file.  I made 
some other changes also which makes this non-compliant with the RFC.

Data between the "field separator" markers can consist of any arbitrary string 
of bytes that DOES NOT include the field or record separator bytes.
Data between the double-quotes can consist of any arbitrary string of bytes 
except that double-quotes must be escaped by doubling them.

The added parameters are fsep=SEPERATOR and rsep=SEPERATOR for the field and 
record seperators respectively.

SEPERATOR is a single quoted string that may be in the following formats:

'x'where x is any arbitrary byte and will be used as the separator 
character.
'\x'   for standard escape codes (tab = \t, vtab = \v, Formfeed = \f, Newline = 
\n).
'\xhh' where hh is the hexidecimal code for the byte to use.

defaults if not specified are fsep=',' and rsep='\n'

so to read the following file:

a|b|c|d~1|2|3|4~2|"3|5"|4|5~3|4|5|6~4|5|6|7

you can use the following commands:

SQLite version 3.32.0 2020-02-05 02:43:27
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create virtual table tbartilde using vsv(filename='tbartilde.csv', 
header=on, fsep='|', rsep='~');
sqlite> .mode col
sqlite> .head on
sqlite> select * from tbartilde;
a   b   c   d
--  --  --  --
1   2   3   4
2   3|5 4   5
3   4   5   6
4   5   6   7

Might still have some line counting errors and haven't figured out how to 
implement skip yet ...

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

2020-02-04 Thread Robert Hairgrove

On 04.02.20 15:42, Simon Slavin wrote:

On 4 Feb 2020, at 12:18pm, Robert M. Münch  wrote:


- sep=';': field separator character (different from default ',')

If you provide this facility, please don't add it to anything called 'csv' 
since the 'c' stands for 'comma'.

For those playing along at home, csv files using semi-colon are a result of a 
bug in Excel.  Windows has a setting for a 'list separator'.  The two most 
usual values are ',' and ';'.  The CSV export filter in Excel takes its 
separator from this field rather than always using a comma, because it was 
written by someone who wasn't aware of, didn't understand, or was intentionally 
trying to disrupt the standard.  Decades after being told about the bug, 
Microsoft hasn't fixed it.

There are a couple of other errors in Excel's CSV filters including how strings 
are quoted and how a blank value differs from a zero-length string.  The best 
way I've seen to handle this was to add a new filter to your software, similar 
to 'csv', called something like 'exceltext' which did things the Excel way.


Believe it or not, there is no binding standard for the CSV format. The 
closest anyone has come was RFC 4180.

However:

According to RFC 4180, section 2:
  "While there are various specifications and implementations for the
   CSV format (for ex. [4], [5], [6] and [7]), there is no formal
   specification in existence, which allows for a wide variety of
   interpretations of CSV files."

https://tools.ietf.org/html/rfc4180#section-2

In section 3, under "Interoperability considerations":
  "Due to lack of a single specification, there are considerable
   differences among implementations.  Implementors should "be
   conservative in what you do, be liberal in what you accept from
   others" (RFC 793 [8]) when processing CSV files."

https://tools.ietf.org/html/rfc4180#section-3

That being said, the problem with trying to enforce the comma as the 
sole delimiter character is due to the fact that over half of the 
non-English speaking world (or perhaps even more) uses the comma as the 
decimal separator. The "work-around" for that, of course, would be to 
enclose all fields in double quote characters. But, as we know, the 
800-pound gorilla in the room doesn't necessarily do that...


I agree that this would be a very good option to have. In the meantime, 
check out libcsv on GitHub:

https://github.com/rgamble/libcsv

It adheres as closely to what standards there are, and you can choose 
your own delimiter and quote character if you like. Of course, you have 
to do some programming to use it, but it's really easy to use. And it is 
very fast since it does just one thing, but does it very well.


HTH,
Bob Hairgrove

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

2020-02-04 Thread Simon Slavin
On 4 Feb 2020, at 12:18pm, Robert M. Münch  wrote:

> - sep=';': field separator character (different from default ',')

If you provide this facility, please don't add it to anything called 'csv' 
since the 'c' stands for 'comma'.

For those playing along at home, csv files using semi-colon are a result of a 
bug in Excel.  Windows has a setting for a 'list separator'.  The two most 
usual values are ',' and ';'.  The CSV export filter in Excel takes its 
separator from this field rather than always using a comma, because it was 
written by someone who wasn't aware of, didn't understand, or was intentionally 
trying to disrupt the standard.  Decades after being told about the bug, 
Microsoft hasn't fixed it.

There are a couple of other errors in Excel's CSV filters including how strings 
are quoted and how a blank value differs from a zero-length string.  The best 
way I've seen to handle this was to add a new filter to your software, similar 
to 'csv', called something like 'exceltext' which did things the Excel way.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

2020-02-04 Thread Kees Nuyt
On Tue, 04 Feb 2020 13:18:30 +0100, you wrote:

>On 3 Dec 2019, at 16:10, Jannick wrote:
>
>> would it be possible to add to the csv extension the following parameter
>> options (with syntax along the lines of):
>>
>> - sep=';': field separator character (different from default ',')
>> - skip=N: skip the first N lines
>>
>> Both options would be very helpful for information of fixed format
>> downloaded from information providers.
>
> Hi, those would be very useful enhancements.

I don't see the need, the feature is available in another form

$ cat test.csv

"a";"b"
"c";"d"

$ sqlite3 test.db \
"DROP TABLE IF EXISTS tbl1" \
".mode csv" \
".separator ;" \
".import test.csv tbl1" \
". mode column" \
".headers on" \
"SELECT * FROM tbl1"

a   b
--  --
c   d


-- 
Regards,
Kees Nuyt

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

2020-02-04 Thread Robert M. Münch
On 3 Dec 2019, at 16:10, Jannick wrote:

> would it be possible to add to the csv extension the following parameter
> options (with syntax along the lines of):
>
> - sep=';': field separator character (different from default ',')
> - skip=N: skip the first N lines
>
> Both options would be very helpful for information of fixed format
> downloaded from information providers.

Hi, those would be very useful enhancements.

-- 

Robert M. Münch


signature.asc
Description: OpenPGP digital signature
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines

2019-12-04 Thread Jannick
Hi everyone,

would it be possible to add to the csv extension the following parameter
options (with syntax along the lines of):

- sep=';': field separator character (different from default ',')
- skip=N: skip the first N lines

Both options would be very helpful for information of fixed format
downloaded from information providers.

I would be happy to suggest tiny patches for this if the sqlite devels
wanted me to do so and advised me on how to submit patches, a pull request
or alike.

Many thanks,
J.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users