Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines
On Tuesday, 4 February, 2020 22:31, Keith Medcalf wrote: The vsv.c (source) file line counting is now verified and I have added a skip= parameter. Adding skip was far easier than variable separators ... Same location, file updated: http://www.dessus.com/files/vsv.c The complete collection: http://www.dessus.com/files/sqlite3extensions.zip Using tbartilde.csv containing: a|b|c|d~1|2|3|4~2|"3|5"|4|5~3|4|5|6~4|5|6|7 SQLite version 3.32.0 2020-02-05 05:21:57 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~'); sqlite> .mode col sqlite> .head on sqlite> select * from t; a b c d -- -- -- -- 1 2 3 4 2 3|5 4 5 3 4 5 6 4 5 6 7 sqlite> drop table t; sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~', skip=3); sqlite> select * from t; a b c d -- -- -- -- 4 5 6 7 sqlite> drop table t; sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~', skip=4); sqlite> select * from t; sqlite> drop table t; sqlite> create virtual table t using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~', skip=5); Error: premature end of file during skip sqlite> >On Tuesday, 4 February, 2020 05:19, Robert M. Münch > wrote: > >>On 3 Dec 2019, at 16:10, Jannick wrote: > >>> would it be possible to add to the csv extension the following >>> parameter options (with syntax along the lines of): > >>> - sep=';': field separator character (different from default ',') >>> - skip=N: skip the first N lines > >>> Both options would be very helpful for information of fixed format >>> downloaded from information providers. > >>Hi, those would be very useful enhancements. > >I did some of the changes and you can get the resulting extension called >VSV (Variably Separated Values) from: > >http://www.dessus.com/files/vsv.c > >I added the facility to specify the field and record separator >characters. I did not add skip, though I may look at adding that too, >but one can simply use the OFFSET in SQL to ignore some rows at the >beginning of the file. I made some other changes also which makes this >non-compliant with the RFC. > >Data between the "field separator" markers can consist of any arbitrary >string of bytes that DOES NOT include the field or record separator >bytes. >Data between the double-quotes can consist of any arbitrary string of >bytes except that double-quotes must be escaped by doubling them. > >The added parameters are fsep=SEPERATOR and rsep=SEPERATOR for the field >and record seperators respectively. > >SEPERATOR is a single quoted string that may be in the following formats: > >'x'where x is any arbitrary byte and will be used as the separator >character. >'\x' for standard escape codes (tab = \t, vtab = \v, Formfeed = \f, >Newline = \n). >'\xhh' where hh is the hexidecimal code for the byte to use. > >defaults if not specified are fsep=',' and rsep='\n' > >so to read the following file: > >a|b|c|d~1|2|3|4~2|"3|5"|4|5~3|4|5|6~4|5|6|7 > >you can use the following commands: > >SQLite version 3.32.0 2020-02-05 02:43:27 >Enter ".help" for usage hints. >Connected to a transient in-memory database. >Use ".open FILENAME" to reopen on a persistent database. >sqlite> create virtual table tbartilde using >vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~'); >sqlite> .mode col >sqlite> .head on >sqlite> select * from tbartilde; >a b c d >-- -- -- -- >1 2 3 4 >2 3|5 4 5 >3 4 5 6 >4 5 6 7 > >Might still have some line counting errors and haven't figured out how to >implement skip yet ... -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines
On Tuesday, 4 February, 2020 05:19, Robert M. Münch wrote: >On 3 Dec 2019, at 16:10, Jannick wrote: >> would it be possible to add to the csv extension the following >> parameter options (with syntax along the lines of): >> - sep=';': field separator character (different from default ',') >> - skip=N: skip the first N lines >> Both options would be very helpful for information of fixed format >> downloaded from information providers. >Hi, those would be very useful enhancements. I did some of the changes and you can get the resulting extension called VSV (Variably Separated Values) from: http://www.dessus.com/files/vsv.c I added the facility to specify the field and record separator characters. I did not add skip, though I may look at adding that too, but one can simply use the OFFSET in SQL to ignore some rows at the beginning of the file. I made some other changes also which makes this non-compliant with the RFC. Data between the "field separator" markers can consist of any arbitrary string of bytes that DOES NOT include the field or record separator bytes. Data between the double-quotes can consist of any arbitrary string of bytes except that double-quotes must be escaped by doubling them. The added parameters are fsep=SEPERATOR and rsep=SEPERATOR for the field and record seperators respectively. SEPERATOR is a single quoted string that may be in the following formats: 'x'where x is any arbitrary byte and will be used as the separator character. '\x' for standard escape codes (tab = \t, vtab = \v, Formfeed = \f, Newline = \n). '\xhh' where hh is the hexidecimal code for the byte to use. defaults if not specified are fsep=',' and rsep='\n' so to read the following file: a|b|c|d~1|2|3|4~2|"3|5"|4|5~3|4|5|6~4|5|6|7 you can use the following commands: SQLite version 3.32.0 2020-02-05 02:43:27 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> create virtual table tbartilde using vsv(filename='tbartilde.csv', header=on, fsep='|', rsep='~'); sqlite> .mode col sqlite> .head on sqlite> select * from tbartilde; a b c d -- -- -- -- 1 2 3 4 2 3|5 4 5 3 4 5 6 4 5 6 7 Might still have some line counting errors and haven't figured out how to implement skip yet ... -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines
On 04.02.20 15:42, Simon Slavin wrote: On 4 Feb 2020, at 12:18pm, Robert M. Münch wrote: - sep=';': field separator character (different from default ',') If you provide this facility, please don't add it to anything called 'csv' since the 'c' stands for 'comma'. For those playing along at home, csv files using semi-colon are a result of a bug in Excel. Windows has a setting for a 'list separator'. The two most usual values are ',' and ';'. The CSV export filter in Excel takes its separator from this field rather than always using a comma, because it was written by someone who wasn't aware of, didn't understand, or was intentionally trying to disrupt the standard. Decades after being told about the bug, Microsoft hasn't fixed it. There are a couple of other errors in Excel's CSV filters including how strings are quoted and how a blank value differs from a zero-length string. The best way I've seen to handle this was to add a new filter to your software, similar to 'csv', called something like 'exceltext' which did things the Excel way. Believe it or not, there is no binding standard for the CSV format. The closest anyone has come was RFC 4180. However: According to RFC 4180, section 2: "While there are various specifications and implementations for the CSV format (for ex. [4], [5], [6] and [7]), there is no formal specification in existence, which allows for a wide variety of interpretations of CSV files." https://tools.ietf.org/html/rfc4180#section-2 In section 3, under "Interoperability considerations": "Due to lack of a single specification, there are considerable differences among implementations. Implementors should "be conservative in what you do, be liberal in what you accept from others" (RFC 793 [8]) when processing CSV files." https://tools.ietf.org/html/rfc4180#section-3 That being said, the problem with trying to enforce the comma as the sole delimiter character is due to the fact that over half of the non-English speaking world (or perhaps even more) uses the comma as the decimal separator. The "work-around" for that, of course, would be to enclose all fields in double quote characters. But, as we know, the 800-pound gorilla in the room doesn't necessarily do that... I agree that this would be a very good option to have. In the meantime, check out libcsv on GitHub: https://github.com/rgamble/libcsv It adheres as closely to what standards there are, and you can choose your own delimiter and quote character if you like. Of course, you have to do some programming to use it, but it's really easy to use. And it is very fast since it does just one thing, but does it very well. HTH, Bob Hairgrove ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines
On 4 Feb 2020, at 12:18pm, Robert M. Münch wrote: > - sep=';': field separator character (different from default ',') If you provide this facility, please don't add it to anything called 'csv' since the 'c' stands for 'comma'. For those playing along at home, csv files using semi-colon are a result of a bug in Excel. Windows has a setting for a 'list separator'. The two most usual values are ',' and ';'. The CSV export filter in Excel takes its separator from this field rather than always using a comma, because it was written by someone who wasn't aware of, didn't understand, or was intentionally trying to disrupt the standard. Decades after being told about the bug, Microsoft hasn't fixed it. There are a couple of other errors in Excel's CSV filters including how strings are quoted and how a blank value differs from a zero-length string. The best way I've seen to handle this was to add a new filter to your software, similar to 'csv', called something like 'exceltext' which did things the Excel way. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines
On Tue, 04 Feb 2020 13:18:30 +0100, you wrote: >On 3 Dec 2019, at 16:10, Jannick wrote: > >> would it be possible to add to the csv extension the following parameter >> options (with syntax along the lines of): >> >> - sep=';': field separator character (different from default ',') >> - skip=N: skip the first N lines >> >> Both options would be very helpful for information of fixed format >> downloaded from information providers. > > Hi, those would be very useful enhancements. I don't see the need, the feature is available in another form $ cat test.csv "a";"b" "c";"d" $ sqlite3 test.db \ "DROP TABLE IF EXISTS tbl1" \ ".mode csv" \ ".separator ;" \ ".import test.csv tbl1" \ ". mode column" \ ".headers on" \ "SELECT * FROM tbl1" a b -- -- c d -- Regards, Kees Nuyt ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines
On 3 Dec 2019, at 16:10, Jannick wrote: > would it be possible to add to the csv extension the following parameter > options (with syntax along the lines of): > > - sep=';': field separator character (different from default ',') > - skip=N: skip the first N lines > > Both options would be very helpful for information of fixed format > downloaded from information providers. Hi, those would be very useful enhancements. -- Robert M. Münch signature.asc Description: OpenPGP digital signature ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] csv extension/feature requests: new options for field separator different from default ', ' and skip=N to skip first N lines
Hi everyone, would it be possible to add to the csv extension the following parameter options (with syntax along the lines of): - sep=';': field separator character (different from default ',') - skip=N: skip the first N lines Both options would be very helpful for information of fixed format downloaded from information providers. I would be happy to suggest tiny patches for this if the sqlite devels wanted me to do so and advised me on how to submit patches, a pull request or alike. Many thanks, J. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users