Re: [sqlite] Importing a cvs file

Rui Fernandes Fri, 18 Jul 2014 09:05:38 -0700

Hi,

I think I'm getting a bunch of attached fields. Everything is added to the
same line, without separators. I've seen it when I read it.


Kind regards,

Miguel Fernandes


On Fri, Jul 18, 2014 at 4:49 PM, RSmith <[email protected]> wrote:

>
> On 2014/07/18 16:53, Jonathan Leslie wrote:
>
>> I have a similar situation about to happen.  I also have the issue where
>> one of the data fields is a text section that the user fills in and of
>> course, he is free to put commas in this field.   I believe that this will
>> result in higgly-piggly (that's a technical term) when I do the import.
>> What I do is I never use a CSV file as a "Comma separated Values" file but
>> rather as a "Character separated Values"  file, but rather use a Character,
>> º (<alt>0186), a legal character but not one on the keyboard, as the
>> separator character.  You can set Excel to do this automatically, and it
>> makes life a whole lot easier.   I believe with sqlite the >.separator "º"<
>> directive will allow the import correctly, and keep the commas in place in
>> the data fields.
>>
>
> Hi Jonathan, while this is a nice solution, it is not a perfect one. What
> if a user does enter that character? (Even if not directly linked to a
> keyboard key, it can happen). The very correct way to do it is to simply
> use quoting, which preserves the text specifically, whatever characters
> were used. A correct CSV format can hold any data (though most readers
> would have difficulty reading Chr(0) values, but that is an implementation
> issue). More importantly, Excel, Open Office Calc etc, all knows exactly
> how to export this correctly and without the higgly-pigglies. The only
> place I have seen confusion is where you export it in a country that uses
> semi-colon as a separator (yes it happens, mostly in the Netherlands) and
> then try to read in another country where the locality values are different
> (comma mostly).
>
> Mostly the problems experienced by people is that they make some home-brew
> CSV importer that does not realise how to correctly read output from a
> standards-based exporter such as Excel, and then try to change things like
> separation or quoting methods to "fix" it after the fact.
>
> The method described by the RFC for CSV is quite genius ( I think RFC2048,
> but not sure now and not important, but google will know if you need it) so
> let me explain a simple way of making csv files.
>
> The Important bits to remember is:
>
> 1 - Header, first line of the CSV must be the header, must contain field
> names, unless you do not have those or they are implicit, such as in a
> log-type listing or import to existing structure- but any standard importer
> will expect row 1 to contain field-identifiers if previously unknown.
>
> 2 - Every row must be separated by whichever Line-separation character(s)
> used by the operating system - this sometimes causes problems.
>  Carriage-Return-Linefeed (aka CR/LF) pairs (0x0D+0x0A) are mostly used in
> Windows, *nix environments prefer to use just the Linefeed (LF) character
> (0x0A). The Windows-way allows other linefeeds or carriage returns to be
> inside the text and not affect line-termination, and the *nix method is
> more lightweight in data terms, so both have advantages. Important to note
> that a line-separator may appear inside a set of quotes which makes it
> simply part of the value, and not indication of line termination - this is
> the biggest problem with homegrown importers I've seen, they all try read
> the file line-by-line which breaks badly when a field contains such a CR/LF.
>
> 3 - if a field starts with a Quote (after any leading/trailing white-space
> was stripped) any and all characters up to the next quote is part of the
> field value, even if they are separators (; or , or CR or LF) or
> white-space characters (such as SPACE/TAB/VTAB/etc. - these are stripped
> when leading/trailing field values, unless in quotes) and when we come to
> the next Quote (typically " or 0x22, but may differ between
> implementations), we check whether it's immediately followed by another
> quote, which means simply add the actual quote char and continue, or not,
> which means the end of the field. (Note: Any text after the ending quote in
> a quoted field and before the next field separator ( ; or , ) is ignored,
> one can use this quirk to add some comments or directives even).
>
> 4 - If you are writing fields, if your field contains any Quote
> characters, any separators or leading/trailing white-space characters that
> must not be discarded, it must be quoted. The best method for quoting a
> string to be valid CSV field data is like this pseudo code:
>
> QuoteChar = Char(0x22); // Or whatever is used for quoting in your system
> for i = last index of value down to first index:       // Go downward
> because it allows inserting without messing up final index
>   if  value[i] = QuoteChar then insert QuoteChar into value at position i;
>        // Duplicate any existing quotes
> OutputFieldValue = QuoteChar + value + QuoteChar;       // Surround the
> result with quotes
>
>
> This way whatever is used in your fields do not matter, it will be
> understood by any standard-validated CSV reader. Note that quotes may be
> added regardless of content, but it is preferred to not add quotes where
> not needed as it can cause very significant weight-gains to a large file,
> especially one with many fields per row, and moreso if lots of fields are
> short or empty.
>
> The sqlite3.exe (or its Linux counterpart) is tested to understand the
> correct formatted CSV as used in most places, not sure if it recognizes the
> OS's preference for say ; vs. , separators, but it's an easy test. If your
> data gets mangled, it is probably due to the input being not
> well-processed, and I hope the above will help to fix that.
>
> Cheers,
> Ryan
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Importing a cvs file

Reply via email to