Hi, I think I'm getting a bunch of attached fields. Everything is added to the same line, without separators. I've seen it when I read it.
Kind regards, Miguel Fernandes On Fri, Jul 18, 2014 at 4:49 PM, RSmith <[email protected]> wrote: > > On 2014/07/18 16:53, Jonathan Leslie wrote: > >> I have a similar situation about to happen. I also have the issue where >> one of the data fields is a text section that the user fills in and of >> course, he is free to put commas in this field. I believe that this will >> result in higgly-piggly (that's a technical term) when I do the import. >> What I do is I never use a CSV file as a "Comma separated Values" file but >> rather as a "Character separated Values" file, but rather use a Character, >> º (<alt>0186), a legal character but not one on the keyboard, as the >> separator character. You can set Excel to do this automatically, and it >> makes life a whole lot easier. I believe with sqlite the >.separator "º"< >> directive will allow the import correctly, and keep the commas in place in >> the data fields. >> > > Hi Jonathan, while this is a nice solution, it is not a perfect one. What > if a user does enter that character? (Even if not directly linked to a > keyboard key, it can happen). The very correct way to do it is to simply > use quoting, which preserves the text specifically, whatever characters > were used. A correct CSV format can hold any data (though most readers > would have difficulty reading Chr(0) values, but that is an implementation > issue). More importantly, Excel, Open Office Calc etc, all knows exactly > how to export this correctly and without the higgly-pigglies. The only > place I have seen confusion is where you export it in a country that uses > semi-colon as a separator (yes it happens, mostly in the Netherlands) and > then try to read in another country where the locality values are different > (comma mostly). > > Mostly the problems experienced by people is that they make some home-brew > CSV importer that does not realise how to correctly read output from a > standards-based exporter such as Excel, and then try to change things like > separation or quoting methods to "fix" it after the fact. > > The method described by the RFC for CSV is quite genius ( I think RFC2048, > but not sure now and not important, but google will know if you need it) so > let me explain a simple way of making csv files. > > The Important bits to remember is: > > 1 - Header, first line of the CSV must be the header, must contain field > names, unless you do not have those or they are implicit, such as in a > log-type listing or import to existing structure- but any standard importer > will expect row 1 to contain field-identifiers if previously unknown. > > 2 - Every row must be separated by whichever Line-separation character(s) > used by the operating system - this sometimes causes problems. > Carriage-Return-Linefeed (aka CR/LF) pairs (0x0D+0x0A) are mostly used in > Windows, *nix environments prefer to use just the Linefeed (LF) character > (0x0A). The Windows-way allows other linefeeds or carriage returns to be > inside the text and not affect line-termination, and the *nix method is > more lightweight in data terms, so both have advantages. Important to note > that a line-separator may appear inside a set of quotes which makes it > simply part of the value, and not indication of line termination - this is > the biggest problem with homegrown importers I've seen, they all try read > the file line-by-line which breaks badly when a field contains such a CR/LF. > > 3 - if a field starts with a Quote (after any leading/trailing white-space > was stripped) any and all characters up to the next quote is part of the > field value, even if they are separators (; or , or CR or LF) or > white-space characters (such as SPACE/TAB/VTAB/etc. - these are stripped > when leading/trailing field values, unless in quotes) and when we come to > the next Quote (typically " or 0x22, but may differ between > implementations), we check whether it's immediately followed by another > quote, which means simply add the actual quote char and continue, or not, > which means the end of the field. (Note: Any text after the ending quote in > a quoted field and before the next field separator ( ; or , ) is ignored, > one can use this quirk to add some comments or directives even). > > 4 - If you are writing fields, if your field contains any Quote > characters, any separators or leading/trailing white-space characters that > must not be discarded, it must be quoted. The best method for quoting a > string to be valid CSV field data is like this pseudo code: > > QuoteChar = Char(0x22); // Or whatever is used for quoting in your system > for i = last index of value down to first index: // Go downward > because it allows inserting without messing up final index > if value[i] = QuoteChar then insert QuoteChar into value at position i; > // Duplicate any existing quotes > OutputFieldValue = QuoteChar + value + QuoteChar; // Surround the > result with quotes > > > This way whatever is used in your fields do not matter, it will be > understood by any standard-validated CSV reader. Note that quotes may be > added regardless of content, but it is preferred to not add quotes where > not needed as it can cause very significant weight-gains to a large file, > especially one with many fields per row, and moreso if lots of fields are > short or empty. > > The sqlite3.exe (or its Linux counterpart) is tested to understand the > correct formatted CSV as used in most places, not sure if it recognizes the > OS's preference for say ; vs. , separators, but it's an easy test. If your > data gets mangled, it is probably due to the input being not > well-processed, and I hope the above will help to fix that. > > Cheers, > Ryan > > > > _______________________________________________ > sqlite-users mailing list > [email protected] > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

