On Aug 5, 6:56 am, Larry Bates <[EMAIL PROTECTED]> wrote: > Ryan Rosario wrote: > > On Aug 4, 8:30 am, Emile van Sebille <[EMAIL PROTECTED]> wrote: > >> John Machin wrote: > >>> On Aug 4, 6:15 pm, Ryan Rosario <[EMAIL PROTECTED]> wrote: > >>>> On Aug 4, 1:01 am, John Machin <[EMAIL PROTECTED]> wrote: > >>>>> On Aug 4, 5:49 pm, Ryan Rosario <[EMAIL PROTECTED]> wrote: > >>>>>> Thanks Emile! Works almost perfectly, but is there some way I can > >>>>>> adapt this to quote fields that contain a comma in them? > >> <snip> > > >>> Emile's snippet is pushing it through the csv reading process, to > >>> demonstrate that his series of replaces works (on your *sole* example, > >>> at least). > >> Exactly -- just print out the results of the passed argument: > > >> rec.replace(',"',",'''").replace('",',"''',").replace('"','""').replace("'''",'"') > > >> '123,"Here is some, text ""and some quoted text"" where the quotes > >> should have been doubled",321' > > >> Where it won't work is if any of the field embedded quotes are next to > >> commas. > > >> I'd run it against the file. Presumably, you've got a consistent field > >> count expectation per record. Any resulting record not matching is > >> suspect and will identify records this approach won't address. > > >> There's probably better ways, but sometimes it's fun to create > >> executable line noise. :) > > >> Emile > > > Thanks for your responses. I think John may be right that I am reading > > it a second time. I will take a look at the CSV reader documentation > > and see if that helps. Then once I run it I can see if I need to worry > > about the comma-next-to-quote issue. > > This is a perfect demonstration of why tab delimited files are so much better > than comma and quote delimited.
No, it's a perfect demonstration of what happens when a protocol is not followed. > Virtually all software can handle table > delimited as well as comma and quote delimited, but you would have none of > these > problems if you had used tab delimited. The chances of tabs being embedded in > most data is virtually nil. > There may be no tabs in *your* data. There is no guarantee that there are no tabs in a VARCHAR(n) column in somebody else's database. I've seen all of \x00, \t, \n, \r and \x1a (Ctrl-Z (EOF in CP/M, *DOS and Windows text files)). The possibilities include (1) Don't check (2) check if '\t' in field and raise an exception (3) silently remove tabs; what do you recommend? -- http://mail.python.org/mailman/listinfo/python-list