Reviving an old thread from a few months ago ...
At 15:02 01/07/2004 -0700, Richard Gaskin wrote:
If CSV were consistently implemented, CSV2TabNew would work excellently right out of the box, but since some CSVs escape quotes by doubling them, I needed to add one line (see below) to also substitute doubled quote characters with the quote placeholder.
Bonus: since the added line reduces the number of quote characters, the function is now even faster.
Here's CSV2Tab3:
function CSV2Tab3 pData
  local tNuData -- contains tabbed copy of data
  local tReturnPlaceholder -- replaces cr in field data to avoid line
  -- breaks which would be misread as records;
  -- replaced later during display
  local tEscapedQuotePlaceholder -- used for keeping track of quotes
  -- in data
  local tInsideQuoted -- flag set while reading data between quotes
  --
  put numtochar(11) into tReturnPlaceholder -- vertical tab as
  -- placeholder
  put numtochar(2) into tEscapedQuotePlaceholder -- used to simplify
  -- distinction between quotes in data and those
  -- used in delimiters
  --
  -- Normalize line endings:
  replace crlf with cr in pData -- Win to UNIX
  replace numtochar(13) with cr in pData -- Mac to UNIX
  --
  -- Put placeholder in escaped quote (non-delimiter) chars:
  replace ("\"&quote) with tEscapedQuotePlaceholder in pData
  replace quote&quote with tEscapedQuotePlaceholder in pData --<NEW
  --
  put space before pData -- to avoid ambiguity of starting context
  split pData by quote
  put False into tInsideQuoted
  repeat for each element k in pData
    if (tInsideQuoted) then
      replace cr with tReturnPlaceholder in k
      put k after tNuData
      put False into tInsideQuoted
    else
      replace comma with tab in k
      put k after tNuData
      put true into tInsideQuoted
    end if
  end repeat
  --
  delete char 1 of tNuData -- remove the leading space
  replace tEscapedQuotePlaceholder with quote in tNuData
  return tNuData
end CSV2Tab3
Unfortunately, there's a problem with this code; the heart of it is

  split pData by quote
  repeat for each element k in pData
    -- build up new string
  end repeat
and this is not guaranteed to work. The form "repeat for each element" visits the elements in whatever order the array's keys happen to be stored internally, not in numeric key order. Usually the two coincide (split with only a primary delimiter produces an array whose keys are the consecutive integers 1, 2, 3, ...), but nothing guarantees that they always will.
And I've found at least one case where they don't: I have a spreadsheet that converts correctly up to 3904 lines, but add one more line and it fails completely (verified by "put the keys of pData after msg").
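To make the failure mode concrete, here is a small Python sketch (an illustration, not Transcript and not the original code) that models the array produced by "split pData by quote" as a dict with keys 1..n. Python dicts preserve insertion order, so the hypothetical helper scramble() reinserts the same keys in reverse to stand in for an array whose internal storage order is not 1, 2, 3. The two join functions mirror "repeat for each element" and an indexed repeat loop.

```python
# Assumption for illustration: a Transcript array is modelled as a Python dict.

def split_by_quote(data):
    """Mimic `split pData by quote`: keys 1..n, one per quote-delimited piece."""
    return {i: piece for i, piece in enumerate(data.split('"'), start=1)}

def scramble(arr):
    """Hypothetical helper: same key/value pairs, stored in non-numeric order."""
    return {k: arr[k] for k in reversed(list(arr))}

def join_for_each_element(arr):
    # Like `repeat for each element k in pData`: follows storage order.
    return "".join(arr.values())

def join_by_index(arr):
    # Like `repeat with tCounter = 1 to ...`: asks for keys 1, 2, 3, ... explicitly.
    return "".join(arr[i] for i in range(1, len(arr) + 1))

arr = scramble(split_by_quote('a,"b,c",d'))
print(join_for_each_element(arr))  # ,db,ca,   (pieces in storage order)
print(join_by_index(arr))          # a,b,c,d   (pieces in key order)
```

In the real converter the damage is worse than a reordering, because the in-quotes flag toggles once per visited piece, so a piece visited out of turn can also be treated as quoted when it isn't, or vice versa.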
Changing

  repeat for each element k in pData

to

  repeat with tCounter = 1 to the number of lines in the keys of pData
    put pData[tCounter] into k

solves it. Obviously it will be slower - but "slow and correct" beats "fast and wrong" :-)
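For readers who want to try the logic outside Revolution, here is a rough Python port of the corrected approach. It is an illustration only, not Alex's or Richard's code, and the name csv_to_tab is hypothetical. It keeps the same placeholder characters (numtochar(11) and numtochar(2)), the leading-space trick and the in/out-of-quotes alternation, and walks the pieces by explicit index as in the fix above. Python's str.split returns an ordered list, so here the index loop is belt-and-braces rather than a necessity.

```python
RETURN_PLACEHOLDER = "\x0b"  # vertical tab, like numtochar(11) in CSV2Tab3
QUOTE_PLACEHOLDER = "\x02"   # like numtochar(2) in CSV2Tab3

def csv_to_tab(data: str) -> str:
    # Normalize line endings: Windows CRLF and old-Mac CR both become LF.
    data = data.replace("\r\n", "\n").replace("\r", "\n")
    # Hide escaped quotes (backslash-quote and doubled quotes) behind a placeholder.
    data = data.replace('\\"', QUOTE_PLACEHOLDER).replace('""', QUOTE_PLACEHOLDER)
    # Leading space so the first piece is always "outside quotes".
    pieces = (" " + data).split('"')
    out = []
    # Walk the pieces in index order; odd indexes lie between quotes.
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            # Inside quotes: protect embedded line breaks from being read as records.
            out.append(piece.replace("\n", RETURN_PLACEHOLDER))
        else:
            # Outside quotes: commas are field delimiters.
            out.append(piece.replace(",", "\t"))
    result = "".join(out)[1:]  # drop the leading space
    return result.replace(QUOTE_PLACEHOLDER, '"')

sample = 'name,"Smith, John",42\r\nx,"say ""hi""",y'
print(repr(csv_to_tab(sample)))
# 'name\tSmith, John\t42\nx\tsay "hi"\ty'
```

As with the placeholder trick in CSV2Tab3 itself, an empty quoted field ("") looks exactly like a doubled quote, so it comes out as a single literal quote character rather than an empty field.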
-- Alex.
_______________________________________________
use-revolution mailing list
http://lists.runrev.com/mailman/listinfo/use-revolution
