Re: Another Revolution Success Story

Alex Tweedly Sun, 17 Oct 2004 16:43:23 -0700

Reviving an old thread from a few months ago ....

At 15:02 01/07/2004 -0700, Richard Gaskin wrote:

If CSV were implemented consistently, CSV2TabNew would work excellently right out of the box, but since some CSV files escape quotes by doubling them, I needed to add one line (see below) to also substitute doubled quote characters with the quote placeholder.

Bonus: since the added line reduces the number of quote characters, the function is now even faster.

Here's CSV2Tab3:


function CSV2Tab3 pData
  local tNuData -- contains tabbed copy of data
  local tReturnPlaceholder -- replaces cr in field data to avoid line
  --                       breaks which would be misread as records;
  --                       replaced later during display
  local tEscapedQuotePlaceholder -- used for keeping track of quotes
  --                       in data
  local tInsideQuoted -- flag set while reading data between quotes
  --
  put numtochar(11) into tReturnPlaceholder -- vertical tab as
  --                       placeholder
  put numtochar(2)  into tEscapedQuotePlaceholder -- used to simplify
  --                       distinction between quotes in data and those
  --                       used in delimiters
  --
  -- Normalize line endings:
  replace crlf with cr in pData          -- Win to UNIX
  replace numtochar(13) with cr in pData -- Mac to UNIX
  --
  -- Put placeholder in escaped quote (non-delimiter) chars:
  replace ("\"&quote) with tEscapedQuotePlaceholder in pData
  replace quote&quote with tEscapedQuotePlaceholder in pData --<NEW
  --
  put space before pData   -- to avoid ambiguity of starting context
  split pData by quote
  put False into tInsideQuoted
  repeat for each element k in pData
    if (tInsideQuoted) then
      replace cr with tReturnPlaceholder in k
      put k after tNuData
      put False into tInsideQuoted
    else
      replace comma with tab in k
      put k after tNuData
      put true into tInsideQuoted
    end if
  end repeat
  --
  delete char 1 of tNuData -- remove the leading space
  replace tEscapedQuotePlaceholder with quote in tNuData
  return tNuData
end CSV2Tab3


Unfortunately, there's a problem with this code; the heart of it is
  split pData by quote
  repeat for each element k in pData
     -- build up new string
  end repeat

and this is not guaranteed to work. The form "repeat for each element" processes the elements in the order of the keys of the array. Normally that is the correct order (split with only a primary delimiter produces an array whose keys are the consecutive integers 1, 2, 3, ..., and they usually come back in that order), but there is no guarantee that they always will.

And I've found at least one case where they don't: I have a spreadsheet that converts just fine at up to 3904 lines, but add one more line and it fails completely (verified by "put the keys of pData after msg").
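A quick way to test for this from a script, rather than eyeballing the message box, is to compare the key list against 1, 2, 3, ... directly. This is only a sketch: keysInOrder is a hypothetical helper name, it takes the key list ("the keys of pData") rather than the array itself, and it assumes, as above, that "repeat for each element" visits elements in the same order that "the keys" lists them.

```transcript
function keysInOrder pKeyList
  -- Hypothetical helper: returns true if pKeyList (the result of
  -- "the keys of" an array built by split) reads 1, 2, 3, ... in order
  local tCounter
  repeat with tCounter = 1 to the number of lines in pKeyList
    if line tCounter of pKeyList is not tCounter then return false
  end repeat
  return true
end keysInOrder
```

Called right after the split, e.g. "if not keysInOrder(the keys of pData) then answer "keys out of order"", it flags the failing case without dumping thousands of keys.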

Changing
  repeat for each element k in pData
to
  repeat with tCounter = 1 to the number of lines in the keys of pData
    put pData[tCounter] into k

solves it. Obviously it will be slower, but "slow and correct" beats "fast and wrong" :-)
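For anyone who wants to drop the change straight in, here is a sketch of CSV2Tab3 with the ordered loop folded in. The name CSV2Tab4 and the extra locals tElementCount and tCounter are illustrative only; it assumes, as above, that splitting by quote alone yields keys 1 through N. Apart from counting the keys once before the loop and sharing the "put k after tNuData" line, it does the same work as CSV2Tab3.

```transcript
function CSV2Tab4 pData
  -- CSV2Tab3 with an explicit 1..N loop; "CSV2Tab4" is just a label
  local tNuData -- tab-delimited copy of the data
  local tReturnPlaceholder -- stands in for cr inside quoted fields
  local tEscapedQuotePlaceholder -- stands in for quotes that are data
  local tInsideQuoted -- true while reading text between quotes
  local tElementCount, tCounter, k
  --
  put numtochar(11) into tReturnPlaceholder -- vertical tab
  put numtochar(2) into tEscapedQuotePlaceholder
  --
  -- Normalize line endings
  replace crlf with cr in pData          -- Win to UNIX
  replace numtochar(13) with cr in pData -- Mac to UNIX
  --
  -- Mark escaped (data) quotes so only delimiter quotes remain
  replace ("\"&quote) with tEscapedQuotePlaceholder in pData
  replace quote&quote with tEscapedQuotePlaceholder in pData
  --
  put space before pData -- so element 1 is always outside quotes
  split pData by quote
  put the number of lines in the keys of pData into tElementCount
  put false into tInsideQuoted
  -- Visit keys 1..N explicitly instead of "repeat for each element",
  -- so the pieces are reassembled in their original order
  repeat with tCounter = 1 to tElementCount
    put pData[tCounter] into k
    if tInsideQuoted then
      replace cr with tReturnPlaceholder in k
      put false into tInsideQuoted
    else
      replace comma with tab in k
      put true into tInsideQuoted
    end if
    put k after tNuData
  end repeat
  --
  delete char 1 of tNuData -- remove the leading space
  replace tEscapedQuotePlaceholder with quote in tNuData
  return tNuData
end CSV2Tab4
```

For example, CSV2Tab4(quote & "He said " & quote & quote & "hi" & quote & quote & quote & ",2") should return He said "hi" followed by a tab and 2.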

-- Alex.

_______________________________________________
use-revolution mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/use-revolution
