Alex Tweedly wrote:
At 00:25 01/07/2004 -0700, Richard Gaskin wrote:My post from 14 June 2002 with my own CSV2Tab function is at <http://lists.runrev.com/pipermail/metacard/2002-June/001767.html>.
Hats off to anyone who can improve it's speed, and a bottle of 12-year-old single malt to anyone who can come up with an algorithm I can use which is at least twice as fast.
Now there's a challenge I can relate to :-)
BUT - the speed of the conversion depends on the data ... Enclosed below is a version of the script which is between 10% and 90% faster - and probably has potential to go even faster than that.
It uses the same set up as you did - so there are no quotes left except those around fields.
Then instead of walking through the data char by char, it use "split()" to divide into an array; the array elements must then alternate between in-quotes and not-in-quotes.
Each array element has only the relevant processing applied.
Great stuff. Using split is a ingenious way to reduce the load.
If CSV were consistently implemented CSV2TabNew would work excellently right out of the box, but since some CSVs escape quotes by doubling them I needed to add one line (see below) to also substitute doubled quote chars with the quote placeholder.
Bonus: since the added line reduces the number of quote characters, the function is now even faster.
I ran 1000 iterations of all three algorithms on a small test file which uses the Excel escaping format of doubled quotes (I believe it's MS Access that uses slash-quote to escape, if memory serves). Average times on my machine (G4 PowerBook) are roughly:
CSV2Tab: 438ms CSV2TabNew: 369ms CSV2Tab3: 195ms
On a slightly more complex example the times were:
CSV2Tab: 892ms CSV2TabNew: 333ms CSV2Tab3: 153ms
Here's CSV2Tab3:
function CSV2Tab3 pData local tNuData -- contains tabbed copy of data local tReturnPlaceholder -- replaces cr in field data to avoid line -- breaks which would be misread as records; -- replaced later during dislay local tEscapedQuotePlaceholder -- used for keeping track of quotes -- in data local tInQuotedText -- flag set while reading data between quotes -- put numtochar(11) into tReturnPlaceholder -- vertical tab as -- placeholder put numtochar(2) into tEscapedQuotePlaceholder -- used to simplify -- distinction between quotes in data and those -- used in delimiters -- -- Normalize line endings: replace crlf with cr in pData -- Win to UNIX replace numtochar(13) with cr in pData -- Mac to UNIX -- -- Put placeholder in escaped quote (non-delimiter) chars: replace ("\""e) with tEscapedQuotePlaceholder in pData replace quote"e with tEscapedQuotePlaceholder in pData --<NEW -- put space before pData -- to avoid ambiguity of starting context split pData by quote put False into tInsideQuoted repeat for each element k in pData if (tInsideQuoted) then replace cr with tReturnPlaceholder in k put k after tNuData put False into tInsideQuoted else replace comma with tab in k put k after tNuData put true into tInsideQuoted end if end repeat -- delete char 1 of tNuData -- remove the leading space replace tEscapedQuotePlaceholder with quote in tNuData return tNuData end CSV2Tab3
Please send me a private email and we'll make arrangements for the scotch delivery. Thanks for the assist.
-- Richard Gaskin Fourth World Media Corporation ___________________________________________________ Rev tools and more: http://www.fourthworld.com/rev
_______________________________________________ use-revolution mailing list [EMAIL PROTECTED] http://lists.runrev.com/mailman/listinfo/use-revolution
