Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Keith Clarke via use-livecode
Ah, thanks Alex - I’ll dig into that. 
I did search around for CSV to TSV in several places before posting but not CSV 
to Tab and not github!
Best,
Keith

> On 5 Apr 2022, at 18:17, Alex Tweedly via use-livecode 
>  wrote:
> 
> Hi Keith,
> 
> that code will fail for any commas which occur within quoted entries - they 
> will be wrongly converted to TABs
> 
> I'd suggest getting cvsToTab (a community effort by Richard Gaskin, me and a 
> whole host of others over the years) as a good starting place, and perhaps 
> finishing place. It will handle most CSV oddities (not all of them - that is 
> provably impossible :-).
> 
> This does an efficient walk through the data, remembering whether it is 
> inside or outside quoted entries, and hence handles commas accordingly.
> 
> https://github.com/macMikey/csvToText/blob/master/csvToTab.livecodescript
> 
> Alex.
> 
> On 05/04/2022 17:02, Keith Clarke via use-livecode wrote:
>> Hi folks,
>> Thanks all for the responses and ideas on consolidating multiple CSV files 
>> into - much appreciated.
>> 
>> Ben - Thank you for sharing your working recipe. This lifted my spirits as 
>> it showed I was on the right path (very nearly!) and you moved me on a big 
>> step from where I was stuck.
>> 
>> My script was successfully iterating through folders and files, with 
>> filtering to get a file list of just CSVs with their paths for onward 
>> processing. I’d also identified the need to maintain registers of (growing) 
>> column names, together with  a master row template and a mapping of the 
>> current file’s column headers in row-1 to the master to put align output 
>> columns. I got stuck when I set up nested repeat loops for files, then 
>> lines, then items and was trying to deal with row 1 column headers and data 
>> rows at the same time, which got rather confusing. Separating the column 
>> name processing from parsing row data made life a lot simpler and I’ve now 
>> got LC parsing the ~200 CSV files into a ~60,000 row TSV file that opens in 
>> Excel.
>> 
>> However… I’m getting cells dropped into the wrong columns in the output 
>> file. So, I’m wondering if delimiters are broken in my CSV-to-TSV 
>> pre-processing. Can anyone spot any obvious errors or omissions in the 
>> following...
>> -- convert from CSV to TSV
>> 
>> replace tab with space in tFileData -- clear any tabs in the content before 
>> setting as a delimiter
>> 
>> replace quote & comma & quote with tab in tFileData -- change delimiter for 
>> quoted values
>> 
>> replace comma with tab in tFileData -- change delimiter for unquoted values
>> 
>> replace quote with "" in tFileData -- clear quotes in first & last items
>> 
>> set the itemDelimiter to tab
>> 
>> Best,
>> Keith
>>___
>> use-livecode mailing list
>> use-livecode@lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription 
>> preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Alex Tweedly via use-livecode

Hi Keith,

that code will fail for any commas which occur within quoted entries - 
they will be wrongly converted to TABs


I'd suggest getting cvsToTab (a community effort by Richard Gaskin, me 
and a whole host of others over the years) as a good starting place, and 
perhaps finishing place. It will handle most CSV oddities (not all of 
them - that is provably impossible :-).


This does an efficient walk through the data, remembering whether it is 
inside or outside quoted entries, and hence handles commas accordingly.


https://github.com/macMikey/csvToText/blob/master/csvToTab.livecodescript

Alex.

On 05/04/2022 17:02, Keith Clarke via use-livecode wrote:

Hi folks,
Thanks all for the responses and ideas on consolidating multiple CSV files into 
- much appreciated.

Ben - Thank you for sharing your working recipe. This lifted my spirits as it 
showed I was on the right path (very nearly!) and you moved me on a big step 
from where I was stuck.

My script was successfully iterating through folders and files, with filtering 
to get a file list of just CSVs with their paths for onward processing. I’d 
also identified the need to maintain registers of (growing) column names, 
together with  a master row template and a mapping of the current file’s column 
headers in row-1 to the master to put align output columns. I got stuck when I 
set up nested repeat loops for files, then lines, then items and was trying to 
deal with row 1 column headers and data rows at the same time, which got rather 
confusing. Separating the column name processing from parsing row data made 
life a lot simpler and I’ve now got LC parsing the ~200 CSV files into a 
~60,000 row TSV file that opens in Excel.

However… I’m getting cells dropped into the wrong columns in the output file. 
So, I’m wondering if delimiters are broken in my CSV-to-TSV pre-processing. Can 
anyone spot any obvious errors or omissions in the following...
-- convert from CSV to TSV

replace tab with space in tFileData -- clear any tabs in the content before 
setting as a delimiter

replace quote & comma & quote with tab in tFileData -- change delimiter for 
quoted values

replace comma with tab in tFileData -- change delimiter for unquoted values

replace quote with "" in tFileData -- clear quotes in first & last items

set the itemDelimiter to tab

Best,
Keith

___

use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Keith Clarke via use-livecode
This code sits early in the process of preparing CSV file data (from files that 
I didn’t create, so their content is unknown) into TSV text for onward 
processing.
 
So, the logic here is attempting to address the very concerns that you raise, 
albeit in reverse order i.e. 
- remove any tab characters with (future) special meaning from the (CSV) data 
before introducing tabs as delimiters - to avoid false item-breaks downstream.
- swap out the two forms of CSV delimiter (commas between quotes and commas) 
with the alternative tab delimiters
…and finally clean up any orphan quotes left at the beginning of the first item 
in the line or end of the last.  

Best,
Keith 

> On 5 Apr 2022, at 17:09, Mike Kerner via use-livecode 
>  wrote:
> 
> I'm confused by the code, above. If you are using tab as your column
> delimiter, then you wouldn't replace it with a space, since your tsv/csv
> files would have tabs in them when they were exported, originally.
> In any case, when you are going to replace a character with another
> character, you should make sure that the character you are replacing it to
> either a) does not have any special meaning (e.g. a delimiter) or b) if it
> does have a special meaning that it does not appear in the data, already.
> ___
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode


___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Mike Kerner via use-livecode
I'm confused by the code, above. If you are using tab as your column
delimiter, then you wouldn't replace it with a space, since your tsv/csv
files would have tabs in them when they were exported, originally.
In any case, when you are going to replace a character with another
character, you should make sure that the character you are replacing it to
either a) does not have any special meaning (e.g. a delimiter) or b) if it
does have a special meaning that it does not appear in the data, already.
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode


CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Keith Clarke via use-livecode
Hi folks, 
Thanks all for the responses and ideas on consolidating multiple CSV files into 
- much appreciated.

Ben - Thank you for sharing your working recipe. This lifted my spirits as it 
showed I was on the right path (very nearly!) and you moved me on a big step 
from where I was stuck. 

My script was successfully iterating through folders and files, with filtering 
to get a file list of just CSVs with their paths for onward processing. I’d 
also identified the need to maintain registers of (growing) column names, 
together with  a master row template and a mapping of the current file’s column 
headers in row-1 to the master to put align output columns. I got stuck when I 
set up nested repeat loops for files, then lines, then items and was trying to 
deal with row 1 column headers and data rows at the same time, which got rather 
confusing. Separating the column name processing from parsing row data made 
life a lot simpler and I’ve now got LC parsing the ~200 CSV files into a 
~60,000 row TSV file that opens in Excel. 

However… I’m getting cells dropped into the wrong columns in the output file. 
So, I’m wondering if delimiters are broken in my CSV-to-TSV pre-processing. Can 
anyone spot any obvious errors or omissions in the following...
-- convert from CSV to TSV

replace tab with space in tFileData -- clear any tabs in the content before 
setting as a delimiter

replace quote & comma & quote with tab in tFileData -- change delimiter for 
quoted values

replace comma with tab in tFileData -- change delimiter for unquoted values

replace quote with "" in tFileData -- clear quotes in first & last items

set the itemDelimiter to tab

Best,
Keith
   
___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode