Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Keith Clarke via use-livecode
Ah, thanks Alex - I’ll dig into that. 
I did search around for CSV to TSV in several places before posting but not CSV 
to Tab and not github!
Best,
Keith

> On 5 Apr 2022, at 18:17, Alex Tweedly via use-livecode 
>  wrote:
> 
> Hi Keith,
> 
> that code will fail for any commas which occur within quoted entries - they 
> will be wrongly converted to TABs
> 
> I'd suggest getting cvsToTab (a community effort by Richard Gaskin, me and a 
> whole host of others over the years) as a good starting place, and perhaps 
> finishing place. It will handle most CSV oddities (not all of them - that is 
> provably impossible :-).
> 
> This does an efficient walk through the data, remembering whether it is 
> inside or outside quoted entries, and hence handles commas accordingly.
> 
> https://github.com/macMikey/csvToText/blob/master/csvToTab.livecodescript
> 
> Alex.
> 
> On 05/04/2022 17:02, Keith Clarke via use-livecode wrote:
>> Hi folks,
>> Thanks all for the responses and ideas on consolidating multiple CSV files 
>> into - much appreciated.
>> 
>> Ben - Thank you for sharing your working recipe. This lifted my spirits as 
>> it showed I was on the right path (very nearly!) and you moved me on a big 
>> step from where I was stuck.
>> 
>> My script was successfully iterating through folders and files, with 
>> filtering to get a file list of just CSVs with their paths for onward 
>> processing. I’d also identified the need to maintain registers of (growing) 
>> column names, together with  a master row template and a mapping of the 
>> current file’s column headers in row-1 to the master to put align output 
>> columns. I got stuck when I set up nested repeat loops for files, then 
>> lines, then items and was trying to deal with row 1 column headers and data 
>> rows at the same time, which got rather confusing. Separating the column 
>> name processing from parsing row data made life a lot simpler and I’ve now 
>> got LC parsing the ~200 CSV files into a ~60,000 row TSV file that opens in 
>> Excel.
>> 
>> However… I’m getting cells dropped into the wrong columns in the output 
>> file. So, I’m wondering if delimiters are broken in my CSV-to-TSV 
>> pre-processing. Can anyone spot any obvious errors or omissions in the 
>> following...
>> -- convert from CSV to TSV
>> 
>> replace tab with space in tFileData -- clear any tabs in the content before 
>> setting as a delimiter
>> 
>> replace quote & comma & quote with tab in tFileData -- change delimiter for 
>> quoted values
>> 
>> replace comma with tab in tFileData -- change delimiter for unquoted values
>> 
>> replace quote with "" in tFileData -- clear quotes in first & last items
>> 
>> set the itemDelimiter to tab
>> 
>> Best,
>> Keith


Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Alex Tweedly via use-livecode

Hi Keith,

That code will fail for any commas which occur within quoted entries - 
they will be wrongly converted to TABs.


I'd suggest getting csvToTab (a community effort by Richard Gaskin, me 
and a whole host of others over the years) as a good starting place, and 
perhaps finishing place. It will handle most CSV oddities (not all of 
them - that is provably impossible :-).


This does an efficient walk through the data, remembering whether it is 
inside or outside quoted entries, and hence handles commas accordingly.


https://github.com/macMikey/csvToText/blob/master/csvToTab.livecodescript
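
A minimal sketch of that kind of walk - a hypothetical simpleCSVToTab function, 
for illustration only; the csvToTab script linked above handles doubled quotes, 
line breaks inside quoted entries and other cases that this does not:

function simpleCSVToTab pCSV
   -- walk the text once, tracking whether we are inside a quoted entry,
   -- and convert commas to tabs only when we are outside one
   -- (any tabs already present in the data still need clearing separately)
   local tOut, tInQuotes
   put false into tInQuotes
   repeat for each char tChar in pCSV
      if tChar is quote then
         put not tInQuotes into tInQuotes -- toggle state; drop the quote itself
         next repeat
      end if
      if tChar is comma and not tInQuotes then
         put tab after tOut
      else
         put tChar after tOut
      end if
   end repeat
   return tOut
end simpleCSVToTab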

Alex.

On 05/04/2022 17:02, Keith Clarke via use-livecode wrote:

Hi folks,
Thanks all for the responses and ideas on consolidating multiple CSV files into 
a single dataset - much appreciated.

Ben - Thank you for sharing your working recipe. This lifted my spirits as it 
showed I was on the right path (very nearly!) and you moved me on a big step 
from where I was stuck.

My script was successfully iterating through folders and files, with filtering 
to get a file list of just CSVs with their paths for onward processing. I’d 
also identified the need to maintain registers of (growing) column names, 
together with a master row template and a mapping of the current file’s column 
headers in row-1 to the master to align output columns. I got stuck when I 
set up nested repeat loops for files, then lines, then items and was trying to 
deal with row 1 column headers and data rows at the same time, which got rather 
confusing. Separating the column name processing from parsing row data made 
life a lot simpler and I’ve now got LC parsing the ~200 CSV files into a 
~60,000 row TSV file that opens in Excel.

However… I’m getting cells dropped into the wrong columns in the output file. 
So, I’m wondering if delimiters are broken in my CSV-to-TSV pre-processing. Can 
anyone spot any obvious errors or omissions in the following...
-- convert from CSV to TSV

replace tab with space in tFileData -- clear any tabs in the content before setting as a delimiter

replace quote & comma & quote with tab in tFileData -- change delimiter for quoted values

replace comma with tab in tFileData -- change delimiter for unquoted values

replace quote with "" in tFileData -- clear quotes in first & last items

set the itemDelimiter to tab

Best,
Keith



Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Keith Clarke via use-livecode
This code sits early in the process of converting CSV file data (from files that 
I didn’t create, so their content is unknown) into TSV text for onward 
processing.

So, the logic here is attempting to address the very concerns that you raise, 
albeit in reverse order, i.e.:
- remove any tab characters with (future) special meaning from the (CSV) data 
before introducing tabs as delimiters - to avoid false item-breaks downstream;
- swap out the two forms of CSV field boundary (quote-comma-quote sequences and 
bare commas) for tab delimiters;
…and finally clean up any orphan quotes left at the beginning of the first item 
in the line or at the end of the last.

Best,
Keith 

> On 5 Apr 2022, at 17:09, Mike Kerner via use-livecode 
>  wrote:
> 
> I'm confused by the code, above. If you are using tab as your column
> delimiter, then you wouldn't replace it with a space, since your tsv/csv
> files would have tabs in them when they were exported, originally.
> In any case, when you are going to replace a character with another
> character, you should make sure that the character you are replacing it to
> either a) does not have any special meaning (e.g. a delimiter) or b) if it
> does have a special meaning that it does not appear in the data, already.


Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Mike Kerner via use-livecode
I'm confused by the code above. If you are using tab as your column
delimiter, then you wouldn't replace it with a space, since your tsv/csv
files would have tabs in them from when they were exported originally.
In any case, when you are going to replace a character with another
character, you should make sure that the replacement character
either a) does not have any special meaning (e.g. a delimiter) or b) if it
does have a special meaning, does not already appear in the data.
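
In LiveCode terms that check can be as simple as the following sketch (assuming, 
as in the code above, that the raw text is in tFileData):

   -- a sketch only: refuse (or neutralise) data that already contains the
   -- character about to be used as a delimiter
   if tab is in tFileData then
      replace tab with space in tFileData -- or stop and report the file instead
   end if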


CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)

2022-04-05 Thread Keith Clarke via use-livecode
Hi folks, 
Thanks all for the responses and ideas on consolidating multiple CSV files into 
a single dataset - much appreciated.

Ben - Thank you for sharing your working recipe. This lifted my spirits as it 
showed I was on the right path (very nearly!) and you moved me on a big step 
from where I was stuck. 

My script was successfully iterating through folders and files, with filtering 
to get a file list of just CSVs with their paths for onward processing. I’d 
also identified the need to maintain registers of (growing) column names, 
together with a master row template and a mapping of the current file’s column 
headers in row-1 to the master to align output columns. I got stuck when I 
set up nested repeat loops for files, then lines, then items and was trying to 
deal with row 1 column headers and data rows at the same time, which got rather 
confusing. Separating the column name processing from parsing row data made 
life a lot simpler and I’ve now got LC parsing the ~200 CSV files into a 
~60,000 row TSV file that opens in Excel. 

However… I’m getting cells dropped into the wrong columns in the output file. 
So, I’m wondering if delimiters are broken in my CSV-to-TSV pre-processing. Can 
anyone spot any obvious errors or omissions in the following...
-- convert from CSV to TSV

replace tab with space in tFileData -- clear any tabs in the content before setting as a delimiter

replace quote & comma & quote with tab in tFileData -- change delimiter for quoted values

replace comma with tab in tFileData -- change delimiter for unquoted values

replace quote with "" in tFileData -- clear quotes in first & last items

set the itemDelimiter to tab

Best,
Keith
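
Tracing one hypothetical input line through those four replace statements shows 
where the columns shift: the quote & comma & quote replace only catches the 
boundaries between quoted fields, so a comma inside a quoted value is still a 
bare comma when the plain comma replace runs, and it becomes an extra tab.

   -- hypothetical input line: three fields, the first containing a comma
   --   "Smith, John","London","UK"
   -- after: replace quote & comma & quote with tab
   --   "Smith, John<TAB>London<TAB>UK"
   -- after: replace comma with tab (the embedded comma is converted too)
   --   "Smith<TAB> John<TAB>London<TAB>UK"
   -- after: replace quote with ""
   --   Smith<TAB> John<TAB>London<TAB>UK   -- four items instead of three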
   


Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?

2022-04-04 Thread Bob Sneidar via use-livecode
Okay, no answer to my question, so here is why I ask. SQL databases typically 
have a limit on how many columns you can have, and how many total bytes a 
record will take. If it is possible to import each CSV file as a separate 
table, that would be ideal. If not, you need to make yourself aware of those 
limits, and if you are going to exceed them, then SQL is not the way to go. 
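
If the table-per-file route appeals, a rough sketch with the revDB library might 
look like this (untested; tTableName and tHeaders are placeholder names for the 
current file's table name and its tab-delimited header row, and every column is 
simply declared TEXT to avoid guessing types):

   local tConnID, tSQL
   put revOpenDatabase("sqlite", "consolidated.sqlite", , , ) into tConnID
   set the itemDelimiter to tab
   put "CREATE TABLE " & tTableName & " (" into tSQL
   repeat for each item tCol in tHeaders
      put quote & tCol & quote & " TEXT," after tSQL
   end repeat
   put ")" into char -1 of tSQL -- swap the trailing comma for the closing bracket
   revExecuteSQL tConnID, tSQL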

CSV format is OK if the text fields are enclosed in quotes; the numeric fields 
can be enclosed in quotes or not, it doesn't matter. If text fields ARE 
enclosed in quotes, then Excel should be able to open the file natively, even 
if there are commas in the field text (that is, commas that are not 
delimiters). If the fields are NOT enclosed in quotes, and the field text DOES 
contain commas, you are essentially screwed. 

Bob S


> On Apr 4, 2022, at 10:37 , Bob Sneidar via use-livecode 
>  wrote:
> 
> Does all the data need to be in a single table? 
> 
> Sent from my iPhone
> 
>> On Apr 4, 2022, at 10:31, Mike Kerner via use-livecode 
>>  wrote:
>> 
>> keith,
>> are all the files structured the same way? are they all gathered in the
>> same place?
>> LC's big strength, IMHO, is text handling, so you're right in its
>> wheelhouse.
>> for the simplest example, let's assume that all the files have all the same
>> column layout and they're all in the same folder.
>> in that case, you would
>> * grab the list of files in the folder
>> * exclude files that shouldn't be included - maybe filter everything
>> that doesn't have a .csv suffix. you could do this during the loop, below,
>> or ahead of time.
>> * iterate through the files by reading each one, and appending the contents
>> to a variable/container (if you do this, don't forget to make sure that
>> when you append each file, the last line ends with a line delimiter)
>> * create a new file
>> * save the variable/container to the file
>> 
>> slightly more complicated: the layouts aren't the same
>> you can either:
>> * rearrange the columns upon reading the file or
>> * use something like an sqlite db and create a record for each row,
>> assigning the column in each record based on the column name in the csv
>> file,  or
>> * use a LC array, using the column names in the file as the keys of the
>> array.
>> * create a new file
>> * output the result of whichever of the three solutions you chose to the
>> new file.
>> 
>> both the easy and less-easy scenarios should take, i'm guessing, somewhere
>> between twenty and fifty lines of code, and are easy to implement. if you
>> have a couple hundred files, i think that the LC solution would be much
>> faster and easier to write, test, and run than the drag-and-drop solution.
>> 
>>> On Mon, Apr 4, 2022 at 1:04 PM Keith Clarke via use-livecode <
>>> use-livecode@lists.runrev.com> wrote:
>>> 
>>> Hi folks,
>>> I need to consolidate a couple of hundred CSV files of varying sizes
>>> (dozens to hundreds of rows) and column structures (some shared columns but
>>> many unique), into a single superset dataset that comprises all data from
>>> all files.
>>> 
>>> There are too many files and columns to attempt a manual process, so in
>>> I’ve been trying to achieve this with LiveCode - by iterating through the
>>> files to parse the column header rows into into row template and column
>>> mapping arrays used to build the combined table. However, I'm struggling to
>>> both compile the superset and render the results.
>>> 
>>> I feel I may be designing a very complicated solution for a problem that
>>> has probably been solved before. I wonder if I’m missing a simpler method -
>>> with LiveCode or perhaps with Excel or ‘drag & drop’ into some kind of
>>> self-organising database GUI for SQLite, etc?
>>> 
>>> Thanks in advance for any ideas.
>>> Best,
>>> Keith
>>> 
>> 
>> 
>> -- 
>> On the first day, God created the heavens and the Earth
>> On the second day, God created the oceans.
>> On the third day, God put the animals on hold for a few hours,
>>  and did a little diving.
>> And God said, "This is good."

Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?

2022-04-04 Thread Ben Rubinstein via use-livecode

Hi Keith,

I feel your pain. I spend a lot of time doing this kind of manipulation.

I generally avoid CSV, the format of the devil. If necessary, I have a library 
that does a reasonable job of converting CSV to TSV, and I run everything 
through that. Let's just assume for now that you've already done this.


My technique is this:

- given that you've got a routine to iterate over a folder or tree of files
- given that you can easily treat a row at a time, and on each row can easily 
work through a column at a time (e.g., this is TSV, you've set the 
itemdelimiter to tab)

- given that the first row in each file gives the column names

1. Maintain an ordered list of output column names

I'd probably keep it in a few forms: a string with tab-separated column names, 
an array mapping column name to index, and a variable giving the number of 
columns.


2. For each file, go through the list of column names (the 'items' of the 
first row). For each one, if it's not already in the master list of column 
names (e.g., it's not in the array) then append it with a tab to the end of 
the string, and add it to the array with the appropriate index.


Also keep an array mapping column index in _this_ file to column index in the 
master file.


3. Then for each row after that, start with an empty array. For each non-empty 
item on the row, add it to this 'row array', with the key being the index in 
the master file corresponding to this item's index in this file.


4. When you've got to the end of the row, dump the data from this array; index 
from 1 to number-of-master-columns, adding a tab between each. Then add this 
to the master file accumulator.



So code would look something like this


local tMasterColumns -- tab delimited column names for output 'master' file
local aColumnNameToMasterIndex -- array mapping column name to index in above
local iNumMasterColumns -- number of items in the above two

local tMasterFileData -- will accumulate the rows of data for the output file

local aFileIndexToMasterIndex -- for each file, maps index of column in file to index in master file

local aRowData -- for each row, we first move data into this array...
local tOutRow -- ...then output it into this string

local iFileNumCols -- number of columns in the current input file

local iFileColInx, iMasterColInx -- keep track of input and output col indices


set the itemDelimiter to tab -- we are treating the (converted) data as TSV

repeat for each line tFilePath in tFileList -- tFileList is assumed to hold the CSV paths gathered earlier
   put url ("file:" & tFilePath) into tFileData -- ...already run through the CSV-to-TSV conversion

   -- map the columns of this file to the (growing) columns of the masterfile
   put 0 into iFileColInx
   repeat for each item x in line 1 of tFileData
      add 1 to iFileColInx
      get aColumnNameToMasterIndex[x]
      if it is empty then
         if tMasterColumns is not empty then put tab after tMasterColumns -- avoid a leading tab
         put x after tMasterColumns
         add 1 to iNumMasterColumns
         put iNumMasterColumns into aColumnNameToMasterIndex[x]
         get iNumMasterColumns
      end if
      -- now it is the index of this column in the master file
      put it into aFileIndexToMasterIndex[iFileColInx]
   end repeat
   delete line 1 of tFileData
   put iFileColInx into iFileNumCols

   repeat for each line tRowData in tFileData

      -- get data from the row into the proper columns
      put empty into aRowData
      repeat with i = 1 to iFileNumCols -- number of columns in this file
         put aFileIndexToMasterIndex[i] into iMasterColInx
         put item i of tRowData into aRowData[iMasterColInx]
      end repeat

      -- now dump the row
      put empty into tOutRow
      repeat with i = 1 to iNumMasterColumns
         put aRowData[i] & tab after tOutRow
      end repeat
      put (char 1 to -2 of tOutRow) & return after tMasterFileData -- delete last tab

   end repeat

end repeat -- for each file

-- finally save, e.g.: put tMasterColumns & return & tMasterFileData into url ("file:" & tOutputPath)










On 04/04/2022 18:03, Keith Clarke via use-livecode wrote:

Hi folks,
I need to consolidate a couple of hundred CSV files of varying sizes (dozens to 
hundreds of rows) and column structures (some shared columns but many unique), 
into a single superset dataset that comprises all data from all files.

There are too many files and columns to attempt a manual process, so I’ve 
been trying to achieve this with LiveCode - by iterating through the files to 
parse the column header rows into row template and column mapping arrays 
used to build the combined table. However, I'm struggling to both compile the 
superset and render the results.

I feel I may be designing a very complicated solution for a problem that has 
probably been solved before. I wonder if I’m missing a simpler method - with 
LiveCode or perhaps with Excel or ‘drag & drop’ into some kind of 
self-organising database GUI for SQLite, etc?

Thanks in advance for any ideas.
Best,
Keith

Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?

2022-04-04 Thread Craig Newman via use-livecode
Hi.

I rarely deal with the CSV monster, but every time I have had to, it all boiled 
down to the fact that a comma is a terrible character to use as a delimiter of 
ANY kind.

Ideally it would be possible to simply replace every comma with a tab. The 
returns remain untouched. If that works, you are very lucky. Have you tried it?
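
In LiveCode that experiment is a one-liner (assuming the file's text is in a 
variable tData):

   -- safe only if no field ever contains a comma (or a tab) as data
   replace comma with tab in tData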

Craig

> On Apr 4, 2022, at 1:37 PM, Bob Sneidar via use-livecode 
>  wrote:
> 
> Does all the data need to be in a single table? 
> 
> Sent from my iPhone
> 
>> On Apr 4, 2022, at 10:31, Mike Kerner via use-livecode 
>> mailto:use-livecode@lists.runrev.com>> wrote:
>> 
>> keith,
>> are all the files structured the same way? are they all gathered in the
>> same place?
>> LC's big strength, IMHO, is text handling, so you're right in its
>> wheelhouse.
>> for the simplest example, let's assume that all the files have all the same
>> column layout and they're all in the same folder.
>> in that case, you would
>> * grab the list of files in the folder
>> * exclude files that shouldn't be included - maybe filter everything
>> that doesn't have a .csv suffix. you could do this during the loop, below,
>> or ahead of time.
>> * iterate through the files by reading each one, and appending the contents
>> to a variable/container (if you do this, don't forget to make sure that
>> when you append each file, the last line ends with a line delimiter)
>> * create a new file
>> * save the variable/container to the file
>> 
>> slightly more complicated: the layouts aren't the same
>> you can either:
>> * rearrange the columns upon reading the file or
>> * use something like an sqlite db and create a record for each row,
>> assigning the column in each record based on the column name in the csv
>> file,  or
>> * use a LC array, using the column names in the file as the keys of the
>> array.
>> * create a new file
>> * output the result of whichever of the three solutions you chose to the
>> new file.
>> 
>> both the easy and less-easy scenarios should take, i'm guessing, somewhere
>> between twenty and fifty lines of code, and are easy to implement. if you
>> have a couple hundred files, i think that the LC solution would be much
>> faster and easier to write, test, and run than the drag-and-drop solution.
>> 
>>> On Mon, Apr 4, 2022 at 1:04 PM Keith Clarke via use-livecode <
>>> use-livecode@lists.runrev.com > wrote:
>>> 
>>> Hi folks,
>>> I need to consolidate a couple of hundred CSV files of varying sizes
>>> (dozens to hundreds of rows) and column structures (some shared columns but
>>> many unique), into a single superset dataset that comprises all data from
>>> all files.
>>> 
>>> There are too many files and columns to attempt a manual process, so in
>>> I’ve been trying to achieve this with LiveCode - by iterating through the
>>> files to parse the column header rows into into row template and column
>>> mapping arrays used to build the combined table. However, I'm struggling to
>>> both compile the superset and render the results.
>>> 
>>> I feel I may be designing a very complicated solution for a problem that
>>> has probably been solved before. I wonder if I’m missing a simpler method -
>>> with LiveCode or perhaps with Excel or ‘drag & drop’ into some kind of
>>> self-organising database GUI for SQLite, etc?
>>> 
>>> Thanks in advance for any ideas.
>>> Best,
>>> Keith
>>> 
>> 
>> 
>> -- 
>> On the first day, God created the heavens and the Earth
>> On the second day, God created the oceans.
>> On the third day, God put the animals on hold for a few hours,
>>  and did a little diving.
>> And God said, "This is good."


Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?

2022-04-04 Thread Bob Sneidar via use-livecode
Does all the data need to be in a single table? 

Sent from my iPhone

> On Apr 4, 2022, at 10:31, Mike Kerner via use-livecode 
>  wrote:
> 
> keith,
> are all the files structured the same way? are they all gathered in the
> same place?
> LC's big strength, IMHO, is text handling, so you're right in its
> wheelhouse.
> for the simplest example, let's assume that all the files have all the same
> column layout and they're all in the same folder.
> in that case, you would
> * grab the list of files in the folder
> * exclude files that shouldn't be included - maybe filter everything
> that doesn't have a .csv suffix. you could do this during the loop, below,
> or ahead of time.
> * iterate through the files by reading each one, and appending the contents
> to a variable/container (if you do this, don't forget to make sure that
> when you append each file, the last line ends with a line delimiter)
> * create a new file
> * save the variable/container to the file
> 
> slightly more complicated: the layouts aren't the same
> you can either:
> * rearrange the columns upon reading the file or
> * use something like an sqlite db and create a record for each row,
> assigning the column in each record based on the column name in the csv
> file,  or
> * use a LC array, using the column names in the file as the keys of the
> array.
> * create a new file
> * output the result of whichever of the three solutions you chose to the
> new file.
> 
> both the easy and less-easy scenarios should take, i'm guessing, somewhere
> between twenty and fifty lines of code, and are easy to implement. if you
> have a couple hundred files, i think that the LC solution would be much
> faster and easier to write, test, and run than the drag-and-drop solution.
> 
>> On Mon, Apr 4, 2022 at 1:04 PM Keith Clarke via use-livecode <
>> use-livecode@lists.runrev.com> wrote:
>> 
>> Hi folks,
>> I need to consolidate a couple of hundred CSV files of varying sizes
>> (dozens to hundreds of rows) and column structures (some shared columns but
>> many unique), into a single superset dataset that comprises all data from
>> all files.
>> 
>> There are too many files and columns to attempt a manual process, so in
>> I’ve been trying to achieve this with LiveCode - by iterating through the
>> files to parse the column header rows into into row template and column
>> mapping arrays used to build the combined table. However, I'm struggling to
>> both compile the superset and render the results.
>> 
>> I feel I may be designing a very complicated solution for a problem that
>> has probably been solved before. I wonder if I’m missing a simpler method -
>> with LiveCode or perhaps with Excel or ‘drag & drop’ into some kind of
>> self-organising database GUI for SQLite, etc?
>> 
>> Thanks in advance for any ideas.
>> Best,
>> Keith
>> 
> 
> 
> -- 
> On the first day, God created the heavens and the Earth
> On the second day, God created the oceans.
> On the third day, God put the animals on hold for a few hours,
>   and did a little diving.
> And God said, "This is good."


Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?

2022-04-04 Thread Mike Kerner via use-livecode
keith,
are all the files structured the same way? are they all gathered in the
same place?
LC's big strength, IMHO, is text handling, so you're right in its
wheelhouse.
for the simplest example, let's assume that all the files have all the same
column layout and they're all in the same folder.
in that case, you would
* grab the list of files in the folder
* exclude files that shouldn't be included - maybe filter everything
that doesn't have a .csv suffix. you could do this during the loop, below,
or ahead of time.
* iterate through the files by reading each one, and appending the contents
to a variable/container (if you do this, don't forget to make sure that
when you append each file, the last line ends with a line delimiter)
* create a new file
* save the variable/container to the file

slightly more complicated: the layouts aren't the same
you can either:
* rearrange the columns upon reading the file or
* use something like an sqlite db and create a record for each row,
assigning the column in each record based on the column name in the csv
file,  or
* use a LC array, using the column names in the file as the keys of the
array.
* create a new file
* output the result of whichever of the three solutions you chose to the
new file.

both the easy and less-easy scenarios should take, i'm guessing, somewhere
between twenty and fifty lines of code, and are easy to implement. if you
have a couple hundred files, i think that the LC solution would be much
faster and easier to write, test, and run than the drag-and-drop solution.
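
a minimal sketch of the simplest scenario (the folder path and output file name 
below are made up):

   -- gather the .csv files in one folder and append their contents
   local tFolder, tFiles, tAll, tFileData
   put "/path/to/csv/folder" into tFolder -- made-up path
   set the defaultFolder to tFolder
   put the files into tFiles
   filter tFiles with "*.csv"
   repeat for each line tFile in tFiles
      put url ("file:" & tFolder & "/" & tFile) into tFileData
      -- make sure each file ends with a line delimiter before appending
      if the last char of tFileData is not return then put return after tFileData
      put tFileData after tAll
   end repeat
   put tAll into url ("file:" & tFolder & "/combined.tsv") -- made-up output name
   -- (note: this keeps every file's header row; delete line 1 of tFileData for
   --  all but the first file if a single header row is wanted)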

On Mon, Apr 4, 2022 at 1:04 PM Keith Clarke via use-livecode <
use-livecode@lists.runrev.com> wrote:

> Hi folks,
> I need to consolidate a couple of hundred CSV files of varying sizes
> (dozens to hundreds of rows) and column structures (some shared columns but
> many unique), into a single superset dataset that comprises all data from
> all files.
>
> There are too many files and columns to attempt a manual process, so in
> I’ve been trying to achieve this with LiveCode - by iterating through the
> files to parse the column header rows into into row template and column
> mapping arrays used to build the combined table. However, I'm struggling to
> both compile the superset and render the results.
>
> I feel I may be designing a very complicated solution for a problem that
> has probably been solved before. I wonder if I’m missing a simpler method -
> with LiveCode or perhaps with Excel or ‘drag & drop’ into some kind of
> self-organising database GUI for SQLite, etc?
>
> Thanks in advance for any ideas.
> Best,
> Keith
>


-- 
On the first day, God created the heavens and the Earth
On the second day, God created the oceans.
On the third day, God put the animals on hold for a few hours,
   and did a little diving.
And God said, "This is good."


Tools & techniques for one-off consolidation of multiple 'similar' CSV files?

2022-04-04 Thread Keith Clarke via use-livecode
Hi folks,
I need to consolidate a couple of hundred CSV files of varying sizes (dozens to 
hundreds of rows) and column structures (some shared columns but many unique), 
into a single superset dataset that comprises all data from all files.

There are too many files and columns to attempt a manual process, so I’ve 
been trying to achieve this with LiveCode - by iterating through the files to 
parse the column header rows into row template and column mapping arrays 
used to build the combined table. However, I'm struggling to both compile the 
superset and render the results. 

I feel I may be designing a very complicated solution for a problem that has 
probably been solved before. I wonder if I’m missing a simpler method - with 
LiveCode or perhaps with Excel or ‘drag & drop’ into some kind of 
self-organising database GUI for SQLite, etc? 

Thanks in advance for any ideas.
Best,
Keith