Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)
Ah, thanks Alex - I’ll dig into that. I did search around for CSV to TSV in several places before posting, but not for CSV to Tab, and not on GitHub!
Best,
Keith

> On 5 Apr 2022, at 18:17, Alex Tweedly via use-livecode wrote:
>
> Hi Keith,
>
> that code will fail for any commas which occur within quoted entries - they
> will be wrongly converted to TABs.
>
> I'd suggest getting csvToTab (a community effort by Richard Gaskin, me and a
> whole host of others over the years) as a good starting place, and perhaps
> finishing place. It will handle most CSV oddities (not all of them - that is
> provably impossible :-).
>
> This does an efficient walk through the data, remembering whether it is
> inside or outside quoted entries, and hence handles commas accordingly.
>
> https://github.com/macMikey/csvToText/blob/master/csvToTab.livecodescript
>
> Alex.

___
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)
Hi Keith,

that code will fail for any commas which occur within quoted entries - they will be wrongly converted to TABs.

I'd suggest getting csvToTab (a community effort by Richard Gaskin, me and a whole host of others over the years) as a good starting place, and perhaps finishing place. It will handle most CSV oddities (not all of them - that is provably impossible :-).

This does an efficient walk through the data, remembering whether it is inside or outside quoted entries, and hence handles commas accordingly.

https://github.com/macMikey/csvToText/blob/master/csvToTab.livecodescript

Alex.

On 05/04/2022 17:02, Keith Clarke via use-livecode wrote:
> However… I’m getting cells dropped into the wrong columns in the output file. So, I’m wondering if delimiters are broken in my CSV-to-TSV pre-processing. Can anyone spot any obvious errors or omissions in the following...
>
> -- convert from CSV to TSV
> replace tab with space in tFileData -- clear any tabs in the content before setting as a delimiter
> replace quote & comma & quote with tab in tFileData -- change delimiter for quoted values
> replace comma with tab in tFileData -- change delimiter for unquoted values
> replace quote with "" in tFileData -- clear quotes in first & last items
> set the itemDelimiter to tab
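For anyone following along, here is a minimal sketch of the inside/outside-quotes walk Alex describes. This is not the actual csvToTab library (see the GitHub link above for that), and it deliberately ignores doubled "" escape quotes and returns inside quoted entries:

```livecode
-- Sketch only: convert one CSV line to tab-delimited, tracking whether
-- we are inside or outside a quoted entry; commas inside quotes survive.
-- Does NOT handle doubled "" escapes or returns inside quotes.
function simpleCsvLineToTab pLine
   local tOut, tInQuotes
   put false into tInQuotes
   repeat for each char tChar in pLine
      if tChar is quote then
         put not tInQuotes into tInQuotes -- toggle state, drop the quote
      else if tChar is comma and not tInQuotes then
         put tab after tOut -- a real delimiter
      else
         put tChar after tOut
      end if
   end repeat
   return tOut
end simpleCsvLineToTab
```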
Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)
This code sits early in the process of preparing CSV file data (from files that I didn’t create, so their content is unknown) into TSV text for onward processing. So, the logic here is attempting to address the very concerns that you raise, albeit in reverse order, i.e.:
- remove any tab characters with (future) special meaning from the (CSV) data before introducing tabs as delimiters - to avoid false item-breaks downstream
- swap out the two forms of CSV delimiter (commas between quotes, and bare commas) with the alternative tab delimiters
…and finally clean up any orphan quotes left at the beginning of the first item in the line or end of the last.
Best,
Keith

> On 5 Apr 2022, at 17:09, Mike Kerner via use-livecode wrote:
>
> I'm confused by the code, above. If you are using tab as your column
> delimiter, then you wouldn't replace it with a space, since your tsv/csv
> files would have tabs in them when they were exported, originally.
Re: CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)
I'm confused by the code, above. If you are using tab as your column delimiter, then you wouldn't replace it with a space, since your tsv/csv files would have tabs in them when they were exported, originally.

In any case, when you are going to replace a character with another character, you should make sure that the character you are replacing it with either a) does not have any special meaning (e.g. a delimiter) or b) if it does have a special meaning, that it does not appear in the data already.
CSV to TSV (was Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?)
Hi folks,
Thanks all for the responses and ideas on consolidating multiple CSV files - much appreciated.

Ben - thank you for sharing your working recipe. This lifted my spirits, as it showed I was on the right path (very nearly!) and you moved me on a big step from where I was stuck.

My script was successfully iterating through folders and files, with filtering to get a file list of just CSVs with their paths for onward processing. I’d also identified the need to maintain registers of (growing) column names, together with a master row template and a mapping of the current file’s column headers in row 1 to the master, to align output columns. I got stuck when I set up nested repeat loops for files, then lines, then items, and was trying to deal with row-1 column headers and data rows at the same time, which got rather confusing. Separating the column-name processing from parsing row data made life a lot simpler, and I’ve now got LC parsing the ~200 CSV files into a ~60,000-row TSV file that opens in Excel.

However… I’m getting cells dropped into the wrong columns in the output file. So, I’m wondering if delimiters are broken in my CSV-to-TSV pre-processing. Can anyone spot any obvious errors or omissions in the following...

-- convert from CSV to TSV
replace tab with space in tFileData -- clear any tabs in the content before setting as a delimiter
replace quote & comma & quote with tab in tFileData -- change delimiter for quoted values
replace comma with tab in tFileData -- change delimiter for unquoted values
replace quote with "" in tFileData -- clear quotes in first & last items
set the itemDelimiter to tab

Best,
Keith
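A minimal worked example (with hypothetical data) of where this replace sequence can break: a quoted field containing a comma that sits next to an unquoted field never produces a quote-comma-quote sequence, so its embedded comma is still present when the bare-comma replace runs:

```livecode
-- Hypothetical one-line example of the failure mode
put quote & "Smith, John" & quote & comma & "42" into tLine
-- replace quote & comma & quote: no quote-comma-quote sequence exists here
-- (the 42 is unquoted), so this line is left unchanged by that step
replace quote & comma & quote with tab in tLine
-- replace comma: BOTH commas become tabs, wrongly splitting
-- "Smith, John" into two columns
replace comma with tab in tLine
```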
Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?
Okay, no answer to my question, so here is why I ask. SQL databases typically have a limit on how many columns you can have, and how many total bytes a record will take. If it is possible to import each CSV file as a separate table, that would be ideal. If not, you need to make yourself aware of those limits, and if you are going to exceed them, then SQL is not the way to go.

CSV format is OK if the text fields are enclosed in quotes; the numeric fields can be enclosed in quotes or not, it doesn't matter. If text fields ARE enclosed in quotes, then Excel should be able to open it natively, even if there are commas in the field text (that is, not as delimiters). If the fields are NOT enclosed in quotes, and the field text DOES contain commas, you are essentially screwed.

Bob S

> On Apr 4, 2022, at 10:37, Bob Sneidar via use-livecode wrote:
>
> Does all the data need to be in a single table?
Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?
Hi Keith,
I feel your pain. I spend a lot of time doing this kind of manipulation.

I generally avoid CSV, the format of the devil. If necessary I have a library that does a reasonable job of converting CSV to TSV, and run everything through that. Let's just assume for now that you've already done this.

My technique is this:
- given that you've got a routine to iterate over a folder or tree of files
- given that you can easily treat a row at a time, and on each row can easily work through a column at a time (e.g., this is TSV, and you've set the itemDelimiter to tab)
- given that the first row in each file gives the column names

1. Maintain an ordered list of output column names. I'd probably keep it in three forms: a string with tab-separated column names, an array mapping column name to index, and a variable giving the number of columns.

2. For each file, go through the list of column names (the 'items' of the first row). For each one, if it's not already in the master list of column names (e.g., it's not in the array), then append it with a tab to the end of the string, and add it to the array with the appropriate index. Also keep an array mapping column index in _this_ file to column index in the master file.

3. Then for each row after that, start with an empty array. For each non-empty item on the row, add it to this 'row array', with the key being the index in the master file corresponding to this item's index in this file.

4. When you've got to the end of the row, dump the data from this array: index from 1 to number-of-master-columns, adding a tab between each. Then add this to the master file accumulator.
So code would look something like this:

local tMasterColumns -- tab delimited column names for output 'master' file
local aColumnNameToMasterIndex -- array mapping column name to index in above
local iNumMasterColumns -- number of items in the above two
local tMasterFileData -- will accumulate the rows of data for the output file
local aFileIndexToMasterIndex -- for each file, maps index of column in file to index in master file
local aRowData -- for each row, we first move data into this array...
local tOutRow -- ...then output it into this string
local iFileNumCols -- number of columns in the current input file
local iFileColInx, iMasterColInx -- keep track of input and output col indices

repeat for each file... load it into tFileData.. etc
   -- map the columns of this file to the (growing) columns of the master file
   put 0 into iFileColInx
   repeat for each item x in line 1 of tFileData
      add 1 to iFileColInx
      get aColumnNameToMasterIndex[x]
      if it = empty then
         put tab & x after tMasterColumns
         add 1 to iNumMasterColumns
         put iNumMasterColumns into aColumnNameToMasterIndex[x]
         get iNumMasterColumns
      end if
      -- now "it" is the index of this column in the master file
      put it into aFileIndexToMasterIndex[iFileColInx]
   end repeat
   delete line 1 of tFileData
   put iFileColInx into iFileNumCols

   repeat for each line tRowData in tFileData
      -- get data from the row into the proper columns
      put empty into aRowData
      repeat with i = 1 to iFileNumCols -- number of columns in this file
         put aFileIndexToMasterIndex[i] into iMasterColInx
         put item i of tRowData into aRowData[iMasterColInx]
      end repeat
      -- now dump the row
      put empty into tOutRow
      repeat with i = 1 to iNumMasterColumns
         put aRowData[i] & tab after tOutRow
      end repeat
      -- delete the last tab, then append the row with a return
      put (char 1 to -2 of tOutRow) & return after tMasterFileData
   end repeat
end repeat -- for each file

-- finally save tMasterColumns & return & tMasterFileData

On 04/04/2022 18:03, Keith Clarke via use-livecode wrote:
> Hi folks, I need to consolidate a couple of hundred CSV files of varying sizes (dozens to hundreds of rows) and column structures (some shared columns but many unique), into a single superset dataset that comprises all data from all files.
Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?
Hi. I rarely deal with the csv monster, but every time I have had to, it all boiled down to the fact that a comma is a terrible character to use as a delimiter of ANY kind.

Ideally it would be possible to simply replace every comma with a tab. The returns remain untouched. If that works, you are very lucky. Have you tried it?

Craig

> On Apr 4, 2022, at 1:37 PM, Bob Sneidar via use-livecode wrote:
>
> Does all the data need to be in a single table?
Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?
Does all the data need to be in a single table?

Sent from my iPhone

> On Apr 4, 2022, at 10:31, Mike Kerner via use-livecode wrote:
Re: Tools & techniques for one-off consolidation of multiple 'similar' CSV files?
keith,
are all the files structured the same way? are they all gathered in the same place?
LC's big strength, IMHO, is text handling, so you're right in its wheelhouse.

for the simplest example, let's assume that all the files have the same column layout and they're all in the same folder. in that case, you would:
* grab the list of files in the folder
* exclude files that shouldn't be included - maybe filter out everything that doesn't have a .csv suffix. you could do this during the loop, below, or ahead of time.
* iterate through the files by reading each one, and appending the contents to a variable/container (if you do this, don't forget to make sure that when you append each file, the last line ends with a line delimiter)
* create a new file
* save the variable/container to the file

slightly more complicated: the layouts aren't the same. you can either:
* rearrange the columns upon reading the file, or
* use something like an sqlite db and create a record for each row, assigning the column in each record based on the column name in the csv file, or
* use a LC array, using the column names in the file as the keys of the array.
then:
* create a new file
* output the result of whichever of the three solutions you chose to the new file.

both the easy and less-easy scenarios should take, i'm guessing, somewhere between twenty and fifty lines of code, and are easy to implement. if you have a couple hundred files, i think that the LC solution would be much faster and easier to write, test, and run than the drag-and-drop solution.

On Mon, Apr 4, 2022 at 1:04 PM Keith Clarke via use-livecode <use-livecode@lists.runrev.com> wrote:
> Hi folks,
> I need to consolidate a couple of hundred CSV files of varying sizes (dozens to hundreds of rows) and column structures (some shared columns but many unique), into a single superset dataset that comprises all data from all files.

--
On the first day, God created the heavens and the Earth
On the second day, God created the oceans.
On the third day, God put the animals on hold for a few hours,
and did a little diving.
And God said, "This is good."
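The simplest (same-layout) case outlined above might be sketched like this - the folder path and output filename are placeholders, and the whole thing is an illustration rather than tested code:

```livecode
-- Sketch of the simple same-layout case: gather every .csv in one folder
-- and append the contents, guaranteeing each file ends with a line delimiter.
-- "/path/to/csvs" and "combined.csv" are hypothetical names.
local tFolder, tFile, tFileData, tCombined
put "/path/to/csvs" into tFolder
set the defaultFolder to tFolder
repeat for each line tFile in the files
   if tFile ends with ".csv" then
      put URL ("file:" & tFolder & "/" & tFile) into tFileData
      if the last char of tFileData is not return then
         put return after tFileData -- ensure a trailing line delimiter
      end if
      put tFileData after tCombined
   end if
end repeat
put tCombined into URL ("file:" & tFolder & "/combined.csv")
```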
Tools & techniques for one-off consolidation of multiple 'similar' CSV files?
Hi folks,
I need to consolidate a couple of hundred CSV files of varying sizes (dozens to hundreds of rows) and column structures (some shared columns but many unique) into a single superset dataset that comprises all data from all files.

There are too many files and columns to attempt a manual process, so I’ve been trying to achieve this with LiveCode - by iterating through the files to parse the column header rows into row-template and column-mapping arrays used to build the combined table. However, I'm struggling to both compile the superset and render the results.

I feel I may be designing a very complicated solution for a problem that has probably been solved before. I wonder if I’m missing a simpler method - with LiveCode, or perhaps with Excel, or ‘drag & drop’ into some kind of self-organising database GUI for SQLite, etc?

Thanks in advance for any ideas.
Best,
Keith
CSV Files
OK, so I know CSV files are not everyone's favorite subject but sometimes you have no control over how data reaches you! I have a handler in place that deals with the vagaries of CSV files, like returns, commas, and quotes in the data (did I miss any?) but it relies on a repeat loop looking at every character in the CSV file. It seems to work just fine but I'm wondering if the collective knowledge of this list has already come up with a more efficient solution.

Thanks,
--
Pete
Molly's Revenge http://www.mollysrevenge.com
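[Editor's note: the character-walk approach Pete describes - and that Alex's csvToTab routine also uses - is a small state machine that remembers whether the current character sits inside a quoted field, so embedded commas, returns, and doubled quotes are treated as data rather than delimiters. A minimal sketch, in Python for illustration only:]

```python
def parse_csv(text):
    """Character-walk CSV parser. Tracks whether we are inside a quoted
    field so embedded commas and returns are kept as data, and doubled
    quotes ("") inside a quoted field become a literal quote."""
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')   # escaped quote: "" -> "
                    i += 1
                else:
                    in_quotes = False   # closing quote
            else:
                field.append(c)         # commas/returns here are data
        else:
            if c == '"':
                in_quotes = True
            elif c == ',':
                row.append(''.join(field)); field = []
            elif c in '\r\n':
                if c == '\r' and i + 1 < n and text[i + 1] == '\n':
                    i += 1              # treat CRLF as one line break
                row.append(''.join(field)); field = []
                rows.append(row); row = []
            else:
                field.append(c)
        i += 1
    if field or row:                    # flush a final unterminated row
        row.append(''.join(field))
        rows.append(row)
    return rows
```

This is a single linear pass over the data, which is the same reason Alex describes his routine as "an efficient walk through the data" - no backtracking is ever needed, because the in/out-of-quotes flag is all the state required.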
Re: CSV Files
I have noticed that csv exports from Excel will stop short of including all the columns in any given row if there is no more data, so you will get rows with varying numbers of columns if you have any empty cells at the end of a row. That has tripped me up a few times with other software that expects data to be there, even if it's an empty value.

Bob

On Mar 28, 2012, at 9:55 AM, Pete wrote:
> I have a handler in place that deals with the vagaries of CSV files, like returns, commas, and quotes in the data (did I miss any?)
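[Editor's note: the short-row problem Bob describes is usually handled by padding every parsed row out to the header's width before handing the data to software that expects a fixed column count. A minimal sketch, in Python for illustration; the function name `pad_rows` is ours:]

```python
def pad_rows(rows, width=None):
    """Pad each row with empty strings so every row has the same number
    of columns - either a given width (e.g. the header's length) or the
    widest row seen. Works around exporters like Excel that drop
    trailing empty cells."""
    if width is None:
        width = max((len(r) for r in rows), default=0)
    return [r + [""] * (width - len(r)) for r in rows]
```

Padding against the header row's length (rather than the widest data row) is the safer choice when the header is known to be complete.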
Re: CSV Files
...that seems to be the same for Google sheets, too - though I've not dug into whether there is an obvious end-of-line character.
Best,
Keith

On 28 Mar 2012, at 18:03, Bob Sneidar wrote:
> I have noticed that csv exports from Excel will stop short of including all the columns in any given row if there is no more data...
Re: CSV Files
Now that I think about it, a properly formatted CSV file will quote string values but not numerical values, the obvious reason being that a value may actually contain commas, so it should be quoted to prevent false terminations. Excel will not do that. Maybe that is a good thing, so long as your values do not contain commas.

Bob

On Mar 28, 2012, at 10:09 AM, Keith Clarke wrote:
> ...that seems to be the same for Google sheets, too - though I've not dug into whether there is an obvious end-of-line character.
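[Editor's note: the convention Bob describes - quote string fields, leave numeric fields bare, so an embedded comma can never be mistaken for a delimiter - corresponds to the `QUOTE_NONNUMERIC` policy in Python's csv module, shown here as an illustration only:]

```python
import csv
import io

buf = io.StringIO()
# QUOTE_NONNUMERIC quotes every non-numeric field; numbers stay bare.
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(["Smith, John", 42, "plain"])
print(buf.getvalue())  # '"Smith, John",42,"plain"\r\n'
```

The default policy (`QUOTE_MINIMAL`) instead quotes only fields that actually contain a delimiter, quote, or line break - which is also valid CSV, and closer to what most exporters emit.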
Re: CSV Files
The csv files are created by an export from a database, not from a spreadsheet, so I don't think I need to worry about the Excel/Google issue (thankfully). Non-numeric strings will be in quotes, and the data between the quotes can contain commas and returns, escaped quotes - just about any visible ASCII character, in fact.

Pete

On Wed, Mar 28, 2012 at 10:09 AM, Keith Clarke keith.cla...@clarkeandclarke.co.uk wrote:
> ...that seems to be the same for Google sheets, too - though I've not dug into whether there is an obvious end-of-line character.
--
Pete
Molly's Revenge http://www.mollysrevenge.com
Re: CSV Files
Pete wrote:
> It seems to work just fine but I'm wondering if the collective knowledge of this list has already come up with a more efficient solution.

The article here outlines some of the many oddities about CSV, and includes a handler from Alex Tweedly which thus far has been the most efficient solution I've found:

CSV Must Die
A Plea to Introduce Sanity to the Software Development World by Pledging to Never Write CSV Exporters
http://www.fourthworld.com/embassy/articles/csv-must-die.html

--
Richard Gaskin
Fourth World
LiveCode training and consulting: http://www.fourthworld.com
Webzine for LiveCode developers: http://www.LiveCodeJournal.com
LiveCode Journal blog: http://LiveCodejournal.com/blog.irv
Re: CSV Files
Thanks Richard. I'm happy to say my handler parsed the extended test csv lines with flying colors. I'll try it alongside Alex's with some large datasets to see if there's any significant performance difference. By coincidence, I want the output from my handler to be an array, and I see Alex's creates an array at one point, so there may well be some benefits to switching over to his routine.

Pete

On Wed, Mar 28, 2012 at 2:10 PM, Richard Gaskin ambassa...@fourthworld.com wrote:
> CSV Must Die
> A Plea to Introduce Sanity to the Software Development World by Pledging to Never Write CSV Exporters
> http://www.fourthworld.com/embassy/articles/csv-must-die.html

--
Pete
Molly's Revenge http://www.mollysrevenge.com