David Coker wrote:

> Hi Richard!
>>If you have it in Excel, can you export it using tab-delimited?
>
> Yes sir, been there and tried that several different ways.
> Unfortunately, that creates issues of a different sort. As a really
> poor example, this is one of the things I continue to run across
> after converting to tabbed format..
>
> Original example:
> blah blah,"doodah somestupidgarbagecharacterorlinefeed doodah
> doodah",12345abc
>
> Converted to tab delimited:
> blah blah(tab)"doodah somestupidgarbagecharacterorlinefeed doodah
> doodah"(tab)12345abc
>
> When saved, tab delimited format all too often renders something like
> this:
>
> blah blah,"doodah somestupidgarbagecharacter
> doodah
> doodah"(tab)12345abc
>
> Gone is any hope of reusing the file data.

I wonder if it might be worth replacing somestupidgarbagecharacterorlinefeed with something like "_mydumbplaceholderthang_" (or any arbitrary string unlikely to appear in the data), then doing your parsing, and as the last step replacing the placeholder with the linefeed char again.

For parsing WebMerge templates I use placeholders a lot as a convenient way to get odd characters and strings out of the way so I can do the heavy work, putting them back when needed.
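A minimal sketch of that placeholder idea, written in Python rather than Rev so it can be run anywhere. The function names are made up for illustration, and it assumes fields are quoted with double quotes and that the placeholder string never occurs in the data:

```python
PLACEHOLDER = "_mydumbplaceholderthang_"  # arbitrary; assumed not to occur in the data


def protect_linefeeds(text, placeholder=PLACEHOLDER):
    """Replace line breaks that fall inside double-quoted fields."""
    out = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # An escaped quote ("") toggles twice, so the state stays correct.
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch in "\r\n":
            # Treat a CRLF pair as a single line break.
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1
            out.append(placeholder)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def restore_linefeeds(text, placeholder=PLACEHOLDER):
    """Last step: put real linefeeds back where the placeholder sits."""
    return text.replace(placeholder, "\n")
```

Once the embedded line breaks are out of the way, every remaining linefeed is a real record boundary, so the text can be split into records line by line and each value restored after parsing.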


> That was actually the whole point of my original question about
> importing csv into a database. At that point I could likely pull
> the data out field by field as required and run scanners to clean
> it up enough to be used in a tab delimited format. At that point,
> it would be easy to work with using Rev in any number of ways.

If the final destination is one of the more common DBMSes out there like SQLite or MySQL, there's got to be a CSV import filter available for them, no?
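For what it's worth, the sqlite3 command-line shell has ".mode csv" plus ".import", and MySQL has LOAD DATA INFILE. If a script is preferred, here is a rough sketch in Python using its standard csv and sqlite3 modules. The function name and table layout are hypothetical, and it assumes the first row holds column names and every row has the same number of fields:

```python
import csv
import sqlite3


def quote_ident(name):
    """Quote a table or column name for use in an SQLite statement."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn, table, csv_file):
    """Load a CSV file object into a new table; returns the column names.

    Assumes a header row. The csv reader copes with quoted fields that
    contain commas or line breaks, which is the troublesome case here.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(quote_ident(name) for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_ident(table), marks),
        reader,
    )
    conn.commit()
    return header
```

When reading from disk, open the file with newline="" (as the csv module documentation recommends) so that line breaks inside quoted fields reach the reader untouched.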


> BTW, I used your Webmerge program last night for the first time in
> a pretty long while... As part of a test run I was doing, it created
> 81,000+ html pages in just over 90 minutes. :-)

If it took that long, I can tell you most of the processing time was spent parsing the CSV.

Internally, WM uses the same format as FileMaker's Merge format, tab-delimited without added quotes, escaping tabs in values with ASCII 4 and returns in values with ASCII 11. All supported formats (CSV, pipes, Merge, etc.) get translated to that internal format so the actual template processing can be standardized and fairly well optimized.
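To illustrate the escaping scheme just described (not WebMerge's actual code, which is written in Rev), here is a small Python sketch. The function names are invented for the example; only the two escape characters come from the description above:

```python
TAB_ESCAPE = "\x04"     # ASCII 4 stands in for a tab inside a value
RETURN_ESCAPE = "\x0b"  # ASCII 11 stands in for a return inside a value


def encode_value(value):
    """Escape tabs and line breaks so a value fits on one tab-delimited line."""
    # Assumption: CRLF and bare CR are normalized to LF before escaping.
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    return value.replace("\t", TAB_ESCAPE).replace("\n", RETURN_ESCAPE)


def decode_value(value):
    """Reverse encode_value."""
    return value.replace(TAB_ESCAPE, "\t").replace(RETURN_ESCAPE, "\n")


def encode_record(fields):
    """Join already-parsed field values into one line, no added quotes."""
    return "\t".join(encode_value(field) for field in fields)


def decode_record(line):
    """Split one line back into its original field values."""
    return [decode_value(field) for field in line.split("\t")]
```

Because no value can contain a literal tab or return after encoding, splitting a file into records and fields needs nothing more than two plain delimiter splits, with no quote tracking at all.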

If your data were in such a format to begin with, or even in one of the most common pipe- or tab-delimited schemes, which don't add quotes and use a standard escape sequence for returns, WebMerge would complete those 81,000 pages in just a few minutes.

Here's an example from our Gallery page:

   "On my first use of the full program yesterday, WebMerge generated
    4.5MB of clean, error free HTML in less than 9 seconds."

I have one customer who cranks out more than 300,000 pages at a time in well under an hour, and his templates are fairly complex.

With most templates the processing time after parsing the data file is a fraction of a second per page.

For example, the tutorial set included with the demo generally finishes its 20 pages in well under a second. In fact, when I first made WM I set up the results dialog to show time spent in minutes, and that was too long so I added seconds, but even that was too long so I had to go back and revise it to be able to show elapsed time in milliseconds. :)

And this is the slow version. I originally set up WebMerge to use a template syntax that mimics FileMaker's, but as time goes on our customer base includes very few people for whom familiarity with FMP matters. So in a future version we'll be able to use an alternate template syntax that lets us move most of the processing directly into the Rev engine with the merge function, similar to how on-rev works.

Compared with the careful parsing of the FMP-style tags we do now, this change will drop per-page processing times to a few milliseconds on average, and for simpler templates even less.

SuperCard's merge function was one of the best things ever added to the Rev engine. For all its convenient power, until on-rev expanded and popularized it, it was one of the most under-utilized powerhouses in the language.
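For anyone who hasn't used it, Rev's merge function evaluates expressions wrapped in [[ ]] inside a string and substitutes the results. A very rough Python analogue of the idea is below. It is a simplification under stated assumptions: it only looks up plain field names in a record, where the real function evaluates arbitrary expressions, and merge_template is a made-up name, not anything in WebMerge or the engine:

```python
import re

# Matches [[name]] with optional spaces inside the brackets.
_FIELD_TAG = re.compile(r"\[\[\s*(\w+)\s*\]\]")


def merge_template(template, record):
    """Replace each [[name]] tag with the matching value from record.

    Raises KeyError if the template names a field the record lacks.
    """
    def substitute(match):
        return str(record[match.group(1)])

    return _FIELD_TAG.sub(substitute, template)
```

The appeal is that the whole page is filled in with one pass over the template, instead of scanning for and interpreting each tag separately in script.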

--
 Richard Gaskin
 Fourth World
 Rev training and consulting: http://www.fourthworld.com
 Webzine for Rev developers: http://www.revjournal.com
 revJournal blog: http://revjournal.com/blog.irv
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
