I would try looping through the files, using <@tokenize> to split each 
one into fields, and then appending the fields as tab-separated values 
to a new file.
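
For anyone who would rather prototype the same loop outside Witango, here is a rough sketch in Python. It is not the <@tokenize> approach itself: it swaps in regular expressions, and it assumes every file follows the sample layout (subject in the H1, author and date in the first two H2s, the message in the last paragraph before <center>). The names parse_message, export, and the column order are made up for illustration.

```python
import csv
import re
from pathlib import Path

# Hypothetical column order for the output file.
FIELDS = ["subject", "author", "email", "date", "comment"]
FLAGS = re.IGNORECASE | re.DOTALL


def clean(value):
    """Collapse newlines, tabs, and runs of spaces so a field stays on one line."""
    return " ".join(value.split())


def parse_message(text):
    """Split one forum file into fields, ignoring everything from <center> on."""
    head = re.split(r"<center>", text, maxsplit=1, flags=re.IGNORECASE)[0]
    subject = re.search(r"<h1>(.*?)</h1>", head, FLAGS)
    headings = re.findall(r"<h2>(.*?)</h2>", head, FLAGS)
    email = re.search(r'href="mailto:(.*?)"', head, FLAGS)
    paragraphs = re.findall(r"<p>(.*?)</p>", head, FLAGS)
    return {
        "subject": clean(subject.group(1)) if subject else "",
        "author": clean(headings[0]) if len(headings) > 0 else "",
        "date": clean(headings[1]) if len(headings) > 1 else "",
        "email": clean(email.group(1)) if email else "",
        # Assumption: the message body is the last <p> before <center>.
        "comment": clean(paragraphs[-1]) if paragraphs else "",
    }


def export(src_dir, out_path):
    """Loop over 101.txt, 102.txt, ... and write one tab-separated row per file."""
    files = sorted(
        (p for p in Path(src_dir).glob("*.txt") if p.stem.isdigit()),
        key=lambda p: int(p.stem),
    )
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["id"] + FIELDS)
        for path in files:
            row = parse_message(path.read_text(encoding="utf-8", errors="replace"))
            writer.writerow([path.stem] + [row[name] for name in FIELDS])
    return len(files)
```

Calling export("forum", "messages.tsv") would write a header row plus one row per numbered file; both paths are placeholders. Fields that contain a double quote come out quoted by the csv module, so the import step needs to allow for optionally quoted fields.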

Import the new file into Excel to see what it looks like.  If there are 
only a couple of problems, it's faster to clean them up there.  If there 
are a lot of problems, refine the tokenizing.
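
If eyeballing a thousand rows in a spreadsheet gets tedious, a quick script can flag the rows worth looking at. This is only a sketch in Python, assuming a tab-separated file with one header row and six columns (id, subject, author, email, date, comment); that layout and the name find_problem_rows are assumptions for illustration, so adjust them to whatever your file actually contains.

```python
import csv


def find_problem_rows(tsv_path, expected_fields=6, required=(1, 2, 5)):
    """Return (row_number, reason) pairs for rows that look wrong.

    required holds the zero-based column positions that must not be empty
    (here subject, author, and comment in the assumed layout).
    """
    problems = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        for row_number, row in enumerate(reader, start=2):
            if len(row) != expected_fields:
                problems.append((row_number, "wrong field count"))
            elif any(not row[index].strip() for index in required):
                problems.append((row_number, "empty required field"))
    return problems
```

A short list of flagged rows suggests hand-cleaning is the faster route; a long one suggests the parsing itself needs work.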

Then load the cleaned-up tab-separated file directly into MySQL (I use 
Webmin for that).


>How would I go about parsing a file into clean variables?
>
>About a year ago, I created an unthreaded forum. Not wanting to put it 
>on FMP, and witango not ready for Mac, I made it file-based. Each 
>submission writes a file with the guts of the message in it and then a 
>link is created to a .tml that sucks up the guts as an include into a 
>formatted page.
>
>http://www.patricknagel.com/forum/forum.taf
>
>OK, it's ugly, but it worked.
>
>Someone reminded me that when mysql and witango talked, I'd redo that 
>as a database app so it can be searched, etc.
>
>Yesterday, I did that, and now have a challenge. Getting all those 
>messages into the database. I can copy/paste about 1000 files, which 
>will take forever and be boring as hell. Then, I might come up with a 
>way to read and parse these messages and auto-submit.
>
>Since the files are serial numbered: 101.txt, 102.txt, etc.
>
>I was thinking of a taf that looped through a number sequence, read 
>each file, parsed it, turned the pieces into variables, and then 
>submitted to database. The question is how to parse it.
>
>Here is a typical file:
>
><H1>I have two Nagels</H1>
><H2>Roni</H2>
><p><A HREF="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</A></p>
><h2>07/23/03</H2>
><p>I have two Nagel prints with CERT. OF AUTHENTICITY....   but, they 
>don't say a name on them..  HOW DO I FIND OUT WHAT THEY ARE WORTH?  HOW 
>DO I LOOK THEM UP..?   THE CERTIFICATE NUMBERS ARE B4288G &  B101250..  
>I WOULD REALLY APPRECIATE anyone's assistance with this.   
>[EMAIL PROTECTED]</p>
><center>
><form name="reply" method="post" 
>action="/forum/forum.taf?_function=reply">
>   <input type="hidden" name="subject" value="I have two Nagels">
>   <input type="hidden" name="author" value="Roni">
>   <input type="hidden" name="comment" value="I have two Nagel prints 
>with CERT. OF AUTHENTICITY....   but, they don't say a name on them..  
>HOW DO I FIND OUT WHAT THEY ARE WORTH?  HOW DO I LOOK THEM UP..?   THE 
>CERTIFICATE NUMBERS ARE B4288G &  B101250..  I WOULD REALLY APPRECIATE 
>anyone's assistance with this.   [EMAIL PROTECTED]">
>   <input type="submit" name="Submit" value="Respond to this message">
></form>
></center>
>
>Everything past the first <center> is unnecessary. Got an idea how best 
>to do that?
>Thanks for any hints.
>
>RAD
>________________________________________________________________________
>TO UNSUBSCRIBE: Go to http://www.witango.com/maillist.taf


Bill Conlon

To the Point
345 California Avenue Suite 2
Palo Alto, CA 94306

office: 650.327.2175
fax:    650.329.8335
mobile: 650.906.9929
e-mail: mailto:[EMAIL PROTECTED]
web:    http://www.tothept.com

