We parse electronic documents all the time here, and we use Witango. If your headings and footings are similar (for example, ours are the same except for page number, department number and date), you can find them easily, take any information you want out of them (our reports always prefix the department number with "Dept:"), and then remove them completely. For each page you want, tokenize on CRLF, which gives you all the lines for that page. After that, transpose the array, go through each line in a rows statement, and tokenize on blanks to pull out specific columns. Since most reports are column oriented, and most have the same data in the same columns on every page, it's then easy to build a database from the extracted data.
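For anyone not on Witango, here is a rough sketch of the same steps in Python. It is not our actual code, and several things in it are assumptions made up for illustration: the sample report text, a form feed as the page separator, a one-line footing of the form "Page N", and the column names. Only the "Dept:" prefix and the CRLF/blank tokenizing come from what we really do.

```python
import re
import sqlite3

# Hypothetical sample report: two pages separated by a form feed (assumption),
# each with a one-line heading and a one-line footing.
SAMPLE = (
    "Monthly Sales Report  Dept: 101  02/27/2004\r\n"
    "1001 Widgets 25 310.50\r\n"
    "1002 Gaskets 4 18.00\r\n"
    "Page 1\r\n"
    "\f"
    "Monthly Sales Report  Dept: 202  02/27/2004\r\n"
    "2001 Sprockets 12 96.00\r\n"
    "Page 2\r\n"
)

DEPT_RE = re.compile(r"Dept:\s*(\S+)")     # heading carries the dept number
FOOTER_RE = re.compile(r"^Page\s+\d+$")    # assumed footing format


def parse_report(text):
    """Return one dict per data line, tagged with the page's dept number."""
    rows = []
    for page in text.split("\f"):
        # Tokenize on CRLF to get the lines of this page; skip empty lines.
        lines = [ln for ln in page.split("\r\n") if ln.strip()]
        dept = None
        data_lines = []
        for ln in lines:
            match = DEPT_RE.search(ln)
            if match:
                dept = match.group(1)  # take what we need from the heading
                continue               # then drop the heading itself
            if FOOTER_RE.match(ln.strip()):
                continue               # drop the footing
            data_lines.append(ln)
        for ln in data_lines:
            cols = ln.split()          # tokenize on blanks
            rows.append({
                "dept": dept,
                "item": cols[0],
                "name": cols[1],
                "qty": int(cols[2]),
                "amount": float(cols[3]),
            })
    return rows


def load_rows(rows):
    """Load parsed rows into an in-memory SQLite table so they can be searched."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE report "
        "(dept TEXT, item TEXT, name TEXT, qty INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO report VALUES (:dept, :item, :name, :qty, :amount)", rows
    )
    return conn


rows = parse_report(SAMPLE)
conn = load_rows(rows)
```

Tokenizing on blanks only works when the column values themselves contain no spaces. If they do, slicing each line at fixed character positions is probably the safer choice.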
Do this for every page and you have your whole report in a searchable database. We've been doing this for over a year with an electronic report from corporate, and it works quite well.

-----Original Message-----
From: Chuck Lockwood [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 27, 2004 10:17 AM
To: [EMAIL PROTECTED]
Subject: RE: Witango-Talk: [OT] Extract data from report

I highly recommend a product called Monarch whenever you need to extract data from text files. It makes it quick and easy. Combine it with data from other sources as well.
http://www.datawatch.com
