On 12/13/2012 02:49 PM, Jim Giner wrote:
Thanks for all the posts. After reading and googling all afternoon, I
think the best approach for me is:

Create two macros in Word (done!) to export each of my .doc files to
.txt and .pdf formats.

Create a sql table to hold the .txt contents of my .doc files, along
with a reference to the meeting date and the name of the corresponding
.pdf file.
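
A rough sketch of what such a table could look like, using Python's built-in sqlite3 module purely for illustration (PHP with MySQL would do the same job). The table name and column names (meetings, meeting_date, pdf_name, body) are assumptions, not part of the plan above.

```python
import sqlite3

def create_meetings_table(conn):
    """Create the (hypothetical) table holding one row per meeting document."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS meetings (
            id           INTEGER PRIMARY KEY,
            meeting_date TEXT NOT NULL,   -- ISO date, e.g. 2012-12-13
            pdf_name     TEXT NOT NULL,   -- file name of the matching .pdf
            body         TEXT NOT NULL    -- full contents of the .txt export
        )
        """
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database, just for the sketch
create_meetings_table(conn)
```

Storing only the .pdf file name, rather than the file itself, keeps the table small and leaves the PDFs on disk where the web server can serve them directly.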

Upload my two sets of files with an ftp client and then use a script to
load the table with my .txt file data.
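
The load step might look something like the sketch below, again in Python with sqlite3 for illustration only. It assumes each .txt file is named after its meeting date (for example 2012-12-13.txt) and that the .pdf shares the same base name; both conventions, the table layout, and the UTF-8 encoding are assumptions made for the example.

```python
import sqlite3
from pathlib import Path

def load_text_files(conn, txt_dir):
    """Insert every .txt file in txt_dir as one row; return the file count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meetings ("
        "id INTEGER PRIMARY KEY, meeting_date TEXT NOT NULL, "
        "pdf_name TEXT NOT NULL, body TEXT NOT NULL)"
    )
    count = 0
    for txt_path in sorted(Path(txt_dir).glob("*.txt")):
        meeting_date = txt_path.stem        # assumed: file name is the date
        pdf_name = txt_path.stem + ".pdf"   # assumed: same base name
        # Undecodable bytes are replaced rather than aborting the load.
        body = txt_path.read_text(encoding="utf-8", errors="replace")
        # Delete first so the script can be re-run after a fresh upload.
        conn.execute("DELETE FROM meetings WHERE meeting_date = ?",
                     (meeting_date,))
        conn.execute(
            "INSERT INTO meetings (meeting_date, pdf_name, body) "
            "VALUES (?, ?, ?)",
            (meeting_date, pdf_name, body),
        )
        count += 1
    conn.commit()
    return count
```

Word's plain-text export may well use a Windows code page instead of UTF-8, so the encoding argument is the first thing to check if accented characters come out wrong.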

Now I just need a couple of scripts to allow a user to locate a file and
bring up the pdf for when he wants to read about a meeting. And a second
script to accept user input (search words) and perform a query against
the textual data and present some kind of results - probably a listing
containing a reference to the meeting date and a tbd-length string
showing the matching result for each occurrence, i.e., something like n
chars in front of and after the match so the user can see the context of
the match.
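
The "n chars in front of and after the match" idea can be sketched as follows. This is only one way to do it, shown in Python; the function name find_snippets and the default of 40 characters are made up for the example, and the search is a plain case-insensitive substring match with no stemming or ranking.

```python
import re

def find_snippets(text, term, n=40):
    """Return one snippet per case-insensitive match of term in text,
    with up to n characters of context on each side."""
    snippets = []
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, m.start() - n)
        end = min(len(text), m.end() + n)
        # Collapse newlines and tabs so each snippet prints on one line.
        snippets.append(" ".join(text[start:end].split()))
    return snippets

minutes = "The board met Tuesday. The budget was approved after a short debate."
for snippet in find_snippets(minutes, "budget", n=10):
    print(snippet)
```

In the real script the text would come from the table row and each snippet would be listed next to its meeting date. Snippets can start or end mid-word; trimming to the nearest space is an easy refinement.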

Sizes - a 28 KB .doc file grows to 142 KB in .pdf format and is only 5 KB in
.txt format. (Actually, if I 'print' the .doc as a pdf instead of using
Word's "File, Save As", the resulting pdf is only 70 KB. Might need a
new macro!)

Thanks again!

A few years ago I wrote a script that extracts the plain text from a .doc file.


If you look in the directory, you will see a few example files.

You can view them like this.


Replace test_building.doc with any of the other .doc files from the directory listing to see its contents.

I currently have it set to 64-bit-wide rows, which shows some nice patterns in the MS Word format.
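
For anyone who wants to look at those patterns locally, here is a minimal fixed-width byte viewer in Python. It is not the convert.php script, only a sketch of the general idea; the function name hex_rows is made up, and reading "64 bits" as 8 bytes per row is an assumption.

```python
def hex_rows(data, width=8):
    """Format bytes as rows of `width` bytes: offset, hex, printable ASCII."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' so the columns stay aligned.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return rows

# The first 8 bytes of an OLE2 compound file (the container used by
# binary .doc files), followed by a few filler bytes for the demo.
sample = bytes.fromhex("d0cf11e0a1b11ae1") + b"Word\x00\x01"
print("\n".join(hex_rows(sample)))
```

To inspect a real document, pass the result of open(path, "rb").read() instead of the sample bytes. Keeping the row width fixed is what makes repeating structures line up in columns.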

The source of the convert.php script is viewable as well.


I have thought about extending this further to work out the layout and text formatting, but it hasn't gotten much attention for quite some time now.

Hope it helps.

Jim Lucas


PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
