On Wed, 2005-03-30 at 00:32, Dean Hopstein wrote:
> I've been using antiword on linux for some time with great success if
> you're only needing to maintain basic format structures into HTML.
Yes. I suspect that when converting a large corpus of data like that, one would prefer to convert the hidden fields in the Word files into metadata along the lines of the Dublin Core, or something more specialised and particular to the establishment, if the target format is XML. I may be wrong, but I can imagine someone saying "give me all the documents from 1999 written by Dr Smith" (of course, whether Dr Smith's letters are actually identified by any computable field in a Word document is another matter). Handling correspondence, and moving it from one place to another as it follows the patient, is a topic worth considering, and one for which a FLOSS approach may be viable.
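
To make the idea concrete, here is a rough sketch in Python of what I have in mind. It assumes the hidden fields have already been pulled out of each Word file into a plain dictionary (that extraction step is not shown, and the keys "author", "title" and "created" are my own invention, not anything antiword is known to emit). The mapping table and the helper names are hypothetical too; only the Dublin Core namespace URI and the element names creator, title and date are real.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from extracted Word fields to Dublin Core elements.
FIELD_MAP = {"author": "creator", "title": "title", "created": "date"}


def to_dublin_core(props):
    """Build a <metadata> element holding one dc:* child per known field."""
    root = ET.Element("metadata")
    for word_field, dc_name in FIELD_MAP.items():
        value = props.get(word_field)
        if value:  # skip fields that are missing or empty
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = value
    return root


def matches(record, creator, year):
    """True if the record has this creator and a date starting with the year."""
    found_creator = record.findtext(f"{{{DC_NS}}}creator")
    found_date = record.findtext(f"{{{DC_NS}}}date") or ""
    return found_creator == creator and found_date.startswith(str(year))


# Made-up sample data standing in for fields extracted from Word files.
docs = [
    {"author": "Dr Smith", "title": "Discharge letter", "created": "1999-04-12"},
    {"author": "Dr Smith", "title": "Referral", "created": "2001-07-03"},
    {"author": "Dr Jones", "title": "Clinic note", "created": "1999-11-20"},
]

records = [to_dublin_core(d) for d in docs]

# "Give me all the documents from 1999 written by Dr Smith."
hits = [
    r.findtext(f"{{{DC_NS}}}title")
    for r in records
    if matches(r, "Dr Smith", 1999)
]
print(hits)
print(ET.tostring(records[0], encoding="unicode"))
```

The query only works as well as the fields it relies on: if the author field in the Word file is blank or holds the typist's name, no amount of XML will find Dr Smith's letters.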
