Hi Ian, I couldn't see a difference in the file format between protected and non-protected documents. I got "Microsoft Word 97-2003 Document" for `.doc` and "Microsoft Word Document" for `.docx` though. Is what you're seeing based on the file extension or definitely on the protection status?
Assuming that you can't tell without opening the files, here's what I'd do. Using a machine with Word 2007 or 2010 on it, I would use VSTO to run through each of the 50,000+ documents and convert all `.doc` format files to `.docx` (in a temporary folder, of course). Then I would use `System.IO.Packaging` to open each converted file, look at the `~\word\settings.xml` stream within it, and see if it contains a `<w:documentProtection />` node (or similar).

Would that work for you?

Cheers.
James.

From: [email protected] [mailto:[email protected]] On Behalf Of Ian Thomas
Sent: Monday, 30 May 2011 22:13
To: [email protected]; 'ozDotNet'
Subject: Word VSTO question

I have 50000+ short Word documents, a proportion of which have a small protected section. As a first pass, I need to identify which of the files have a protected section. Can anyone help me with how to do that?

On the basis of a sample of one of each, the Word file format is "Microsoft Word 97-2003 Document" for the files without a protected section, and "Microsoft Word Document" for those that do have a protected section. (The machine I inspected these with has only Office 2003 installed.)

________________________________
Ian Thomas
Victoria Park, Western Australia
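
For illustration only, the `settings.xml` check suggested in the reply could be sketched roughly as follows. This is not the `System.IO.Packaging` approach itself: it swaps in Python's standard `zipfile` module, relying on the fact that a `.docx` file is a ZIP package, and it assumes the `.doc` files have already been converted. The function name `has_document_protection` is hypothetical, and the sketch only tests whether the element is present, not whether protection is actually enforced.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace that the "w:" prefix normally maps to.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def has_document_protection(docx_path):
    """Return True if word/settings.xml has a w:documentProtection child.

    Hypothetical helper: it checks for the element only and ignores its
    attributes, so it cannot tell enforced from unenforced protection.
    """
    with zipfile.ZipFile(docx_path) as package:
        try:
            settings_xml = package.read("word/settings.xml")
        except KeyError:
            # No settings part in the package, so nothing to inspect.
            return False
    root = ET.fromstring(settings_xml)
    return root.find(f"{{{W_NS}}}documentProtection") is not None
```

Running this over the temporary folder of converted files and collecting the paths for which it returns `True` would give a first-pass list. If the protected sections turn out to be marked some other way, the element name would need adjusting, which is what the "(or similar)" above allows for.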
