Hi James - thanks for the suggestion. The docx package and API are certainly a lot more explicit.
I won't have a need to create docx documents, but (see later) the security level set by VSTO for different Word versions may be a consideration. I've got over the practical problem, but I would like to dispel some of my ignorance of the Word object model and understand it a little more, before going further into a VSTO solution.

The documents are public data, were created between 2008 and 2010, are all .doc, and can be opened with Word 2003. Every one of these documents contains a macro code file (protected), but none has document password protection.

When I open them, I see just a Section Break for the "not-protected" documents -

For the docs with a protected section, the Section Break is displayed like this -

However, I have found that identifying which documents have a protected Section isn't relevant. Using "Ask Cindy" (Cindy Meister, Word MVP and moderator on one of the Word forums) last night, I was alerted to the fact that the protected section can be copied, so that sidesteps the problem of identifying protected sections. I can just ignore the protection and parse the text data within my application.

But I'm curious to understand why documents which have one Section protected, and those that have nothing protected, both show Section #1 with its .ProtectedForForms property True. I'm guessing that this is due to a macro that is in every document. On (manually) opening with Word 2003, if I "Disable Macros" I can see a (password-protected) macro "autoopen" in what I had called the "unprotected" documents, but if I "Enable Macros" then I see the "End of Protected Section" adornment on the Section Break (second pic, above). For the "protected" documents (Section 1 showing "End of Protected Section" whichever security mode is chosen), the Tools>Macros>Macros (Alt-F8) menu choice is greyed.
So I suspect that the explanation is that I need to explicitly set the security level for opening Word docs in my code - i.e., by default it must be Low (whereas for testing by manually opening the docs in the installed Word 2003, I have it set to Medium). If the security is Low, then presumably the "autoopen" macro runs and the code would detect Section #1 as a protected Section. I am using the Word 11.0 interop currently, and I'm wondering if the "security setting" (?) is more rigorous by default in Word 12.0 and 14.0.

_____
Ian Thomas
Victoria Park, Western Australia

_____
From: [email protected] [mailto:[email protected]] On Behalf Of James Chapman-Smith
Sent: Tuesday, May 31, 2011 10:29 AM
To: ozDotNet
Subject: RE: Word VSTO question

Hi Ian,

I couldn't see a difference in the file format for protected or non-protected documents. I got "Microsoft Word 97-2003 Document" for `.doc` and "Microsoft Word Document" for `.docx` though. Is what you're seeing based on the file extension or definitely on the protection status?

Assuming that you can't tell without opening the files, here's what I'd do. Using a machine with Word 2007 or 2010 on it, I would use VSTO to run through each of the 50,000+ documents and convert all `.doc` format files to `.docx` (in a temporary folder, of course), then use `System.IO.Packaging` to open each file, look at the `~\word\settings.xml` stream within the file, and see if it contains a `<w:documentProtection />` node (or similar).

Would that work for you?

Cheers.

James.

From: [email protected] [mailto:[email protected]] On Behalf Of Ian Thomas
Sent: Monday, 30 May 2011 22:13
To: [email protected]; 'ozDotNet'
Subject: Word VSTO question

I have 50000+ short Word documents, a proportion of which have a small protected section. As a first pass, I need to identify which of the files have a protected section. Can anyone help me with how to do that?
On the basis of a sample of one of each, the Word file format is "Microsoft Word 97-2003 Document" for the files without a protected section, and "Microsoft Word Document" for those that do have a protected section. (The machine I inspected these with has only Office 2003 installed.)

_____
Ian Thomas
Victoria Park, Western Australia
<<image003.jpg>>
<<image004.jpg>>
