Hi James - thanks for the suggestion. The docx package and API are certainly
a lot more explicit.

I won't have a need to create docx documents, but (see later) the security
level set by VSTO for different Word versions may be a consideration. 

I've got over the practical problem, but I would like to dispel some of my
ignorance of the Word object model and understand it a little more, before
going further into a VSTO solution. 

The documents are public data, created between 2008 and 2010. All are .doc
files and can be opened with Word 2003. Each document contains a (protected)
macro code file, but none has document password protection.

When I open them, I see just a Section Break for the "not-protected"
documents - 



For the docs with a protected section, the Section Break is displayed like
this - 



However, I have found that identifying which documents have a protected
Section isn't actually necessary.

Using "Ask Cindy" (Cindy Meister, Word MVP and moderator on one of the Word
forums) last night, I was alerted to the fact that the protected section can
be copied, so that jumps over the problem of identifying protected sections.
I can just ignore that, and parse the text data within my application. 

But I'm curious to understand why documents which have one Section
protected, and those that have nothing protected, both show Section #1 with
its .ProtectedForForms property True. 

I'm guessing that this is due to a Macro that is in every document. On
(manually) opening with Word 2003, if I "Disable Macros" I can see a
(password-protected) Macro "autoopen" in what I had called the "unprotected"
documents, but if I "Enable Macros" then I see the "End of Protected
Section" adornment on the Section Break (second pic, above).

For the "protected" documents (Section 1 showing "End of Protected Section"
whichever security mode is chosen), the Tools>Macros>Macros (Alt-F8) menu
choice is greyed. 

So I suspect that the explanation is that I need to explicitly set the
security level for opening Word docs in my code - i.e., by default it must be
Low (whereas for testing by manually opening the docs in the installed Word
2003, I have it set to Medium). If the security is Low, the macro would run
and the code would detect Section #1 as a protected Section.

I am using the Word 11.0 interop currently, and I'm wondering if the
"security setting" (?) is more rigorous by default in Word 12.0 and 14.0.

  _____  

Ian Thomas
Victoria Park, Western Australia

  _____  

From: [email protected] [mailto:[email protected]]
On Behalf Of James Chapman-Smith
Sent: Tuesday, May 31, 2011 10:29 AM
To: ozDotNet
Subject: RE: Word VSTO question

 

Hi Ian,

 

I couldn't see a difference in the file format for protected or
non-protected documents. I got "Microsoft Word 97-2003 Document" for `.doc`
and "Microsoft Word Document" for `.docx` though. Is what you're seeing
based on the file extension or definitely on the protection status?

 

Assuming that you can't tell without opening the files, here's what I'd do.

 

Using a machine with Word 2007 or 2010 on it, I would use VSTO to run
through each of the 50,000+ documents and convert all `.doc` format files to
`.docx` (in a temporary folder, of course) and then use
`System.IO.Packaging` to open each file and look at the
`~\word\settings.xml` stream within the file and see if it contains a
`<w:documentProtection />` node (or similar).
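
As a rough illustration of the check described above (not the
`System.IO.Packaging` code itself), here is a minimal Python sketch. It
assumes the file has already been converted to `.docx`, treats the package as
an ordinary zip archive, and looks for a `w:documentProtection` element
directly under the root of `word/settings.xml`. The function name
`document_protection` is hypothetical, chosen only for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in settings.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_protection(docx):
    """Return the attributes of <w:documentProtection> as a dict,
    or None if the element (or settings.xml itself) is absent.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        try:
            settings = package.read("word/settings.xml")
        except KeyError:
            # No settings part in this package.
            return None

    root = ET.fromstring(settings)
    node = root.find(f"{{{W_NS}}}documentProtection")
    if node is None:
        return None

    # Strip the namespace from attribute names, e.g. "{...}edit" -> "edit".
    prefix = f"{{{W_NS}}}"
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in node.attrib.items()
    }
```

Calling `document_protection("some-file.docx")` on a converted file would
return something like a dict of the element's attributes when the node is
present, and `None` otherwise. Note that this only reports the document-level
setting; it does not say which sections are covered.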

 

Would that work for you?

 

Cheers.

 

James.

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Ian Thomas
Sent: Monday, 30 May 2011 22:13
To: [email protected]; 'ozDotNet'
Subject: Word VSTO question

 

 

I have 50000+ short Word documents, a proportion of which have a small
protected section. As a first pass, I need to identify which of the files
have a protected section. Can anyone help me with how to do that? 

On the basis of a sample of one of each, the Word file format is "Microsoft
Word 97-2003 Document" for the files without a protected section, and
"Microsoft Word Document" for those that do have a protected section. (The
machine I inspected these with has only Office 2003 installed.)

  _____  

Ian Thomas
Victoria Park, Western Australia

<<image003.jpg>>

<<image004.jpg>>
