It seems correct, then, to assume that Word inserts some information - metadata
most likely - into the .doc file that it uses to recognise that a file is a
mail merge master. Whilst I am in no way an expert, I doubt very much that
HWPF will be able to read this and recognise it. What it can do, however, is
expose both the document's DocumentProperties and its FileInformationBlock -
I have capitalised these names as there are classes of those names in the
hwpf.model package. As a first step, it might be worth creating two files -
one a merge master, the other an ordinary Word document - and seeing how the
properties and file information differ. Just as an aside, I have had a quick
look at the javadoc and there are two get and two set methods in the
DocumentProperties class that have the word 'merge' in their names. I do not
know what they are for but they could be a good starting point.
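
A rough sketch of how that comparison could be automated, assuming both
objects have already been obtained from HWPF: the helper below uses plain
reflection to call every no-argument get/is method on two objects of the same
class and lists the ones that return different values. GetterDiff and its diff
method are names invented for this illustration, not part of POI, and the code
uses generics, so it would need reworking for a 1.4 JDK.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical helper (not part of POI): lists getters whose values differ. */
class GetterDiff {

    /**
     * Calls every public, non-static, no-argument method whose name starts
     * with "get" or "is" on both objects. Returns a map from method name to
     * "left -> right" for each method whose two results are not equal.
     */
    static Map<String, String> diff(Object left, Object right) {
        if (left.getClass() != right.getClass()) {
            throw new IllegalArgumentException("objects must be of the same class");
        }
        Map<String, String> differences = new TreeMap<>();
        for (Method m : left.getClass().getMethods()) {
            String name = m.getName();
            boolean looksLikeGetter = name.startsWith("get") || name.startsWith("is");
            if (!looksLikeGetter
                    || name.equals("getClass")
                    || m.getParameterCount() != 0
                    || m.getReturnType() == void.class
                    || Modifier.isStatic(m.getModifiers())) {
                continue;
            }
            try {
                Object a = m.invoke(left);
                Object b = m.invoke(right);
                // deepEquals also compares array contents, e.g. byte[] values
                if (!Objects.deepEquals(a, b)) {
                    differences.put(name, String.valueOf(a) + " -> " + String.valueOf(b));
                }
            } catch (ReflectiveOperationException e) {
                // A getter that throws or is inaccessible is recorded, not fatal
                differences.put(name, "could not be read: " + e);
            }
        }
        return differences;
    }
}
```

If I read the javadoc correctly, HWPFDocument hands out the two objects through
getDocProperties() and getFileInformationBlock(), so the results of those calls
on the merge master and on the ordinary document could be passed to diff;
please check that against the POI version you end up using. Getters that
return objects which do not override equals will always be reported as
different, so those entries need a closer look by hand.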

With regard to the WordExtractor class, you need to access the scratchpad
as David suggests to get your hands on that; HWPF lives in the separate
scratchpad jar rather than in the main POI jar.

Finally, I have been thinking about my previous reply and I neglected to
mention that I was making a HUGE assumption. I assumed that the bookmarks
would be recognised as text also, not simply some special series of control
characters. That being the case, it should be possible to recover them from
the document and perform the sorts of comparisons you need to undertake.
That assumption will be very easy for you to test once you get your hands on
HWPF - simply run the WordExtractor class against a mail merge document and
see what the class returns. Even if that class does not give you just what
you want, you can still inspect the document further as there are other
sorts of objects bound up within the document.
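
Once the text is available as a String, the check against the known set of
fields could look something like the sketch below. It assumes that the
extracted text still contains the field codes in the form MERGEFIELD Name (or
MERGEFIELD "Name with spaces"), which is exactly the assumption that needs
testing. MergeFieldCheck, fieldNames and unexpected are hypothetical names,
not POI classes or methods, and names are compared exactly as written.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of POI): finds merge field names in plain text. */
class MergeFieldCheck {

    // Assumption: field codes survive extraction as
    //   MERGEFIELD Name   or   MERGEFIELD "Name with spaces"
    // An unquoted name ends at whitespace, a backslash switch or a control character.
    private static final Pattern MERGEFIELD = Pattern.compile(
            "MERGEFIELD\\s+(?:\"([^\"]+)\"|([^\\s\\\\\"\\p{Cntrl}]+))",
            Pattern.CASE_INSENSITIVE);

    /** Returns the distinct merge field names in the text, in order of appearance. */
    static Set<String> fieldNames(String text) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = MERGEFIELD.matcher(text);
        while (m.find()) {
            // Group 1 holds a quoted name, group 2 an unquoted one
            names.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return names;
    }

    /** Returns the field names used in the text that are not in the allowed set. */
    static Set<String> unexpected(String text, Set<String> allowed) {
        Set<String> extra = fieldNames(text);
        extra.removeAll(allowed);
        return extra;
    }
}
```

A document would then pass the validation when fieldNames is not empty and
unexpected returns an empty set for the known "table structure". If the
extractor turns out to strip the field codes, this approach will find nothing
and the fields would have to be looked for elsewhere in the document.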


Christian Gosch-2 wrote:
> 
> Hello, MSB [markbrdsly],
> 
> to answer the last one first: I do not know if there is any useful 
> internal / technical difference, but in fact Word itself does recognize 
> that: If you open a document prepared as mail merge master file, Word 
> knows that it is one, and e. g. displays the mail merge ribbon / toolbar.
> 
> The first one should not be possible without the second one (or returns 
> the answer to the other question intrinsically): If there are mail merge 
> fields inside a document, usually it is supposed to be a mail merge 
> master document. (To be honest: I do not know how this kind of doc is 
> officially called in English -- in German it is called 
> "Seriendruck-Hauptdokument".)
> 
> By the way: When / with which version was the method in question 
> introduced? Currently I use POI 3.2-FINAL-20081019 and cannot find any 
> hwpf package or WordExtractor class, and I'm stuck to JDK 1.4.2 due to 
> IBM WebSphere 6.0 as runtime...
> 
> Thanks anyway,
> Christian
> 
>> -----Original Message-----
>> From: MSB [mailto:[email protected]]
>> Sent: Thursday, February 26, 2009 6:00 PM
>> To: [email protected]
>> Subject: Re: Q: How to check if a Word .doc file is a mail merge
>> master file?
>> 
>> 
>> Hello Christian,
>> 
>> I would guess that the answer to your second question is yes. It is
>> possible to use HWPF to extract the data from a Word document - in fact
>> Nick has built a class that does just this and it is called
>> WordExtractor I think. It returns an array of Strings if I remember
>> correctly and it would not be too difficult to imagine that you could
>> check the complete set of values returned and if - only if - that
>> complete set was limited to your 'table structure' (if I understand
>> that correctly) then the document would pass your validation test.
>> 
>> To answer your first question, I need to ask another one; what set or
>> criteria distinguish a mail merge master file from any other document
>> or document template that could be created using Word? If you are able
>> to formulate such a list then it would be possible to determine if
>> HWPF could be used to parse the Word file and determine it's status.
>> 
>> 
>> Christian Gosch-2 wrote:
>> >
>> > Is it possible using POI to check if a given Word *.doc file
>> > (Word2K/2003) is a Mail Merge master file?
>> >
>> > Is it then possible to retrieve or find by inspection the mail merge
>> > data field references used in the mail merge master file?
>> >
>> > We do not need to change anything, we just want to check if a given
>> > file is a valid mail merge master and matches a given and known
>> > "table structure", i. e. uses only a given set of mail merge data
>> > field references. (validation)
>> >
>> > Up to now, our validation just checks the file extension and does
>> > not execute any introspection.
>> >
>> > Thanks for answers,
>> > --
>> > Dipl.-Inform. Christian Gosch, PMI PMP
>> > Systems Architecture, Project Management
>> >
>> > inovex GmbH
>> > Büro Pforzheim
>> > Karlsruher Strasse 71
>> > D-75179 Pforzheim
>> > Tel: +49 (0)7231 3191-85
>> > Fax: +49 (0)7231 3191-91
>> > [email protected]
>> > www.inovex.de
>> >
>> > Sitz der Gesellschaft: Pforzheim
>> > AG Mannheim, HRB 502126
>> > Geschäftsführer: Stephan Müller
>> >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]
>> >
>> >
>> >
>> 
>> --
>> View this message in context:
>> http://www.nabble.com/Q%3A-How-to-check-if-a-Word-.doc-file-is-a-mail-merge-master-file--tp22220571p22228552.html
>> Sent from the POI - User mailing list archive at Nabble.com.
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Q%3A-How-to-check-if-a-Word-.doc-file-is-a-mail-merge-master-file--tp22220571p22241109.html
Sent from the POI - User mailing list archive at Nabble.com.

