Is it correct, then, to assume that Word inserts some information (most likely metadata) into the .doc file that it uses to recognise that a file is a mail merge master? Whilst I am in no way an expert, I very much doubt that HWPF will be able to read this and recognise it. What it can do, however, is expose both the document's DocumentProperties and its FileInformationBlock; I have capitalised these names because there are classes with those names in the hwpf.model package. As a first step, it might be worth creating two files, one a merge master and the other an ordinary Word document, and seeing how the properties and file information differ. Just as an aside, I have had a quick look at the javadoc and there are two get and two set methods in the DocumentProperties class that have the word 'merge' in their names. I do not know what they are for, but they could be a good starting point.
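The two-file comparison suggested above can be mechanised, so you do not need to know in advance which property matters. This Java sketch is an illustration, not a POI feature: PropertyDiff, dumpGetters and diff are hypothetical names. It calls every public no-argument getX()/isX() method by reflection and reports the ones whose values differ between two objects. The intended inputs would be, for the merge master and the ordinary document respectively, the DocumentProperties from HWPFDocument.getDocProperties() and the FileInformationBlock from getFileInformationBlock(); I believe those accessors exist, but check the javadoc for the POI version you use. It is written for a modern JDK (generics, multi-catch), so it would need reworking for JDK 1.4.2.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: compares two objects by the values of their public no-argument getters. */
final class PropertyDiff {

    private PropertyDiff() {}

    /** Maps each public, non-static, no-argument getX()/isX() method name to its rendered value. */
    public static Map<String, String> dumpGetters(Object target) {
        Map<String, String> values = new TreeMap<>();
        for (Method m : target.getClass().getMethods()) {
            String name = m.getName();
            boolean looksLikeGetter = (name.startsWith("get") || name.startsWith("is"))
                    && m.getParameterCount() == 0
                    && m.getReturnType() != void.class
                    && !Modifier.isStatic(m.getModifiers())
                    && !name.equals("getClass");
            if (!looksLikeGetter) {
                continue;
            }
            try {
                values.put(name, render(m.invoke(target)));
            } catch (ReflectiveOperationException | RuntimeException e) {
                // Record the failure instead of aborting the whole dump.
                values.put(name, "<" + e.getClass().getSimpleName() + ">");
            }
        }
        return values;
    }

    /** Lists "old -> new" for every getter whose rendered value differs between a and b. */
    public static Map<String, String> diff(Object a, Object b) {
        Map<String, String> left = dumpGetters(a);
        Map<String, String> right = dumpGetters(b);
        Set<String> names = new TreeSet<>(left.keySet());
        names.addAll(right.keySet());
        Map<String, String> differences = new LinkedHashMap<>();
        for (String name : names) {
            String l = left.get(name);
            String r = right.get(name);
            if (!Objects.equals(l, r)) {
                differences.put(name, l + " -> " + r);
            }
        }
        return differences;
    }

    /** Renders arrays element by element; everything else via String.valueOf (i.e. toString()). */
    private static String render(Object value) {
        if (value != null && value.getClass().isArray()) {
            String wrapped = Arrays.deepToString(new Object[] {value});
            return wrapped.substring(1, wrapped.length() - 1);
        }
        return String.valueOf(value);
    }
}
```

Values are compared through toString(), so a nested object that does not override toString() will show up as a spurious difference and needs checking by hand; arrays are expanded element by element, so two equal arrays compare as equal.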
With regard to the WordExtractor class, you need the POI scratchpad jar (where HWPF lives), as David suggests, to get your hands on that. Finally, I have been thinking about my previous reply and I neglected to mention that I was making a HUGE assumption: I assumed that the bookmarks would also be recognised as text, not simply as some special series of control characters. If that is the case, it should be possible to recover them from the document and perform the sorts of comparisons you need to undertake. That assumption will be very easy for you to test once you get your hands on HWPF: simply run the WordExtractor class against a mail merge document and see what it returns. Even if that class does not give you exactly what you want, you can still inspect the document further, as there are other sorts of objects bound up within it.

Christian Gosch-2 wrote:
>
> Hello, MSB [markbrdsly],
>
> to answer the last one first: I do not know if there is any useful
> internal / technical difference, but in fact Word itself does recognize
> that: if you open a document prepared as a mail merge master file, Word
> knows that it is one and, e.g., displays the mail merge ribbon / toolbar.
>
> The first one should not be possible without the second one (or answering
> it implicitly answers the other question): if there are mail merge
> fields inside a document, it is usually supposed to be a mail merge
> master document. (To be honest, I do not know what this kind of doc is
> officially called in English; in German it is called
> "Seriendruck-Hauptdokument".)
>
> By the way: when / with which version was the method in question
> introduced? Currently I use POI 3.2-FINAL-20081019 and cannot find any
> hwpf package or WordExtractor class, and I'm stuck with JDK 1.4.2 due to
> IBM WebSphere 6.0 as runtime...
>
> Thanks anyway,
> Christian
>
>> -----Original Message-----
>> From: MSB [mailto:[email protected]]
>> Sent: Thursday, February 26, 2009 6:00 PM
>> To: [email protected]
>> Subject: Re: Q: How to check if a Word .doc file is a mail merge master file?
>>
>> Hello Christian,
>>
>> I would guess that the answer to your second question is yes. It is possible
>> to use HWPF to extract the data from a Word document; in fact Nick has
>> built a class that does just this, and I think it is called WordExtractor. It
>> returns an array of Strings, if I remember correctly, and it would not be too
>> difficult to imagine that you could check the complete set of values
>> returned and if, and only if, that complete set was limited to your 'table
>> structure' (if I understand that correctly), then the document would pass
>> your validation test.
>>
>> To answer your first question, I need to ask another one: what set of
>> criteria distinguishes a mail merge master file from any other document or
>> document template that could be created using Word? If you are able to
>> formulate such a list, then it would be possible to determine whether HWPF could
>> be used to parse the Word file and determine its status.
>>
>>
>> Christian Gosch-2 wrote:
>> >
>> > Is it possible using POI to check if a given Word *.doc file
>> > (Word2K/2003) is a Mail Merge master file?
>> >
>> > Is it then possible to retrieve or find by inspection the mail merge
>> > data field references used in the mail merge master file?
>> >
>> > We do not need to change anything; we just want to check if a given file
>> > is a valid mail merge master and matches a given and known "table
>> > structure", i.e. uses only a given set of mail merge data field
>> > references (validation).
>> >
>> > Up to now, our validation just checks the file extension and does not
>> > perform any introspection.
>> >
>> > Thanks for answers,
>> > --
>> > Dipl.-Inform. Christian Gosch, PMI PMP
>> > Systems Architecture, Project Management
>> >
>> > inovex GmbH
>> > Pforzheim office
>> > Karlsruher Strasse 71
>> > D-75179 Pforzheim
>> > Tel: +49 (0)7231 3191-85
>> > Fax: +49 (0)7231 3191-91
>> > [email protected]
>> > www.inovex.de
>> >
>> > Registered office: Pforzheim
>> > AG Mannheim, HRB 502126
>> > Managing Director: Stephan Müller
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]
>>
>> --
>> View this message in context: http://www.nabble.com/Q%3A-How-to-check-if-a-Word-.doc-file-is-a-mail-merge-master-file--tp22220571p22228552.html
>> Sent from the POI - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Q%3A-How-to-check-if-a-Word-.doc-file-is-a-mail-merge-master-file--tp22220571p22241109.html
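For the validation Christian describes (accept a file only if it uses a known set of merge data field references), the strings returned by WordExtractor could be scanned for MERGEFIELD instructions. This sketch assumes the text still contains the raw field markers of the binary .doc format: 0x13 opens a field, 0x14 separates the field instruction (e.g. MERGEFIELD LastName) from its displayed result, and 0x15 closes it. Whether getParagraphText() hands these markers back, or strips them the way WordExtractor's static stripFields() helper does, depends on the POI version, so verify that against a real master file first. MergeFieldCheck and its methods are hypothetical names, the "at least one merge field" rule is only the heuristic from Christian's reply, and the code targets a modern JDK rather than 1.4.2.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds MERGEFIELD references in raw .doc text and validates them. */
final class MergeFieldCheck {

    // Field delimiters used in the binary .doc text stream.
    private static final char FIELD_BEGIN = 0x13;
    private static final char FIELD_SEPARATOR = 0x14;
    private static final char FIELD_END = 0x15;

    // MERGEFIELD followed by a quoted name ("Last Name") or a bare name (FirstName);
    // a bare name stops at whitespace or at a switch such as \* MERGEFORMAT.
    private static final Pattern MERGEFIELD = Pattern.compile(
            "^\\s*MERGEFIELD\\s+(?:\"([^\"]*)\"|([^\\s\\\\]+))", Pattern.CASE_INSENSITIVE);

    /** A field that has been opened but not yet closed. */
    private static final class OpenField {
        final StringBuilder instruction = new StringBuilder();
        boolean pastSeparator; // true once the separator is passed and the result text follows
    }

    private MergeFieldCheck() {}

    /** Returns the data field names referenced by MERGEFIELD instructions, in order of first use. */
    public static Set<String> mergeFieldNames(Iterable<String> paragraphs) {
        Set<String> names = new LinkedHashSet<>();
        Deque<OpenField> open = new ArrayDeque<>(); // kept across paragraphs; a field may span them
        for (String text : paragraphs) {
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                OpenField top = open.peek();
                if (c == FIELD_BEGIN) {
                    open.push(new OpenField()); // fields can nest, e.g. IF wrapping MERGEFIELD
                } else if (c == FIELD_SEPARATOR) {
                    if (top != null && !top.pastSeparator) {
                        record(top.instruction, names);
                        top.pastSeparator = true;
                    }
                } else if (c == FIELD_END) {
                    if (top != null) {
                        open.pop();
                        if (!top.pastSeparator) {
                            record(top.instruction, names); // field without a result part
                        }
                    }
                } else if (top != null && !top.pastSeparator) {
                    top.instruction.append(c);
                }
            }
        }
        return names;
    }

    private static void record(CharSequence instruction, Set<String> names) {
        Matcher m = MERGEFIELD.matcher(instruction);
        if (m.find()) {
            names.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
    }

    /** Referenced names missing from the allowed set, sorted; empty means every reference is known. */
    public static Set<String> unknownFields(Set<String> used, Set<String> allowed) {
        Set<String> unknown = new TreeSet<>(used);
        unknown.removeAll(allowed);
        return unknown;
    }

    /** Heuristic: at least one merge field, and every referenced field is in the allowed set. */
    public static boolean isValidMaster(Iterable<String> paragraphs, Set<String> allowed) {
        Set<String> used = mergeFieldNames(paragraphs);
        return !used.isEmpty() && unknownFields(used, allowed).isEmpty();
    }
}
```

To wire it up, the paragraphs could come from Arrays.asList(new WordExtractor(in).getParagraphText()). Because a MERGEFIELD may sit inside another field (typically an IF), the scanner keeps a stack of open fields, so nested references are still found.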
