Hi Mark,
I couldn't get it to work, so I think I'm giving up; I am going to try it
with the OpenOffice API instead, using UNO. Thanks for your effort and answers.
Yours, Fabi.
Fabián Avilés Martínez
Área de desarrollo de software /
Software Development Area
GMV SOLUCIONES
GLOBALES INTERNET, S.A.
Avda. Américo Vespucio
Edificio Cartuja, Bloque E, 1ª Pta.
E-41092 Sevilla
Tel. +34 95 408 80 60
Fax +34 95 408 12 33
www.gmv.com
www.gmv-sgi.com
Before printing this message, make sure it is necessary. Protecting the
environment is also in your hands.
-----Original Message-----
From: MSB [mailto:[email protected]]
Sent: Tuesday, 24 November 2009 16:51
To: [email protected]
Subject: RE: Modify word document
I have had the chance to play around with some code and I have to admit that
I was wrong on two counts.
Firstly, if you drill down to the level of the CharacterRun and perform a
replacement operation there, you will not retain the formatting applied to
the text; furthermore, it seems to fail completely, and no replacements are
made in the document at all. To have the search term replaced successfully,
you DO need to operate at the Paragraph level.
Secondly, if the search term is shorter than the replacement term, then HWPF
will throw an exception. It seems quite happy to work if the replacement
term is equal to or shorter than the search term, in terms of the number of
characters.
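A small guard can flag risky pairs before any replacement is attempted. The sketch below is plain Java and does not call POI; the class name TermLengthCheck and the method riskyKeys are hypothetical. It assumes, as Mark reports above, that a replacement longer than its search term is what triggers the HWPF exception; other POI versions may behave differently, so treat this as a precaution rather than a rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper, not part of POI. Assumption: HWPF may throw when a
 * replacement is longer than the text it replaces, so such pairs are flagged.
 */
class TermLengthCheck {

    /** Returns the search keys whose replacement text is longer than the key itself. */
    static List<String> riskyKeys(Map<String, String> searchTerms) {
        List<String> risky = new ArrayList<String>();
        for (Map.Entry<String, String> e : searchTerms.entrySet()) {
            if (e.getValue().length() > e.getKey().length()) {
                risky.add(e.getKey());
            }
        }
        return risky;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the output is predictable.
        Map<String, String> terms = new LinkedHashMap<String, String>();
        terms.put("replace", "tester");                // 7 -> 6 characters: fine
        terms.put("${name}", "Fabian Aviles Martinez"); // 7 -> 22 characters: risky
        System.out.println(riskyKeys(terms));          // prints [${name}]
    }
}
```

Pairs that come back from riskyKeys could be handled outside HWPF, or the template tokens could be made at least as long as their replacements.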
Please see the code I have attached below:
/* ====================================================================
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==================================================================== */
package newsearchreplace;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Set;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.CharacterRun;
/**
*
* @author win Mark Beardsley [msb at apache.org]
* @version 1.00
*/
public class SearchReplace {

    private HashMap<String, String> searchTerms = null;
    private Set<String> searchKeys = null;
    private HWPFDocument wordDocument = null;

    public SearchReplace() {
        searchTerms = new HashMap<String, String>();
        // The first String is the text that will be searched for, the second
        // is what will be used to replace it. Of course, it is possible to
        // create more than one search term / replacement text pairing.
        searchTerms.put("replace", "tester");
        searchKeys = searchTerms.keySet();
    }
    public void openTemplate(String filename) throws FileNotFoundException,
            IOException {
        File file = null;
        FileInputStream fis = null;
        try {
            file = new File(filename);
            fis = new FileInputStream(file);
            this.wordDocument = new HWPFDocument(fis);
        }
        finally {
            if(fis != null) {
                try {
                    fis.close();
                    fis = null;
                }
                catch(Exception ex) {
                    // I G N O R E
                }
            }
        }
    }
    public void searchAndReplace() {
        Range docRange = this.wordDocument.getRange();
        int numParas = docRange.numParagraphs();
        for(int i = 0; i < numParas; i++) {
            Paragraph para = docRange.getParagraph(i);
            int numCharRuns = para.numCharacterRuns();
            for(int j = 0; j < numCharRuns; j++) {
                CharacterRun charRun = para.getCharacterRun(j);
                String text = charRun.text();
                for(String key : this.searchKeys) {
                    if(text.contains(key)) {
                        String replacementTerm = this.searchTerms.get(key);
                        // replaceText(placeholder, value): the search term
                        // comes first, the replacement text second.
                        charRun.replaceText(key, replacementTerm);
                        System.out.println("Found: " + key + " in " + text +
                            ". Will replace with: " + replacementTerm);
                    }
                }
            }
        }
    }
    public void searchReplace() {
        Range docRange = this.wordDocument.getRange();
        int numParas = docRange.numParagraphs();
        for(int i = 0; i < numParas; i++) {
            Paragraph para = docRange.getParagraph(i);
            String text = para.text();
            for(String key : this.searchKeys) {
                if(text.contains(key)) {
                    String replacementTerm = this.searchTerms.get(key);
                    para.replaceText(key, replacementTerm);
                }
            }
        }
    }
    public void saveResults(String filename) throws FileNotFoundException,
            IOException {
        File file = null;
        FileOutputStream fos = null;
        try {
            file = new File(filename);
            fos = new FileOutputStream(file);
            this.wordDocument.write(fos);
        }
        finally {
            if(fos != null) {
                try {
                    fos.close();
                    fos = null;
                }
                catch(Exception ex) {
                    // I G N O R E
                }
            }
        }
    }
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        try {
            SearchReplace sr = new SearchReplace();
            sr.openTemplate("C:/temp/Test Document.doc");
            sr.searchAndReplace();
            //sr.searchReplace();
            sr.saveResults("C:/temp/New Updated Document.doc");
        }
        catch(Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows............");
            ex.printStackTrace(System.out);
        }
    }
}
In particular, look at the main method. If you comment out the
sr.searchAndReplace() line and un-comment the sr.searchReplace() line, then
the code will work successfully. But, and this is a BIG but, it will only
work if you compile and run it against 3.2 FINAL of the API. I have found
that later versions seem to 'drop' or lose the formatting information
completely; to convince yourself of this, just modify the main method so
that it contains only these lines of code:
SearchReplace sr = new SearchReplace();
sr.openTemplate("C:/temp/Test Document.doc");
sr.saveResults("C:/temp/New Updated Document.doc");
If you run that against versions later than 3.2 FINAL, you should see that
the copy of the original document that this produces loses all of its
formatting.
Yours
Mark B
PS. It should go without saying that you will need to change the paths and
document names passed to the openTemplate() and saveResults() methods to
point to locations and files that exist on your machine.
PPS. Please forgive the lack of comments. I hope that it is apparent what
the methods do.
Fabián Avilés Martínez wrote:
>
> Hi, as I told you, I have tried it, but with the same result: MS Word
> says the resulting file is corrupted. My next approach is to create a copy
> of the file and make the modifications within that copy. My problem is
> that I do not know how to save the modifications made to the
> CharacterRuns of the paragraphs; what I mean is, how do I persist the
> modifications in the resulting file without having to copy it by calling
> document.write(outputStream)?
>
> My code is:
>
> public File processFile(final InputStream is,
>         final Map<String, String> replacementText) throws IOException {
>     Set<String> keys = replacementText.keySet();
>     try {
>         // Makes a copy of the file.
>         File res = copyfile(is);
>         InputStream auxIs = new FileInputStream(res);
>         POIFSFileSystem poifs = new POIFSFileSystem(auxIs);
>         HWPFDocument document = new HWPFDocument(poifs);
>         Range range = document.getRange();
>
>         for (int i = 0; i < range.numParagraphs(); i++) {
>             Paragraph paragraph = range.getParagraph(i);
>             int numCharRuns = paragraph.numCharacterRuns();
>             for (int j = 0; j < numCharRuns; j++) {
>                 CharacterRun charRun = paragraph.getCharacterRun(j);
>                 for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>                     String key = it.next();
>                     if (charRun.text().contains(key)) {
>                         String value = replacementText.get(key);
>                         charRun.replaceText(key, value);
>                         range = document.getRange();
>                         paragraph = range.getParagraph(i);
>                         charRun = paragraph.getCharacterRun(j);
>                     }
>                 }
>             }
>         }
>         is.close();
>         return res;
>     } catch (IOException e) {
>         logger.error("Error processing the WORD file: " + e);
>         throw new IOException("Error processing the WORD file");
>     } finally {
>         if (is != null) {
>             is.close();
>         }
>     }
> }
>
>
> Thanks in advance, Fabi.
>
> -----Original Message-----
> From: MSB [mailto:[email protected]]
> Sent: Tuesday, 24 November 2009 8:43
> To: [email protected]
> Subject: Re: Modify word document
>
>
> You have not dug down far enough into the structure of the document yet, I
> am afraid. All of the formatting information is stored (encapsulated)
> within the CharacterRun class, and you need to perform the replacements at
> that level.
>
> I do not have any suitable code at hand as I type this, so what follows
> will need to be converted into Java and tested:
>
> Open the Word document.
> Get the overall Range for the document.
> Get the number of Paragraph objects the Range contains.
> Iterate through the Paragraphs and, for each Paragraph:
>     Get the CharacterRun(s) the Paragraph contains.
>     Call the method to replace the search term with the replacement text
>     on the CharacterRun.
> Save the modified document away again.
>
> You do, however, face a couple of problems with this. It has been a long
> time since I tried to write a search and replace routine using HWPF, and I
> could not get it to work if the replacement text was longer than the
> search term. In that case, HWPF threw an exception and would not allow me
> to complete the process; but that problem could well have been addressed
> by now, as it was well known and caused by faulty bounds checking within
> the Range class. Only testing will prove or disprove this for you, I am
> afraid.
>
> Secondly, the CharacterRun class encapsulates a piece of text with common
> properties. So, imagine that we are searching for the phrase 'search term'
> and that the word 'search' has been emboldened whilst the word 'term' has
> been left as normal text; then my suggested approach will not work,
> because the words 'search' and 'term' will be held in different
> CharacterRun(s). If you do hit this problem, then I am afraid you will
> have to write code that searches for the term at the Paragraph level,
> identifies where the search term can be found, and recovers the
> CharacterRun(s) that encapsulate it. Once you have these, you can modify
> the runs or create and substitute new ones, but I have to admit that I
> have never tried to do this myself. Instead, I chose to automate Word
> using OLE and to explore the possibilities offered by OpenOffice's UNO
> interface. Both options did work but threw up other problems that proved
> more limiting (in terms of architecture and platform). If you can get it
> to work, HWPF offers the better solution IMO.
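The Paragraph-level search described above can be sketched without POI by modelling a paragraph as the list of its runs' texts (in HWPF those strings would come from Paragraph.getCharacterRun(j).text()). The hypothetical RunLocator below joins the run texts, finds the first match of the search term in the joined text, and reports the indices of every run the match overlaps. Actually modifying or substituting those runs is deliberately left out, since the thread does not establish how HWPF handles that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper, not part of POI. A paragraph is modelled as the texts
 * of its CharacterRuns, in order; the search happens on the joined text.
 */
class RunLocator {

    /** Returns indices of the runs overlapping the first match of term, or an empty list. */
    static List<Integer> runsCovering(List<String> runTexts, String term) {
        // Record where each run starts in the joined paragraph text.
        StringBuilder joined = new StringBuilder();
        int[] starts = new int[runTexts.size()];
        for (int j = 0; j < runTexts.size(); j++) {
            starts[j] = joined.length();
            joined.append(runTexts.get(j));
        }
        List<Integer> result = new ArrayList<Integer>();
        if (term.isEmpty()) {
            return result;
        }
        int matchStart = joined.indexOf(term);
        if (matchStart < 0) {
            return result;
        }
        int matchEnd = matchStart + term.length(); // exclusive
        for (int j = 0; j < runTexts.size(); j++) {
            int runEnd = starts[j] + runTexts.get(j).length(); // exclusive
            // A non-empty run overlaps the match if their half-open ranges intersect.
            if (runEnd > starts[j] && starts[j] < matchEnd && runEnd > matchStart) {
                result.add(j);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 'search' is bold and ' term here.' is plain, so they sit in different runs.
        List<String> runs = Arrays.asList("Find the ", "search", " term here.");
        System.out.println(runsCovering(runs, "search term")); // prints [1, 2]
    }
}
```

Once the covering runs are known, each could be inspected individually; whether to edit them in place or replace them is the open question Mark raises.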
>
> Yours
>
> Mark B
>
>
> Fabián Avilés Martínez wrote:
>>
>> Hi all,
>> I have a Word document that serves as a template. In this template there
>> are some tokenized words which have to be modified, and the result has to
>> be saved into another file. The original file has some properties, like a
>> header and footer, images, etc. The resulting file has to be the same, but
>> with the modified words. I am trying it with the code below, but it does
>> not work.
>>
>> public ByteArrayOutputStream processFile(final InputStream is,
>>         final Map<String, String> replacementText) throws IOException {
>>     Set<String> keys = replacementText.keySet();
>>     try {
>>         POIFSFileSystem poifs = new POIFSFileSystem(is);
>>         HWPFDocument document = new HWPFDocument(poifs);
>>         Range range = document.getRange();
>>
>>         for (int i = 0; i < range.numParagraphs(); i++) {
>>             String newTxt = range.getParagraph(i).text();
>>             String oldTxt = range.getParagraph(i).text();
>>             for (Iterator<String> it = keys.iterator(); it.hasNext();) {
>>                 String key = it.next();
>>                 if (newTxt.contains(key)) {
>>                     newTxt = replacePlaceholders(key,
>>                             replacementText.get(key), newTxt);
>>                 }
>>             }
>>             if (!oldTxt.equals(newTxt)) {
>>                 range.getParagraph(i).replaceText(oldTxt, newTxt);
>>             }
>>         }
>>
>>         // Save the document away.
>>         ByteArrayOutputStream bos = new ByteArrayOutputStream();
>>         document.write(bos);
>>         bos.flush();
>>         bos.close();
>>         return bos;
>>     } catch (IOException e) {
>>         logger.error("Error processing the WORD file: " + e);
>>         throw new IOException("Error processing the WORD file");
>>     } finally {
>>         if (is != null) {
>>             is.close();
>>         }
>>     }
>> }
>>
>> Any help, please?
>>
>> Thanks in advance, Fabi.
>>
>>
>>
>> ______________________
>> This message including any attachments may contain confidential
>> information, according to our Information Security Management System,
>> and intended solely for a specific individual to whom they are
>> addressed.
>> Any unauthorised copy, disclosure or distribution of this message
>> is strictly forbidden. If you have received this transmission in error,
>> please notify the sender immediately and delete it.
>>
>> ______________________
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/Modify-word-document-tp26480450p26491636.html
> Sent from the POI - User mailing list archive at Nabble.com.
>
>
>
>
>
>
>
--
View this message in context:
http://old.nabble.com/Modify-word-document-tp26480450p26498333.html
Sent from the POI - User mailing list archive at Nabble.com.