Hi Mark,
Thanks for sending this program.
I tried it with POI 3.2 final, which is the version I am currently using.
CharacterRun does not behave consistently: sometimes the paragraph text is
split across several character runs, and sometimes it is not split and I see
the whole paragraph text in one character run. So the search text is not
getting replaced.
Is there any way to solve this issue?
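
To illustrate the symptom described above: if a paragraph's text is split so that the search term straddles two character runs, a check made against each run's text on its own will never match, while a check against the whole paragraph text will. The following is only a minimal sketch in plain Java with no POI calls; the class name RunBoundarySearch and the method locate() are hypothetical and the run texts are made-up sample data. In HWPF terms, the paragraph-level offset it returns corresponds to the offset that the original code in this thread passes to replaceText() on the Paragraph; whether that behaves well under POI 3.2 is an assumption that has not been tested here.

```java
import java.util.Arrays;
import java.util.List;

class RunBoundarySearch {

    /**
     * Looks for the first occurrence of term in the concatenation of the
     * run texts. Returns {offsetInParagraph, firstRunIndex, lastRunIndex},
     * or null if the term is empty or cannot be found.
     */
    static int[] locate(List<String> runTexts, String term) {
        if (term.isEmpty()) {
            return null;
        }
        // Rebuild the paragraph text exactly as the runs would present it.
        StringBuilder whole = new StringBuilder();
        for (String runText : runTexts) {
            whole.append(runText);
        }
        int start = whole.indexOf(term);
        if (start < 0) {
            return null;
        }
        int end = start + term.length() - 1; // index of the last matched char

        // Map the first and last matched characters back onto run indexes.
        int firstRun = -1;
        int lastRun = -1;
        int pos = 0;
        for (int i = 0; i < runTexts.size(); i++) {
            int next = pos + runTexts.get(i).length();
            if (firstRun < 0 && start < next) {
                firstRun = i;
            }
            if (end < next) {
                lastRun = i;
                break;
            }
            pos = next;
        }
        return new int[] {start, firstRun, lastRun};
    }

    public static void main(String[] args) {
        // The same paragraph text, once as a single run and once split in two.
        List<String> oneRun = Arrays.asList("Please replace ${name} here.");
        List<String> twoRuns = Arrays.asList("Please replace ${na", "me} here.");

        System.out.println(Arrays.toString(locate(oneRun, "${name}")));
        System.out.println(Arrays.toString(locate(twoRuns, "${name}")));

        // A per-run contains() check misses the term in the split case.
        boolean foundPerRun = false;
        for (String runText : twoRuns) {
            foundPerRun |= runText.contains("${name}");
        }
        System.out.println("found by per-run check: " + foundPerRun);
    }
}
```

When firstRunIndex and lastRunIndex differ, the term crosses a run boundary, which would explain why the per-run search only succeeds some of the time.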
On Sat, Aug 8, 2009 at 6:34 AM, MSB <[email protected]> wrote:
>
> Here is the HWPF based code that I put together to play around with. It was
> written a very long time ago so I am not sure what testing I undertook and
> exactly what the results were, but I have run it this morning just to ensure
> that it does not crash the PC and all seems to be well. This section has been
> cut from a much larger class that is full of other test code that I play
> around with periodically. Everything you need is there I believe but, on the
> off-chance that it calls another method whose source I have neglected to
> include, just drop an email to the list please.
>
> Currently, I am running POI version 3.5 beta 7 on a PC operating under
> Windows XP SP2. Office 2007 is installed now and it seems able to open the
> files this code produces quite happily. In the back of my mind, I seem to
> remember that the files produced by some search and replace code I put
> together could be opened but not modified; I tested that problem this
> morning and the files this code produces seem fine: I can open them, make
> changes and then save the results again. But do please be prepared for
> problems like that.
>
> Again, can I emphasise this is test code; it is scruffy and there are going
> to be variables I put in there so that I could monitor the progress of the
> code by dumping messages to the screen. As you go through, if something
> seems to be superfluous, then this is likely the reason and you can comment
> it out or delete it.
>
> Good luck and I do hope it all works. If you have any problems, just drop a
> message onto the list.
>
> Yours
>
> Mark B
>
>
> import org.apache.poi.poifs.filesystem.POIFSFileSystem;
> import org.apache.poi.hwpf.HWPFDocument;
> import org.apache.poi.hwpf.usermodel.Range;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> import org.apache.poi.hwpf.usermodel.CharacterRun;
>
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.FileInputStream;
> import java.io.BufferedOutputStream;
> import java.io.BufferedInputStream;
> import java.util.HashMap;
> import java.util.Iterator;
> import java.util.Set;
>
> /**
> * This code is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR
> * CONDITIONS OF ANY KIND, either express or implied. It is not intended to
> * be used in a 'production' environment without undergoing rigorous
> * testing.
> *
> * With that out of the way, an instance of this class can be used to
> * search for and replace Strings of text within a Word document. To see
> * how the code may be used, look into the main() method for examples.
> *
> * Note the replacements made by the code contained within this class
> * ignore any formatting that may have been applied to the text that is
> * replaced. That is to say that if the text was originally formatted to
> * use the Arial font, was sized to 24 points, emboldened, underlined and
> * red in colour, then all of this will be lost if it is replaced. Further,
> * if any text is replaced in a Paragraph, all the formatting applied to
> * that Paragraph's contents is likely to be lost.
> *
> * @author Mark Beardsley [msb at apache.org]
> * @version 1.00 8th August 2009 (cannot remember when originally put
> * together)
> */
> public class SearchReplace {
>
>
> /**
> * Search for and replace a single occurrence of a string of text
> * within a Word document.
> *
> * Note that no checks are made on the parameters' values; that is to
> * say that the file named in the inputFilename parameter will not be
> * checked to ensure the file exists and neither the searchTerm nor the
> * replacementText parameter will be checked to ensure it is not null.
> * Also, note that I have never tested passing the same String to the
> * inputFilename and outputFilename parameters but cannot see why that
> * should not be possible.
> *
> * @param inputFilename An instance of the String class that
> * encapsulates the name of and path to a Word
> * document which is in the binary (OLE2CDF)
> * format. The contents of this document will be
> * searched for occurrences of the search term.
> * @param outputFilename An instance of the String class that
> * encapsulates the name of and path to a Word
> * document which is in the binary (OLE2CDF)
> * format. This document will contain the results
> * of the search and replace operation.
> * @param searchTerm An instance of the String class that
> * encapsulates a series of characters, a word or
> * words. The document will be searched for
> * occurrences of this String.
> * @param replacementText An instance of the String class that
> * contains a series of characters, a word or
> * words. The String encapsulated by the
> * searchTerm parameter will be replaced by the
> * 'contents' of this parameter.
> *
> */
> public void searchAndReplace(String inputFilename,
> String outputFilename,
> String searchTerm,
> String replacementText) {
>
> File inputFile = null;
> File outputFile = null;
> FileInputStream fileIStream = null;
> FileOutputStream fileOStream = null;
> BufferedInputStream bufIStream = null;
> BufferedOutputStream bufOStream = null;
> POIFSFileSystem fileSystem = null;
> HWPFDocument document = null;
> Range docRange = null;
> Paragraph paragraph = null;
> CharacterRun charRun = null;
> int numParagraphs = 0;
> int numCharRuns = 0;
> String text = null;
>
> try {
> // Create an instance of the POIFSFileSystem class and
> // attach it to the Word document using an InputStream.
> inputFile = new File(inputFilename);
> fileIStream = new FileInputStream(inputFile);
> bufIStream = new BufferedInputStream(fileIStream);
> fileSystem = new POIFSFileSystem(bufIStream);
> document = new HWPFDocument(fileSystem);
>
> // Get the overall Range object for the document. Note the
> // use of the getRange() method and not the getOverallRange()
> // method, this is just historic - when the code was originally
> // written, I do not believe the latter method was part of the
> // API.
> docRange = document.getRange();
>
> // Get the number of Paragraph(s) in the overall range and
> // iterate through them
> numParagraphs = docRange.numParagraphs();
> for(int i = 0; i < numParagraphs; i++) {
>
> // Get a Paragraph and recover the text from it. This step is
> // far from necessary and I think I only got the text so that I
> // could print it to screen as a diagnostic check to ensure that
> // the Paragraph contained the text I was searching for.
> // Experiment with this.
> paragraph = docRange.getParagraph(i);
> text = paragraph.text();
>
> // Get the number of CharacterRuns in the Paragraph
> numCharRuns = paragraph.numCharacterRuns();
> for(int j = 0; j < numCharRuns; j++) {
>
> // Get a character run and recover its text - note that the
> // same text variable is used as for the Paragraph above.
> // So, it MUST be safe to remove the text = paragraph.text()
> // line above.
> charRun = paragraph.getCharacterRun(j);
> text = charRun.text();
>
> // Check to see if the text of the CharacterRun contains the
> // search term. If it does, find out where that term starts
> // and call the replaceText() method passing the index.
> // Maybe this is the key difference between what we are
> // doing.
> if(text.contains(searchTerm)) {
> int start = text.indexOf(searchTerm);
> charRun.replaceText(searchTerm, replacementText,
> start);
> }
> }
> }
>
> // Close the InputStream
> bufIStream.close();
> bufIStream = null;
>
> // Open an OutputStream and write the document away.
> outputFile = new File(outputFilename);
> fileOStream = new FileOutputStream(outputFile);
> bufOStream = new BufferedOutputStream(fileOStream);
>
> document.write(bufOStream);
>
> }
> catch(Exception ex) {
> System.out.println("Caught an: " + ex.getClass().getName());
> System.out.println("Message: " + ex.getMessage());
> System.out.println("Stacktrace follows.............");
> ex.printStackTrace(System.out);
> }
> finally {
> if(bufOStream != null) {
> try {
> //bufOStream.flush();
> bufOStream.close();
> bufOStream = null;
> }
> catch(Exception ex) {
>
> }
> }
> if(bufIStream != null) {
> try {
> bufIStream.close();
> bufIStream = null;
> }
> catch(Exception ex) {
> // I G N O R E //
> }
> }
> }
>
> }
>
> /**
> * Search for and replace a series of strings of text within a Word
> * document.
> *
> * Note that no checks are made on the parameters' values; that is to
> * say that the file named in the inputFilename parameter will not be
> * checked to ensure the file exists and the replacements parameter
> * will not be checked to ensure it is not null.
> * Also, note that I have never tested passing the same String to the
> * inputFilename and outputFilename parameters but cannot see why that
> * should not be possible.
> *
> * @param inputFilename An instance of the String class that
> * encapsulates the name of and path to a Word
> * document which is in the binary (OLE2CDF)
> * format. The contents of this document will be
> * searched for occurrences of the search terms.
> * @param outputFilename An instance of the String class that
> * encapsulates the name of and path to a Word
> * document which is in the binary (OLE2CDF)
> * format. This document will contain the results
> * of the search and replace operation.
> * @param replacements An instance of the java.util.HashMap class that
> * contains a series of key, value pairs. Each key
> * is an instance of the String class that
> * encapsulates a series of characters, a word or
> * words that the code will search for and the
> * accompanying value is also an instance of the
> * String class that likewise encapsulates a
> * series of characters, a word or words. The
> * 'contents' of the value's String will be used
> * to replace the contents of the key's String if
> * an occurrence of the latter is found.
> */
> public void searchAndReplace(String inputFilename,
> String outputFilename,
> HashMap<String, String> replacements) {
>
> File inputFile = null;
> File outputFile = null;
> FileInputStream fileIStream = null;
> FileOutputStream fileOStream = null;
> BufferedInputStream bufIStream = null;
> BufferedOutputStream bufOStream = null;
> POIFSFileSystem fileSystem = null;
> HWPFDocument document = null;
> Range docRange = null;
> Paragraph paragraph = null;
> CharacterRun charRun = null;
> Set<String> keySet = null;
> Iterator<String> keySetIterator = null;
> int numParagraphs = 0;
> int numCharRuns = 0;
> String text = null;
> String key = null;
> String value = null;
>
> try {
> // Create an instance of the POIFSFileSystem class and
> // attach it to the Word document using an InputStream.
> inputFile = new File(inputFilename);
> fileIStream = new FileInputStream(inputFile);
> bufIStream = new BufferedInputStream(fileIStream);
> fileSystem = new POIFSFileSystem(bufIStream);
> document = new HWPFDocument(fileSystem);
>
> // Get a reference to the overall Range for the document
> // and discover how many Paragraph objects there are
> // in the document.
> docRange = document.getRange();
> numParagraphs = docRange.numParagraphs();
>
> // Recover a Set of the keys in the HashMap
> keySet = replacements.keySet();
>
> // Step through each Paragraph
> for(int i = 0; i < numParagraphs; i++) {
> paragraph = docRange.getParagraph(i);
> // This line can almost certainly be removed - see
> // the comments in the method above.
> text = paragraph.text();
>
> // Get the number of CharacterRuns in the Paragraph
> // and step through each one.
> numCharRuns = paragraph.numCharacterRuns();
> for(int j = 0; j < numCharRuns; j++) {
> charRun = paragraph.getCharacterRun(j);
>
> // Get the text from the CharacterRun and recover an
> // Iterator to step through the Set of keys.
> text = charRun.text();
> keySetIterator = keySet.iterator();
> while(keySetIterator.hasNext()) {
>
> // Get the key - which is also the search term - and
> // check to see if it can be found within the
> // CharacterRuns text.
> key = keySetIterator.next();
> if(text.contains(key)) {
>
> // If the search term was found in the text, get the
> // matching value from the HashMap, find out whereabouts
> // in the CharacterRun's text the search term is
> // and call the replaceText() method to substitute
> // the replacement term for the search term.
> value = replacements.get(key);
> int start = text.indexOf(key);
> charRun.replaceText(key, value, start);
>
> // Note that this code was added to test whether
> // it was possible to replace multiple occurrences
> // of the search term. I cannot remember if I tested
> // it but believe that it did work; either way,
> // it could be tested now and if it succeeds, then the
> // searchAndReplace() method above could be modified
> // to include this.
> docRange = document.getRange();
> paragraph = docRange.getParagraph(i);
> charRun = paragraph.getCharacterRun(j);
> text = charRun.text();
> }
> }
> }
> }
>
> // Close the InputStream
> bufIStream.close();
> bufIStream = null;
>
> // Open an OutputStream and save the modified document away.
> outputFile = new File(outputFilename);
> fileOStream = new FileOutputStream(outputFile);
> bufOStream = new BufferedOutputStream(fileOStream);
> document.write(bufOStream);
> }
> catch(Exception ex) {
> System.out.println("Caught an: " + ex.getClass().getName());
> System.out.println("Message: " + ex.getMessage());
> System.out.println("Stacktrace follows.............");
> ex.printStackTrace(System.out);
> }
> finally {
> if(bufIStream != null) {
> try {
> bufIStream.close();
> bufIStream = null;
> }
> catch(Exception ex) {
> // I G N O R E //
> }
> }
> if(bufOStream != null) {
> try {
> bufOStream.flush();
> bufOStream.close();
> bufOStream = null;
> }
> catch(Exception ex) {
>
> }
> }
> }
>
> }
>
> /**
> * The main entry point to the program demonstrating how the code may
> * be utilised.
> *
> * @param args An array of type String containing arguments passed to
> * the program on execution.
> */
> public static void main(String[] args) {
> SearchReplace replacer = new SearchReplace();
>
> // To search for and replace single items. Note, the code has not,
> // at least as far as I can remember, been tested by passing the
> // same file to both the inputFilename and outputFilename
> // parameters. It ought to work but has NOT been tested I believe.
> replacer.searchAndReplace("Document.doc", // Source Document
> "Replaced Document.doc", // Result Document
> "search term", // Search term
> "replacement term"); // Replacement term
>
> // To search for and replace a series of items
> HashMap<String, String> searchTerms = new HashMap<String,
> String>();
> searchTerms.put("search term 1", "replacement term 1");
> searchTerms.put("search term 2", "replacement term 2");
> searchTerms.put("search term 3", "replacement term 3");
> searchTerms.put("search term 4", "replacement term 4");
>
> replacer.searchAndReplace("Document.doc", // Source Document
> "Replaced Document.doc", // Result Document
> searchTerms); // Search/replacement items
> }
> }
>
>
>
> karthik-33 wrote:
> >
> > Thanks for the reply, Mark.
> > I don't think I need to preserve text formatting, but I would like to try
> > your code and see how it works.
> > I think that would help me too.
> >
> > I can't go the OpenOffice route since my business requirement is to use
> > Microsoft Word documents.
> > I will be using this search and replace function on the same PC as the
> > application.
> >
> > If you can send me that code, I will try it and let you know how it works.
> >
> > Thanks
> > Karthik
> >
> >
> > On Fri, Aug 7, 2009 at 11:49 AM, MSB <[email protected]> wrote:
> >
> >>
> >> Can I ask two questions please?
> >>
> >> Do you need to preserve the formatting applied to the text? If not, then
> >> I think that somewhere I have a piece of HWPF code that does a search and
> >> replace. I am not at all certain about the state of the code and cannot
> >> remember if I hit the same problem as you - and I may well have - but I
> >> am willing to look it out if you think it might help.
> >>
> >> Secondly, do you have to use HWPF/XWPF? The API is still immature and it
> >> is really only suitable for relatively simple tasks. Better alternatives
> >> might be OpenOffice, which you can 'control' through its UNO API, or Word
> >> itself, which can be manipulated using OLE. You can ONLY use OLE if you
> >> are working on a Windows based PC and you have Word installed on that PC.
> >> OpenOffice is more flexible but it still cannot be used - at least as far
> >> as I am aware - as a document server, so it is best to have that
> >> application installed on the PC you will be using for the search/replace
> >> operation.
> >>
> >> Yours
> >>
> >> Mark B
> >>
> >>
> >> karthik-33 wrote:
> >> >
> >> > I have Microsoft Office 2007 and, when saving the document, I save it
> >> > as a Microsoft Word 2003 document.
> >> > I am trying to replace the text using the replaceText method in
> >> > Paragraph. It works fine when the replacement text and search text are
> >> > of equal length. It corrupts the document when the replacement text is
> >> > either longer or shorter than the search text.
> >> > If anyone has run into this issue and resolved it, or has any idea,
> >> > please let me know; it would be useful for me.
> >> > I am not sure what is causing the document to become corrupted.
> >> >
> >> > Code is:
> >> >
> >> > String replaceTxt = "Replacement";
> >> > String searchText = "Orginial";
> >> > POIFSFileSystem ps = new POIFSFileSystem (new
> >> > FileInputStream("C:/Document.doc"));
> >> > HWPFDocument doc = new HWPFDocument(ps);
> >> > Range range = doc.getRange();
> >> > for(int x=0;x<range.numSections();x++)
> >> > {
> >> > Section s = range.getSection(x);
> >> > for(int y=0;y<s.numParagraphs();y++)
> >> > {
> >> > Paragraph p = s.getParagraph(y);
> >> > String paraText = p.text();
> >> > int offset = paraText.indexOf(searchText );
> >> > if(offset != -1)
> >> > {
> >> > p.replaceText(searchText,replaceTxt,offset);
> >> >
> >> > }
> >> > }
> >> >
> >> > }
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/Replace-Text-Problem-%28Document-Corrupt%29---POI-HWPFDocument-tp24864855p24867251.html
> >> Sent from the POI - User mailing list archive at Nabble.com.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >>
> >
> >
> > --
> > karthik
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Replace-Text-Problem-%28Document-Corrupt%29---POI-HWPFDocument-tp24864855p24876942.html
> Sent from the POI - User mailing list archive at Nabble.com.
>
>
>
--
karthik