I tried to post again yesterday but Nabble refused to connect and accept the
message for some reason.

What I wanted to say was, do not bother with that piece of code. It exhibits
the old problem where an exception will be thrown if the replacement term is
longer than the serach term; trying to replace 'test' for example with
'tested' will break the code.

I am going to carry on looking because I am certian that I had some code
that used a more 'primitive' approach and it did seem to work. If I remember
correctly, I grabbed the text of the Pargraph into a String and then used
java code to perform any serach and replace operations; then, if any
replecamenets hade been made, I simply over-wrote the complete contents of
the Pragraph. There are still a couple of DVD's of code that I need to lok
through and if I find it, I wil post. You could always try to put something
together yourself to test the theory if you have the time. Do no know how
familiar you are with String handling is java but the indexOf() and
substring() methods formed the basis of the technique.

Yours

Mark B

PS. The only way to replace terms if they spill across CharacterRun
boundaries would be to take a step up the class hierarchy and operate on the
Paragraph object.


karthik-33 wrote:
> 
> Hi Mark, Thanks for sending this program.
> I tried this program with POI 3.2 final version, which iam currently
> using.
> CharaterRun doesnt behave consistently the same way, sometimes it splits
> the
> paragraph text into more number of Character run, sometimes it doesnt
> split
> and i see the whole paragraph text in one character run. So the search
> text
> is not getting replaced.
> Is there anyway to solve this issue?
> On Sat, Aug 8, 2009 at 6:34 AM, MSB <[email protected]> wrote:
> 
>>
>> Here is the HWPF based code that I put together to play around with. It
>> was
>> written a very long time ago so I am not sure what testing I undertook
>> and
>> exactly what the results were but I have run it this morning just to
>> ensure
>> that
>> it does not crash the PC and all seems to be well. This section has been
>> cut
>> from a much larger class that is full of other test code that I play
>> around
>> with peridocically. Everything you need is there I believe but on the
>> off-chance
>> that it calls another method whose source I have neglected to include,
>> just
>> drop
>> an email to the list please.
>>
>> Currently, I am running POI version 3.5 beta 7 on a PC operating under
>> Windows XP SP2. Office 2007 is installed now and it seems able to open
>> the
>> files this code produces quite happily. In the back of my mind, I seem to
>> remember that the files produced by some search and replace code I put
>> together could be opened but not modified; I tested that problem this
>> morning
>> and the files this code produces seem fine, I can open them, make changes
>> and
>> then save the results again. But do please be prepared for problems like
>> that.
>>
>> Again, can I emphasise this is test code; it is scruffy and there are
>> going
>> to
>> be variables I put in there so that I could monitor the progress of the
>> code
>> by dumping messages to the screen. As you go through, if something seems
>> to
>> be superfluous, then this is likely the reason and you can comment it out
>> or
>> delete it.
>>
>> Good luck and I do hope it all works. If you have any problems, just drop
>> a
>> message onto the list.
>>
>> Yours
>>
>> Mark B
>>
>>
>> import org.apache.poi.poifs.filesystem.POIFSFileSystem;
>> import org.apache.poi.hwpf.HWPFDocument;
>> import org.apache.poi.hwpf.usermodel.Range;
>> import org.apache.poi.hwpf.usermodel.Paragraph;
>> import org.apache.poi.hwpf.usermodel.CharacterRun;
>>
>> import java.io.File;
>> import java.io.FileOutputStream;
>> import java.io.FileInputStream;
>> import java.io.BufferedOutputStream;
>> import java.io.BufferedInputStream;
>> import java.util.HashMap;
>> import java.util.Iterator;
>>
>> /**
>>  * This code is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR
>>  * CONDITIONS OF ANY KIND, either express or implied. It is not intended
>> to
>>  * be used in a 'production' environment without undergoing rigorous
>> testing.
>>  *
>>  * With that out of the way, an instance of this class can be used to
>> search
>> for
>>  * and replace Strings of text within a Word document. To see how the
>> code
>> may
>>  * be used, look into the main() method for examples.
>>  *
>>  * Note the replacements made by the code contained within this class
>> ignore
>>  * any formatting that may have been applied to the text that is
>> replaced.
>> That
>>  * is to say that if the text was originally formatted to use the Arial
>> font,
>>  * was sized to 24 points, emboldened, underlined and red in colour, then
>> all
>>  * of this will be lost if it is replaced. Further if any text is
>> replaced
>> in a
>>  * Paragraph, all the formatting applied to that Paragraph's contents is
>> likely
>>  * to be lost.
>>  *
>>  * @author Mark Beardsley [msb at apache.org]
>>  * @version 1.00 8th August 2009 (cannot remember when originally put
>> together)
>>  */
>> public class SearchReplace {
>>
>>
>>        /**
>>         * Search for and replace a single occurrence of a string of text
>> within a
>>         * Word document.
>>         *
>>         * Note that no checks are made on the parameter's values; that is
>> to say
>>         * that the file named in the InputFilename parameter will not be
>> checked
>>         * to ensure the file exists and neither of the searchTerm nor
>>         * replacementTerm parameters will be checked to ensure they are
>> not
>> null.
>>         * Also, note that I have never tested passing the same String to
>> the
>>         * inputFilename and outputFilename parameters but cannot see why
>> that
>>         * should not be possible.
>>         *
>>         * @param inputFilename An instance of the String class that
>> encapsulates
>>         *                      the name of and path to a Word document
>> which is
>>         *                      in the binary (OLE2CDF) format. The
>> contents
>> of
>> this
>>         *                      document will be searched for occurrences
>> of
>> the
>>         *                      search term.
>>         * @param outputFilename An instance of the String class that
>> encapsulates
>>         *                       the name of and path to a Word document
>> which is
>>         *                       in the binary (OLE2CDF) format. This
>> document will
>>         *                       contain the results of the search and
>> replace
>>         *                       operation.
>>         * @param searchTerm An instance of the String class that
>> encapsulates a
>>         *                   series of characters, a word or words. The
>> document
>>         *                   will be searched for occurrences of this
>> String.
>>         * @param replacementTerm An instance of the String class that
>> contains a
>>         *                        series of characters, a word or words.
>> The
>> String
>>         *                        encapsulated by the searchTerm parameter
>> will be
>>         *                        replaced by the 'contents' of this
>> parameter.
>>         *
>>         */
>>        public void searchAndReplace(String inputFilename,
>>                                 String outputFilename,
>>                                 String searchTerm,
>>                                 String replacementText) {
>>
>>        File inputFile = null;
>>        File outputFile = null;
>>        FileInputStream fileIStream = null;
>>        FileOutputStream fileOStream = null;
>>        BufferedInputStream bufIStream = null;
>>        BufferedOutputStream bufOStream = null;
>>        POIFSFileSystem fileSystem = null;
>>        HWPFDocument document = null;
>>        Range docRange = null;
>>        Paragraph paragraph = null;
>>        CharacterRun charRun = null;
>>        int numParagraphs = 0;
>>        int numCharRuns = 0;
>>        String text = null;
>>
>>        try {
>>            // Create an instance of the POIFSFileSystem class and
>>            // attach it to the Word document using an InputStream.
>>            inputFile = new File(inputFilename);
>>            fileIStream = new FileInputStream(inputFile);
>>            bufIStream = new BufferedInputStream(fileIStream);
>>            fileSystem = new POIFSFileSystem(bufIStream);
>>            document = new HWPFDocument(fileSystem);
>>
>>            // Get the overall Range object for the document. Note the
>>            // use of the getRange() method and not the getOverallRange()
>>            // method, this is just historic - when the code was
>> originally
>>            // written, I do not believe the latter method was part of the
>> API.
>>            docRange = document.getRange();
>>
>>            // Get the number of Paragraph(s) in the overall range and
>> iterate
>>            // through them
>>            numParagraphs = docRange.numParagraphs();
>>            for(int i = 0; i < numParagraphs; i++) {
>>
>>                // Get a Paragraph and recover the text from it. This step
>> is
>> far from
>>                // necessary and I think I only got the text so that I
>> could
>> print
>>                // it to screen as a diagnostic check to ensure that the
>> Paragraph
>>                // contained the text I was searching for. Experiment with
>> this.
>>                paragraph = docRange.getParagraph(i);
>>                text = paragraph.text();
>>
>>                // Get the number of CharacterRuns in the Paragraph
>>                numCharRuns = paragraph.numCharacterRuns();
>>                for(int j = 0; j < numCharRuns; j++) {
>>
>>                        // Get a character run and recover it's text -
>> note
>> that
>>                        // the same text variable is used as for the
>> Paragraph
>> above.
>>                        // So, it MUST be safe to remove the text =
>> paragraph.text()
>>                        // line above.
>>                    charRun = paragraph.getCharacterRun(j);
>>                    text = charRun.text();
>>
>>                    // Check to see if the text of the CharacterRun
>> contains
>> the
>>                    // search term. If it does, find out where that term
>> starts
>>                    // and call the replaceText() method passing the
>> index.
>>                    // Maybe this is the key difference between what we
>> are
>>                    // doing.
>>                    if(text.contains(searchTerm)) {
>>                        int start = text.indexOf(searchTerm);
>>                        charRun.replaceText(searchTerm, replacementText,
>> start);
>>                    }
>>                }
>>            }
>>
>>            // Close the InputStream
>>            bufIStream.close();
>>            bufIStream = null;
>>
>>            // Open an OutputStream and write the document away.
>>            outputFile = new File(outputFilename);
>>            fileOStream = new FileOutputStream(outputFile);
>>            bufOStream = new BufferedOutputStream(fileOStream);
>>
>>            document.write(bufOStream);
>>
>>        }
>>        catch(Exception ex) {
>>            System.out.println("Caught an: " + ex.getClass().getName());
>>            System.out.println("Message: " + ex.getMessage());
>>            System.out.println("Stacktrace follows.............");
>>            ex.printStackTrace(System.out);
>>        }
>>        finally {
>>            if(bufOStream != null) {
>>                try {
>>                    //bufOStream.flush();
>>                    bufOStream.close();
>>                    bufOStream = null;
>>                }
>>                catch(Exception ex) {
>>
>>                }
>>            }
>>            if(bufIStream != null) {
>>                try {
>>                    bufIStream.close();
>>                    bufIStream = null;
>>                }
>>                catch(Exception ex) {
>>                    // I G N O R E //
>>                }
>>            }
>>        }
>>
>>    }
>>
>>    /**
>>         * Search for and replace a single occurrence of a string of text
>> within a
>>         * Word document.
>>         *
>>         * Note that no checks are made on the parameter's values; that is
>> to say
>>         * that the file named in the InputFilename parameter will not be
>> checked
>>         * to ensure the file exists and neither of the searchTerm nor
>>         * replacementTerm pare,eters will be checked to ensure they are
>> not
>> null.
>>         * Also, note that I have never tested passing the same String to
>> the
>>         * inputFilename and outputFilename parameters but cannot see why
>> that
>>         * should not be possible.
>>         *
>>         * @param inputFilename An instance of the String class that
>> encapsulates
>>         *                      the name of and path to a Word document
>> which is
>>         *                      in the binary (OLE2CDF) format. The
>> contents
>> of
>> this
>>         *                      document will be searched for occurrences
>> of
>> the
>>         *                      search term.
>>         * @param outputFilename An instance of the String class that
>> encapsulates
>>         *                       the name of and path to a Word document
>> which is
>>         *                       in the binary (OLE2CDF) format. This
>> document will
>>         *                       contain the results of the search and
>> replace
>>         *                       operation.
>>         * @param replacements An instance of the java.util.HashMap class
>> that
>>         *                     contains a series of key, value pairs. Each
>> key
>>         *                     is an instance of the String class that
>> encapsulates
>>         *                     a series of characters, a word or words
>> that
>> the
>>         *                     code will search for and the accompanying
>> value is
>>         *                     also an instance of the String class that
>> likewise
>>         *                     encapsulates a series of characters, a word
>> or
>> words.
>>         *                     The 'contents' of the value's String will
>> be
>> used to
>>         *                     replace the contents of the key's String if
>> an
>>         *                     occurrence of the latter is found.
>>         */
>>    public void searchAndReplace(String inputFilename,
>>                                 String outputFilename,
>>                                 HashMap<String, String> replacements) {
>>
>>        File inputFile = null;
>>        File outputFile = null;
>>        FileInputStream fileIStream = null;
>>        FileOutputStream fileOStream = null;
>>        BufferedInputStream bufIStream = null;
>>        BufferedOutputStream bufOStream = null;
>>        POIFSFileSystem fileSystem = null;
>>        HWPFDocument document = null;
>>        Range docRange = null;
>>        Paragraph paragraph = null;
>>        CharacterRun charRun = null;
>>        Set<String> keySet = null;
>>        Iterator<String> keySetIterator = null;
>>        int numParagraphs = 0;
>>        int numCharRuns = 0;
>>        String text = null;
>>        String key = null;
>>        String value = null;
>>
>>        try {
>>            // Create an instance of the POIFSFileSystem class and
>>            // attach it to the Word document using an InputStream.
>>            inputFile = new File(inputFilename);
>>            fileIStream = new FileInputStream(inputFile);
>>            bufIStream = new BufferedInputStream(fileIStream);
>>            fileSystem = new POIFSFileSystem(bufIStream);
>>            document = new HWPFDocument(fileSystem);
>>
>>                        // Get a reference to the overall Range for the
>> document
>>                        // and discover how many Paragraphs objects there
>> are
>>                        // in the document.
>>            docRange = document.getRange();
>>            numParagraphs = docRange.numParagraphs();
>>
>>            // Recover a Set of the keys in the HashMap
>>            keySet = replacements.keySet();
>>
>>            // Step through each Paragraph
>>            for(int i = 0; i < numParagraphs; i++) {
>>                paragraph = docRange.getParagraph(i);
>>                // This line can almost certainly be removed - see
>>                // the comments in the method above.
>>                text = paragraph.text();
>>
>>                // Get the number of CharacterRuns in the Paragraph
>>                // and step through each one.
>>                numCharRuns = paragraph.numCharacterRuns();
>>                for(int j = 0; j < numCharRuns; j++) {
>>                    charRun = paragraph.getCharacterRun(j);
>>
>>                    // Get the text from the CharacterRun and recover an
>>                    // Iterator to step through the Set of keys.
>>                    text = charRun.text();
>>                    keySetIterator = keySet.iterator();
>>                    while(keySetIterator.hasNext()) {
>>
>>                        // Get the key - which is also the search term -
>> and
>>                        // check to see if it can be found within the
>>                        // CharacterRuns text.
>>                        key = keySetIterator.next();
>>                        if(text.contains(key)) {
>>
>>                                // If the search term was found in the
>> text,
>> get
>> the
>>                                // matching value from the HashMap, find
>> out
>> whereabouts
>>                                // in the CharacterRuns text the search
>> term
>> is
>>                                // and call the replaceText() method to
>> substitute
>>                                // the replacement term for the search
>> term.
>>                            value = replacements.get(key);
>>                            int start = text.indexOf(key);
>>                            charRun.replaceText(key, value, start);
>>
>>                            // Note that this code was added to test
>> whether
>>                            // it was possible to replace multiple
>> occurrences
>>                            // of the search term. I cannot remember if I
>> tested
>>                            // it but believe that it did work; either
>> way,
>>                            // it could be tested now and if succeeds,
>> then
>> the
>>                            // searchAndReplace() method above could be
>> modified
>>                            // to include this.
>>                            docRange = document.getRange();
>>                            paragraph = docRange.getParagraph(i);
>>                            charRun = paragraph.getCharacterRun(j);
>>                            text = charRun.text();
>>                        }
>>                    }
>>                }
>>            }
>>
>>            // Close the InputStream
>>            bufIStream.close();
>>            bufIStream = null;
>>
>>            // Open an OutputStream and save the modified document away.
>>            outputFile = new File(outputFilename);
>>            fileOStream = new FileOutputStream(outputFile);
>>            bufOStream = new BufferedOutputStream(fileOStream);
>>            document.write(bufOStream);
>>        }
>>        catch(Exception ex) {
>>            System.out.println("Caught an: " + ex.getClass().getName());
>>            System.out.println("Message: " + ex.getMessage());
>>            System.out.println("Stacktrace follows.............");
>>            ex.printStackTrace(System.out);
>>        }
>>        finally {
>>            if(bufIStream != null) {
>>                try {
>>                    bufIStream.close();
>>                    bufIStream = null;
>>                }
>>                catch(Exception ex) {
>>                    // I G N O R E //
>>                }
>>            }
>>            if(bufOStream != null) {
>>                try {
>>                    bufOStream.flush();
>>                    bufOStream.close();
>>                    bufOStream = null;
>>                }
>>                catch(Exception ex) {
>>
>>                }
>>            }
>>        }
>>
>>    }
>>
>>        /**
>>         * The main entry point to the program demonstrating how the code
>> may
>>         * be utilised.
>>         *
>>         * @param args An array of type String containing argumnets passed
>> to the
>>         *             program on execution.
>>         */
>>        public static void main(String[] args) {
>>                SearchReplace replacer = new SearchReplace();
>>
>>                // To serach for and replace single items. Note, the code
>> has not, at
>>                // least as far as I can remember, been tested by passing
>> the same
>>                // file to both the searchTerm and replacementTerm
>> parameters. It ought
>>                // to work but has NOT been tested I believe.
>>                replacer.searchAndReplace("Document.doc",            //
>> Source Document
>>                        "Replaced Document.doc",                        
>> //
>> Result Document
>>                        "search term",                                  
>> //
>> Search term
>>                        "replacement term");                            
>> //
>> Replacement term
>>
>>                // To search for and replace a series of items
>>                HashMap<String, String> searchTerms = new HashMap<String,
>> String>();
>>                searchTerms.put("search term 1", "replacement term 1");
>>                searchTerms.put("search term 2", "replacement term 2");
>>                searchTerms.put("search term 3", "replacement term 3");
>>                searchTerms.put("search term 4", "replacement term 4");
>>
>>                replacer.searchAndReplace("Document.doc",    // Source
>> Document
>>                        "Replaced Document.doc",                 // Result
>> Document
>>                        searchTerms)                             //
>> Search/replacement items
>>         }
>> }
>>
>>
>>
>> karthik-33 wrote:
>> >
>> > Thanks for the reply mark.
>> > I dont think i need to preserve text formatting, but i would like to
>> try
>> > your code and see how it works.
>> > I think that would help me too.
>> >
>> > I cant go the open office since my business requirement is to use
>> > microsoft
>> > word documents.
>> > I will be using this search and replace function in the same PC as that
>> of
>> > the application.
>> >
>> > If you can send me that code, i will try and let u know how it works.
>> >
>> > Thanks
>> > Karthik
>> >
>> >
>> > On Fri, Aug 7, 2009 at 11:49 AM, MSB <[email protected]> wrote:
>> >
>> >>
>> >> Can I ask two questions please?
>> >>
>> >> Do you need to preserve the formatting applied to the text? If not,
>> then
>> >> I
>> >> think that somewhere I have a piece of HWPF code that does a search
>> and
>> >> replace. I am not at all certain about the state of the code and
>> cannot
>> >> remember if I hit the same problem as you - and I may well have - but
>> I
>> >> am
>> >> willing to look it out if you think it might help.
>> >>
>> >> Secondly, do you have to use HWPF/XWPF? The API is still immature and
>> it
>> >> is
>> >> really only suitable for realtively simple tasks. Better alternatives
>> >> might
>> >> be OpenOffice which you can 'control' through it's UNO API or Word
>> itself
>> >> that can be manipulated using OLE. You can ONLY use OLE if you are
>> >> working
>> >> on a windows based PC and you have Word installed on that PC.
>> OpenOffice
>> >> is
>> >> more flexible but it still cannot be used - at least as far as I am
>> aware
>> >> -
>> >> as a document server, so it is best to have that application installed
>> on
>> >> the PC you will be using for the search/replace operation.
>> >>
>> >> Yours
>> >>
>> >> Mark B
>> >>
>> >>
>> >> karthik-33 wrote:
>> >> >
>> >> > I have microsoft office 2007 and while saving the document, i save
>> it
>> >> as
>> >> > microsoft 2003 document.
>> >> > Iam trying to replace the text using replaceText method in
>> Paragraph.
>> >> > It works fine when the replacement text and search text are of equal
>> >> > length.
>> >> > It corrupts the document, when the length of the string is either
>> >> greater
>> >> > or
>> >> > less.
>> >> > If anyone has gone through the issue and resolved or have any idea.
>> >> Please
>> >> > let me know, it will be useful for me..
>> >> > Iam not sure what is causing the problem to corrupt the document
>> >> >
>> >> > Code is:
>> >> >
>> >> > String replaceTxt = "Replacement";
>> >> > String searchText = "Orginial";
>> >> > POIFSFileSystem ps = new POIFSFileSystem (new
>> >> > FileInputStream("C:/Document.doc"));
>> >> > HWPFDocument doc = new HWPFDocument ();
>> >> > Range range = doc.getRange();
>> >> > for(int x=0;x<range.numSections();x++)
>> >> > {
>> >> >     Section s = range.getSection(x);
>> >> >     for(int y=0;y<s.numParagraphs();y++)
>> >> >     {
>> >> >         Paragraph p = s.getParagraph(y);
>> >> >         String paraText = p.text();
>> >> >         int offset = paraText.indexOf(searchText );
>> >> >         if(offset != -1)
>> >> >         {
>> >> >              p.replaceText(searchText,replaceTxt,offset);
>> >> >
>> >> >         }
>> >> >     }
>> >> >
>> >> > }
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://www.nabble.com/Replace-Text-Problem-%28Document-Corrupt%29---POI-HWPFDocument-tp24864855p24867251.html
>> >> Sent from the POI - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >>
>> >
>> >
>> > --
>> > karthik
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Replace-Text-Problem-%28Document-Corrupt%29---POI-HWPFDocument-tp24864855p24876942.html
>>  Sent from the POI - User mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
> 
> 
> -- 
> karthik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Replace-Text-Problem-%28Document-Corrupt%29---POI-HWPFDocument-tp24864855p24884574.html
Sent from the POI - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to