Couldn't you just run an XSLT against the document.xml file and convert it
to text?  Then you would simply run the converted text through your existing
code.  Or am I missing something?
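
To make the idea concrete, here is a minimal sketch of the "convert it to text" step. It swaps the XSLT for Python's standard zipfile and xml.etree.ElementTree modules, purely to stay dependency-free. The function name docx_to_text is made up for illustration, and the sketch assumes the body text lives in the w:t elements of word/document.xml, one output line per w:p paragraph. Nested paragraphs (text boxes, for example) are not handled specially.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(docx):
    """Return the document body as plain text, one line per paragraph.

    `docx` may be a file path or a binary file-like object.
    (Hypothetical helper, not part of any library.)
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # Concatenate the text of every <w:t> element inside this paragraph.
        runs = paragraph.iter(f"{{{W_NS}}}t")
        lines.append("".join(t.text or "" for t in runs))
    return "\n".join(lines)
```

The resulting string could then be fed to the existing txt-based code unchanged. Note that this only reads the text; it does not write anything back into the docx.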

Regards,

Mark F


On Tue, Mar 8, 2011 at 6:04 AM, John C <[email protected]> wrote:

>
> I would like to manipulate text at the word level in a docx file based upon
> neighboring words (1 word to the left, 1 word to the right).
> With a txt file this process is very simple. Now I would like to do the
> same with docx files and then later doc files.
> I spent quite a bit of time searching for an example I could reproduce to
> solve this problem, but to no avail. I therefore thought of a possible hack
> to achieve this and would like some feedback.
> Assuming each word has consistent styling:
> 1. Change the file extension from .docx to .zip.
> 2. Unzip the file.
> 3. Open the word folder and open the document.xml file (I assume this is
>    where all the content is?).
> 4. Ignore the content contained in "<...>" and concatenate each remaining
>    fragment, making sure to separate fragments with a space.
> 5. Split the single concatenated string into words, then map each word to
>    its desired form.
> 6. Go back through the original document.xml and change each word to its
>    mapped value.
> 7. Change the file extension from .zip to .docx.
> 8. Finished.
> To reiterate, I have yet to try this approach, as it's merely an idea.
> Hopefully someone can set me in the right direction. It's also not
> important in this case to separate titles from paragraphs, if that makes
> things easier.
> Thanks
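
For what it's worth, the steps quoted above could be sketched roughly as follows in Python, using only the standard library. This is an untested-against-Word illustration, not a finished tool: rewrite_words, rewrite_docx and the transform(left, word, right) callback are hypothetical names, and the sketch assumes, as the original message does, that each word sits wholly inside one w:t element. No renaming to .zip is needed, since zipfile reads a .docx directly.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the usual "w:" prefix on output


def rewrite_words(xml_bytes, transform):
    """Apply transform(left, word, right) to every word in the <w:t> nodes.

    `left` and `right` are the neighbouring original words, or None at
    the start and end of the document. (Hypothetical helper.)
    """
    root = ET.fromstring(xml_bytes)
    nodes = [t for t in root.iter(f"{{{W_NS}}}t") if t.text]
    # Split each node into alternating word and whitespace tokens.
    pieces = [re.split(r"(\s+)", t.text) for t in nodes]
    # (node index, token index) of every non-whitespace token, in order.
    positions = [(i, j) for i, toks in enumerate(pieces)
                 for j, tok in enumerate(toks) if tok and not tok.isspace()]
    words = [pieces[i][j] for i, j in positions]
    for k, (i, j) in enumerate(positions):
        left = words[k - 1] if k > 0 else None
        right = words[k + 1] if k + 1 < len(words) else None
        pieces[i][j] = transform(left, words[k], right)
    # Write the mapped words back, leaving the whitespace untouched.
    for node, toks in zip(nodes, pieces):
        node.text = "".join(toks)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def rewrite_docx(src, dst, transform):
    """Copy archive `src` to `dst`, rewriting only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = rewrite_words(data, transform)
            zout.writestr(item, data)
```

A call might look like rewrite_docx("in.docx", "out.docx", my_transform), where the file names and my_transform are placeholders. One caveat: ElementTree only preserves the prefixes it has been told about, so a real document that declares other namespaces may come out with renamed prefixes that Word does not accept. A library such as lxml or python-docx is likely to be more robust for round-tripping.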
