Martin Burrow wrote:
> Hi everyone,
>
> I'm interested in using the POI package to extract content from
> an MS Word document.  I've managed to get it to do this, but the
> extracted text is stripped of all style information, just plain text,
> e.g.
>
> The quick brown fox jumps over the lazy dog.
>
> What I'm looking to do is also show which text is in bold or italics.
> So for example it would output:
>
> The [b]quick[/b] brown fox [i]jumps over[/i] the lazy dog.
>
> Or failing this, can the document be output as an XML document that
> also contains style information?
>

Hello Martin,

I'm currently working on this problem too, but I think POI is not yet
ready for what we want :(

I've found two other solutions to the doc-to-XML problem:

1. a Python script called doc2xml

        * http://pair.mbl.ca/doc2xml/
        * GPL
        * the script can read Word 97, Word 2000 and Word 2002
        * the XML output contains all the stylesheets!

2. the libwv library

        * http://wvware.sourceforge.net/
        * GPL
        * I'm currently writing a Java wrapper (JNI) for this library
        * libwv is used in KWord and AbiWord
        
> Is there any way of doing this using the standard POI package?  I
> believe this would definitely be possible using POI/HWPF?  I visited the
> HWPF project page but couldn't see where to download the source code -
> could someone point me in the right direction?
> 

I'm using the SVN trunk of POI:
http://svn.apache.org/repos/asf/jakarta/poi/trunk/src
Details are on the POI site (http://jakarta.apache.org/poi/).
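
In case it helps, here is a minimal sketch of the output step only, i.e.
turning formatted runs into the [b]/[i] markup from your example. It
assumes you already have the text split into runs with bold/italic flags,
whichever tool they come from (HWPF's CharacterRun objects, as far as I
understand them, or the output of doc2xml or wv). The Run class and the
toMarkup method are names made up for illustration; they are not part of
POI.

```java
import java.util.Arrays;
import java.util.List;

public class StyledTextSketch {

    /** Hypothetical holder for one run of text with uniform formatting. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Wraps each run in [i] and/or [b] tags according to its flags. */
    static String toMarkup(List<Run> runs) {
        StringBuilder out = new StringBuilder();
        for (Run run : runs) {
            String text = run.text;
            if (run.italic) {
                text = "[i]" + text + "[/i]";
            }
            if (run.bold) {
                // Bold is the outer tag when a run is both bold and italic.
                text = "[b]" + text + "[/b]";
            }
            out.append(text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The runs are written by hand here; in practice they would come
        // from whatever reads the Word file.
        List<Run> runs = Arrays.asList(
            new Run("The ", false, false),
            new Run("quick", true, false),
            new Run(" brown fox ", false, false),
            new Run("jumps over", false, true),
            new Run(" the lazy dog.", false, false));
        System.out.println(toMarkup(runs));
    }
}
```

Adjacent runs with the same formatting are not merged in this sketch, so
each run gets its own pair of tags.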

Regards
Andreas
