At the moment I am just targeting HWPF to read and parse MS Word documents, with the goal of transforming the document into XML. Ideally I would like to do this paragraph by paragraph, from top to bottom of the document, so that the resulting XML structure is similar to the original document.
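To make that top-to-bottom pass concrete, here is a minimal sketch of the idea, deliberately written without any POI calls. `Para`, its `inList` and `heading` flags, and the XML element names are all hypothetical stand-ins, not HWPF types. The assumption is that each flag could be filled in from whatever HWPF reports per paragraph, for example an `instanceof ListEntry` check in place of a cast that can fail. The point is only that a single loop can open and close a list wrapper by comparing each paragraph with the current state, so no try/catch is needed.

```java
import java.util.ArrayList;
import java.util.List;

class DocToXml {

    /** Hypothetical stand-in for one paragraph; NOT an HWPF class. */
    static final class Para {
        final String text;
        final boolean inList;   // assumption: e.g. set from "p instanceof ListEntry"
        final boolean heading;  // assumption: e.g. derived from the paragraph's style name

        Para(String text, boolean inList, boolean heading) {
            this.text = text;
            this.inList = inList;
            this.heading = heading;
        }
    }

    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Walks the paragraphs in document order, opening/closing <list> on transitions. */
    static String toXml(List<Para> paras) {
        StringBuilder out = new StringBuilder("<document>\n");
        boolean listOpen = false;
        for (Para p : paras) {
            if (p.inList && !listOpen) {          // first list paragraph: open the list
                out.append("  <list>\n");
                listOpen = true;
            } else if (!p.inList && listOpen) {   // first non-list paragraph: close it
                out.append("  </list>\n");
                listOpen = false;
            }
            if (p.inList) {
                out.append("    <item>").append(escape(p.text)).append("</item>\n");
            } else {
                String tag = p.heading ? "heading" : "para";
                out.append("  <").append(tag).append(">")
                   .append(escape(p.text))
                   .append("</").append(tag).append(">\n");
            }
        }
        if (listOpen) {                           // document ended inside a list
            out.append("  </list>\n");
        }
        return out.append("</document>\n").toString();
    }

    public static void main(String[] args) {
        List<Para> doc = new ArrayList<>();
        doc.add(new Para("Intro", false, true));
        doc.add(new Para("first bullet", true, false));
        doc.add(new Para("second bullet", true, false));
        doc.add(new Para("Closing text", false, false));
        System.out.print(toXml(doc));
    }
}
```

The same open/close-on-transition pattern should extend to tables by adding another flag and another state variable, though nested lists or lists inside table cells would need a stack rather than a single boolean.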
I have come across the StyleDescription class, which can give certain style information such as headings and normal text, and the cast ListEntry aListEntry = (ListEntry) p; which works if the text is bullet points but throws an error if the text is not bulleted. I can see the code getting quite messy if I have to try and catch an error whenever I'm testing for tables, lists, etc. I wonder, is there a way to test whether a paragraph is of type list, and whether it is opening or closing the list, or something like that?

As for developing HWPF, I'd be very grateful if I could develop those features :)

Best
Mark

MSB wrote:
>
> Not easily, no. By this, I mean that there is no method you can call to
> say, for example, print out all of the information about this section of
> the document.
>
> But, you can get at detailed information by digging around a little in the
> various methods but a lot does depend on exactly how you want to process
> the document. It is possible for example to get at all of the tables in
> the document or all of the pictures but these method calls remove some of
> the context; you cannot tell what comes before or after the picture/table
> for example. If you have a good search through the posts in the list, you
> will be able to find some code we put together that allows you to get at
> the tables - just for an example - as they occur in the document; it is
> simply a matter of asking whether the Paragraph object appeared in a table
> cell or not.
>
> If you can be more precise about exactly what information you want
> printing out about each different type of object then it may be possible
> to give you a better answer. Further, it is important to know which type
> of file you are targeting - binary (.doc) or OpenXML (.docx) - as HWPF and
> XWPF have different capabilities.
> Finally, you do need to be aware that
> HWPF in particular is still a very immature API that is in need of a lot
> of development; if you would be willing to undertake that work and develop
> those areas that you require, I am certain that there will be a lot of
> grateful users.
>
> Yours
>
> Mark B
>
>
> markl16 wrote:
>>
>> Hi everyone,
>>
>> I'm just researching Apache POI at the moment. I have done some simple
>> Java programs, reading in a Word document and printing out the text etc.
>>
>> I'm just wondering, is it possible to get style information based on each
>> paragraph in the Word document, such as POI printing out if the paragraph
>> is a title header, or a list of bullet points, or an image, table etc. I
>> have come across range.getCharacterRun() which can provide some info
>> such as font type, but I'm looking for more detailed information as
>> mentioned above.
>>
>> Any feedback appreciated.
>>
>> Best
>> Mark
>>

--
View this message in context: http://old.nabble.com/Extract-Text-with-style-type-information-tp27209960p27222890.html
Sent from the POI - User mailing list archive at Nabble.com.
