Thanks for the reply; I understand what you are after a little better now.

As far as I am aware, formatting information is exposed not by the Paragraph
class but by the CharacterRun class (org.apache.poi.hwpf.usermodel.CharacterRun).
I am by no means an expert, but I think that as HWPF parses the Word document,
each stretch of text that shares the same formatting is wrapped in its own
CharacterRun instance, so a new run starts wherever the formatting changes.
That class provides methods that let you get at the colour of the text, the
name and size of the font used, and so on. To get at the CharacterRun(s) in
the document you would do something like this:

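import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Range;

HWPFDocument doc = new HWPFDocument(new FileInputStream("C:\\temp\\test.doc"));
Range range = doc.getRange();
// The run count and the runs themselves come from the Range, not the document
int numCharRuns = range.numCharacterRuns();
for(int i = 0; i < numCharRuns; i++) {
   CharacterRun charRun = range.getCharacterRun(i);
   // interrogate charRun here, e.g. charRun.text() or charRun.getFontName()
}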

Then, once you have the CharacterRun, you should be able to interrogate that
object for lots of information - have a look at the javadoc to see all of
the available methods. After obtaining the info, you ought to be able to use
iText to create the PDF file for you. My only concern is whether working
through the document in this manner will allow you to accurately re-create
it using iText; I guess that only a test will tell us this.
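
Since you said you need to know where the font size varies, here is a rough,
untested-against-your-document sketch of one way to use what the runs give
you. It uses no POI classes itself: FontSizeSegments and Segment are names I
have made up for illustration. My assumption is that you would fill the list
inside the loop with something like new Segment(charRun.text(),
charRun.getFontSize()), and I believe getFontSize() reports half-points
rather than points, but please check the javadoc on that. The helper simply
glues together neighbouring runs that share a size, so you are left with one
segment per change of font size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI: collapses consecutive runs that
// share a font size into a single segment.
class FontSizeSegments {

    // Holds the only two values this sketch cares about for each run.
    static final class Segment {
        final String text;
        final int halfPoints; // assumed unit: half-points (24 means 12pt)

        Segment(String text, int halfPoints) {
            this.text = text;
            this.halfPoints = halfPoints;
        }

        // Converts the assumed half-point value to points.
        double points() {
            return halfPoints / 2.0;
        }
    }

    // Walks the runs in document order and merges each run into the
    // previous segment when the font size is unchanged.
    static List<Segment> merge(List<Segment> runs) {
        List<Segment> merged = new ArrayList<>();
        for (Segment run : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).halfPoints == run.halfPoints) {
                Segment prev = merged.get(last);
                merged.set(last, new Segment(prev.text + run.text, prev.halfPoints));
            } else {
                merged.add(run);
            }
        }
        return merged;
    }
}
```

Each Segment that comes back from merge() could then be written out with
iText using whatever size you want in your customised format. Runs that
differ only in colour or bold/italic would be merged by this sketch, so you
would need to compare those properties too if they matter to you.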

The reason I asked about the nature of the research paper was that I wanted
to get some idea of the sort of characters that are included. Forgive me
please as I am 'mathematically challenged' and do not know the terms to
describe the sort of operators found in mathematical expressions, but I
feared that we may be dealing with those - knowing that the research paper
is plain text removes that fear.

Have a run with this and see how it works for you - I hope it may be able to
return some of the characters you were not seeing before. If not, we may
need to look at other options. Should this fail again, is it possible for
you to let me have a copy - assuming there is no proprietary information
contained within it that should not be seen by anyone outside of your
institution - of the sort of document you are working with? That way, I can
experiment with it myself; for example, I have OpenOffice on my PC and
NetBeans configured so that I can create and run applications that use
Universal Network Objects (OpenOffice's API).


nikhil n-2 wrote:
> 
> hii,
> 
> i am new to hwpf.i am working on a project where i am supposed to read a
> research paper in ieee format from a doc file and convert it into a pdf
> file
> in a customized format.
> to do that i need to know the font size variations in the text.i am unable
> to read char's like pi,sigma etc present in equations.
> 
> thank u.
> 
> 


