Thanks a lot, sir, for all the information. The characters that may be present in an equation in a research paper are Greek letters such as pi, sigma, and epsilon. They can be created in a Microsoft Word document, since Word provides options to insert such characters. My doubt is how I can retrieve those characters from the .doc file using HWPF. Even if I succeed in retrieving them, I should also be able to write them to a PDF file using iText. Once again, thank you.
On Wed, Apr 8, 2009 at 9:01 PM, MSB <[email protected]> wrote:

> Thanks for the reply, I understand what you are after a little better now.
>
> As far as I am aware, formatting information is not exposed by the Paragraph
> class but by the CharacterRun - org.apache.poi.hwpf.usermodel.CharacterRun -
> class. By no means am I an expert, but I think that as the Word document is
> parsed by HWPF, whenever the formatting applied to a piece of text changes,
> that text will be encapsulated within an instance of the CharacterRun class.
> That class provides methods that allow you to get at the colour of the text,
> the name and size of the font used, and so on. To get at the CharacterRun(s)
> in the document you would do something like this:
>
> HWPFDocument doc = new HWPFDocument(new FileInputStream(new File("C:\\temp\\test.doc")));
> Range range = doc.getRange();
> int numCharRuns = range.numCharacterRuns();
> CharacterRun charRun = null;
> for(int i = 0; i < numCharRuns; i++) {
>     charRun = range.getCharacterRun(i);
>     // e.g. charRun.text(), charRun.getFontName(), charRun.getFontSize()
> }
>
> Then, once you have the CharacterRun, you should be able to interrogate that
> object for lots of information - have a look at the javadoc to see all of
> the available methods. After obtaining the info, you ought to be able to use
> iText to create the PDF file for you. My only concern is whether working
> through the document in this manner will allow you to accurately re-create
> it using iText; I guess that only a test will tell us this.
>
> The reason I asked about the nature of the research paper was that I wanted
> to get some idea of the sort of characters that are included. Forgive me,
> please, as I am 'mathematically challenged' and do not know the terms to
> describe the sort of operators found in mathematical expressions, but I
> feared that we may be dealing with those - knowing that the research paper
> is plain text removes that fear.
>
> Have a run with this and see how it works for you - I hope it may be able to
> return some of the characters you were not seeing before. If not, we may
> need to look at other options. Should this fail again, is it possible for
> you to let me have a copy of the sort of document you are working with,
> assuming it contains no proprietary information that should not be seen by
> anyone outside of your institution? That way, I can experiment with it
> myself; for example, I have OpenOffice on my PC and NetBeans configured so
> that I can create and run applications that use Universal Network Objects
> (OpenOffice's API).
>
> nikhil n-2 wrote:
> >
> > Hi,
> >
> > I am new to HWPF. I am working on a project where I am supposed to read a
> > research paper in IEEE format from a .doc file and convert it into a PDF
> > file in a customized format.
> > To do that, I need to know the font size variations in the text. I am
> > unable to read characters like pi, sigma, etc. present in equations.
> >
> > Thank you.
> >
>
> --
> View this message in context:
> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22953001.html
> Sent from the POI - User mailing list archive at Nabble.com.
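
On the Greek-letter question raised at the top of this thread, here is one possible sketch and not a confirmed answer. It assumes the letters were inserted with Word's "Symbol" font, in which case the text of a character run (for example from CharacterRun.text(), together with CharacterRun.getFontName()) may come back as Latin letters such as "p" or "s", or as private-use code points in the range F020-F0FF, instead of Unicode Greek. The class name SymbolFontMapper, the method toUnicode, and the small mapping table are hypothetical and cover only a handful of letters; the sketch uses only the Java standard library and does not call POI or iText itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: converts text taken from a Symbol-font run
// into ordinary Unicode Greek letters.
public class SymbolFontMapper {

    // Partial table, assuming the standard Symbol font layout.
    private static final Map<Character, Character> SYMBOL_TO_UNICODE = new HashMap<>();
    static {
        SYMBOL_TO_UNICODE.put('a', '\u03B1'); // alpha
        SYMBOL_TO_UNICODE.put('b', '\u03B2'); // beta
        SYMBOL_TO_UNICODE.put('e', '\u03B5'); // epsilon
        SYMBOL_TO_UNICODE.put('p', '\u03C0'); // pi
        SYMBOL_TO_UNICODE.put('s', '\u03C3'); // sigma
        SYMBOL_TO_UNICODE.put('S', '\u03A3'); // capital sigma
    }

    // runText and fontName are assumed to come from a character run.
    // Text in any font other than "Symbol" is returned unchanged.
    public static String toUnicode(String runText, String fontName) {
        if (runText == null || fontName == null
                || !fontName.equalsIgnoreCase("Symbol")) {
            return runText;
        }
        StringBuilder out = new StringBuilder(runText.length());
        for (int i = 0; i < runText.length(); i++) {
            char c = runText.charAt(i);
            // Assumption: symbol characters may be stored in the private
            // use area (F020-F0FF); fold them back to the 00-FF range.
            if (c >= '\uF020' && c <= '\uF0FF') {
                c = (char) (c - 0xF000);
            }
            Character mapped = SYMBOL_TO_UNICODE.get(c);
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }
}
```

With this in place, each run's text could be passed through SymbolFontMapper.toUnicode(...) before being handed to iText. The result contains ordinary Unicode characters; whether they show up in the PDF still depends on choosing a font on the iText side that actually contains Greek glyphs.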
