So, are there any classes that can retrieve these types of characters from the doc file? Sorry for the late reply.
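
A minimal sketch of one way to check whether text already read from the document contains such characters as Unicode. It uses only the Java standard library and makes no POI calls. The class and method names (GreekScan, greekLetters) and the sample string are made up for illustration; the string stands in for whatever text HWPF returns for a run or paragraph.

```java
import java.util.ArrayList;
import java.util.List;

final class GreekScan {

    /** Returns every character of the text that lies in the Unicode Greek block, in order. */
    static List<String> greekLetters(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.GREEK) {
                found.add(new String(Character.toChars(cp)));
            }
            // Advance by one code point, which may be one or two chars
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample: pi, sigma and epsilon mixed into plain text
        String sample = "area = \u03C0 r^2, \u03C3 = 0.5, \u03B5 > 0";
        System.out.println(greekLetters(sample));
    }
}
```

If a scan like this finds nothing in text that should contain pi or sigma, the characters are probably not stored as plain Unicode Greek letters, and the font of the run is worth inspecting.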
On Thu, Apr 9, 2009 at 12:47 AM, MSB <[email protected]> wrote:

>
> Yes, I know the sort of thing you mean now - when using Word I remember
> having the option to open a complicated-looking dialog box that allowed me
> to insert characters like the copyright and trademark symbols. I would have
> expected that if they can be placed into a Word document then they are
> encoded somewhere and available to us. My only doubt here concerns Word's
> use of Unicode - if it uses Unicode then everything should be OK.
>
> Also, I made another discovery tonight whilst playing with some code. If
> you remember my previous post, I got the CharacterRun(s) from the
> document's high-level Range object. This does not have to be the case. You
> can do something like this:
>
> HWPFDocument doc = new HWPFDocument(new FileInputStream(new File("C:\\temp\\test.doc")));
> Range range = doc.getRange();
> int numParagraphs = range.numParagraphs();
> for(int i = 0; i < numParagraphs; i++) {
>     Paragraph para = range.getParagraph(i);
>     int numCharRuns = para.numCharacterRuns();
>     for(int j = 0; j < numCharRuns; j++) {
>         CharacterRun charRun = para.getCharacterRun(j);
>         ..........
>     }
> }
>
> That would allow you to create new paragraphs in the pdf file when you
> need to - if I remember correctly, pdf files contain marked-up text
> organised into paragraphs with the /par tag - and build each from the
> contents of the character runs.
>
>
> nikhil n-2 wrote:
> >
> > Thanks a lot, sir, for all the information. Chars that may be present in
> > an equation in a research paper are Greek letters like pi, sigma,
> > epsilon etc. They can be created in a Microsoft Word document, as it
> > provides options to insert such chars. But my doubt is how I can
> > retrieve those chars from the doc file by using HWPF. Even if I am
> > successful in retrieving them, I should be able to write them in a pdf
> > file using iText. Once again, thank you.
> >
> > On Wed, Apr 8, 2009 at 9:01 PM, MSB <[email protected]> wrote:
> >
> >>
> >> Thanks for the reply, I understand what you are after a little better
> >> now.
> >>
> >> As far as I am aware, formatting information is not exposed by the
> >> Paragraph class but by the CharacterRun -
> >> org.apache.poi.hwpf.usermodel.CharacterRun - class. By no means am I an
> >> expert, but I think that as the Word document is parsed by HWPF, if and
> >> when the formatting applied to a piece of text changes then it - the
> >> text - will be encapsulated within an instance of the CharacterRun
> >> class. That class provides methods that allow you to get at the colour
> >> of the text, the name and size of the font used, and so on. To get at
> >> the CharacterRun(s) in the document you would do something like this:
> >>
> >> HWPFDocument doc = new HWPFDocument(new FileInputStream(new File("C:\\temp\\test.doc")));
> >> Range range = doc.getRange();
> >> int numCharRuns = range.numCharacterRuns();
> >> CharacterRun charRun = null;
> >> for(int i = 0; i < numCharRuns; i++) {
> >>     charRun = range.getCharacterRun(i);
> >> }
> >>
> >> Then once you have the CharacterRun, you should be able to interrogate
> >> that object for lots of information - have a look at the javadoc to see
> >> all of the available methods. After obtaining the info, you ought to be
> >> able to use iText to create the pdf file for you. My only concern is
> >> whether working through the document in this manner will allow you to
> >> accurately re-create it using iText; I guess that only a test will tell
> >> us this.
> >>
> >> The reason I asked about the nature of the research paper was that I
> >> wanted to get some idea of the sort of characters that are included.
> >> Forgive me please as I am 'mathematically challenged' and do not know
> >> the terms to describe the sort of operators found in mathematical
> >> expressions, but I feared that we may be dealing with those - knowing
> >> that the research paper is plain text removes that fear.
> >>
> >> Have a run with this and see how it works for you - I hope it may be
> >> able to return some of the characters you were not seeing before. If
> >> not, we may need to look at other options. Should this fail again, is
> >> it possible for you to let me have a copy - assuming there is no
> >> proprietary information contained within it that should not be seen by
> >> anyone outside of your institution - of the sort of document you are
> >> working with? That way, I can experiment with it myself; for example, I
> >> have OpenOffice on my PC and NetBeans configured so that I can create
> >> and run applications that use Universal Network Objects (OpenOffice's
> >> API).
> >>
> >>
> >> nikhil n-2 wrote:
> >> >
> >> > Hi,
> >> >
> >> > I am new to HWPF. I am working on a project where I am supposed to
> >> > read a research paper in IEEE format from a doc file and convert it
> >> > into a pdf file in a customized format.
> >> > To do that I need to know the font size variations in the text. I am
> >> > unable to read chars like pi, sigma etc. present in equations.
> >> >
> >> > Thank you.
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22953001.html
> >> Sent from the POI - User mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >
> --
> View this message in context:
> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22957496.html
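
If the characters in the equations do not come back as Unicode Greek letters, one possible cause is that they were inserted from the Symbol font. The sketch below rests on two assumptions that should be checked against a real file: that such characters may show up as private-use code points in the range U+F020 to U+F0FF, and that the font name reported for the run (for example by CharacterRun.getFontName()) identifies Symbol runs. The class name, method name and the deliberately partial mapping table are illustrative only and are not part of POI.

```java
import java.util.Map;

final class SymbolFontMapper {

    // Partial table: Symbol-font character code -> Unicode Greek letter
    private static final Map<Character, Character> SYMBOL_TO_UNICODE = Map.of(
            'a', '\u03B1',  // alpha
            'b', '\u03B2',  // beta
            'e', '\u03B5',  // epsilon
            'p', '\u03C0',  // pi
            's', '\u03C3',  // sigma
            'S', '\u03A3'); // capital sigma

    /**
     * Translates Symbol-font codes to Unicode. Private-use characters are
     * always looked up; ordinary characters only when the caller says the
     * run uses the Symbol font. Anything not in the table is left unchanged.
     */
    static String toUnicode(String raw, boolean runUsesSymbolFont) {
        StringBuilder out = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            boolean privateUse = c >= 0xF020 && c <= 0xF0FF;
            // Assumed layout: private-use code = 0xF000 + Symbol-font code
            char code = privateUse ? (char) (c - 0xF000) : c;
            Character mapped = (privateUse || runUsesSymbolFont)
                    ? SYMBOL_TO_UNICODE.get(code)
                    : null;
            out.append(mapped != null ? mapped : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input: "p" and "s" taken from a run whose font is Symbol
        System.out.println(toUnicode("ps", true));
    }
}
```

The translated string would still need a font containing Greek glyphs when it is written to the pdf file.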
