So, are there any classes which can retrieve these types of characters from the doc
file? Sorry for the late reply.
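In many Word documents, Greek letters such as pi and sigma are inserted using the Symbol font rather than as true Unicode Greek characters. Depending on how the symbol was inserted, a character run's text may come back as a plain ASCII letter (e.g. 'p') with the run's font name set to "Symbol", or as a private-use character in the range U+F020 to U+F0FF. The following is a minimal, stdlib-only sketch under that assumption. SymbolFontMapper and toUnicode are hypothetical names, not part of HWPF, and only the letters of the Greek alphabet are covered.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Apache POI. Assumes the input text was
// written in the Symbol font, so each character is either plain ASCII
// (e.g. 'p') or the same code shifted into the private-use area (U+F070).
final class SymbolFontMapper {

    // Symbol-font Latin letters and the lower-case Greek letters they display.
    private static final String LATIN = "abcdefghiklmnopqrstuwxyz";
    private static final String GREEK =
        "\u03B1\u03B2\u03C7\u03B4\u03B5\u03C6\u03B3\u03B7\u03B9\u03BA\u03BB\u03BC"
      + "\u03BD\u03BF\u03C0\u03B8\u03C1\u03C3\u03C4\u03C5\u03C9\u03BE\u03C8\u03B6";

    private static final Map<Character, Character> LOWER = new HashMap<>();
    static {
        for (int i = 0; i < LATIN.length(); i++) {
            LOWER.put(LATIN.charAt(i), GREEK.charAt(i));
        }
    }

    /** Maps one Symbol-font character to Unicode; other characters pass through. */
    static char toUnicode(char c) {
        // Undo the private-use shift (U+F020..U+F0FF -> U+0020..U+00FF).
        char base = (c >= '\uF020' && c <= '\uF0FF') ? (char) (c - 0xF000) : c;
        Character greek = LOWER.get(Character.toLowerCase(base));
        if (greek == null) {
            return c; // not a Greek letter in the Symbol font: leave unchanged
        }
        // Greek capitals sit exactly 0x20 below their lower-case forms.
        return Character.isUpperCase(base) ? (char) (greek - 0x20) : greek;
    }

    /** Converts a whole run of Symbol-font text. */
    static String toUnicode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append(toUnicode(c));
        }
        return sb.toString();
    }
}
```

In the character-run loop, this would only be applied to runs whose charRun.getFontName() returns "Symbol"; applied to ordinary text it would wrongly turn every 'p' into a pi.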

On Thu, Apr 9, 2009 at 12:47 AM, MSB <[email protected]> wrote:

>
> Yes, I know the sort of thing you mean now - when using Word I remember
> having the option to open a complicated-looking dialog box that allowed me
> to insert characters like the copyright and trademark symbols. I would have
> expected that if they could be placed into a Word document then they are
> encoded somewhere and available to us. My only doubts here surround Word's
> use of Unicode - if it uses Unicode then everything should be OK.
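If Word does store these symbols as Unicode, a quick way to check what HWPF actually hands back is to look at the Unicode block of each character in the extracted text. This is a minimal, stdlib-only sketch; GreekCheck, containsGreek and describe are hypothetical names, and the input string stands in for whatever text a character run returns.

```java
// Hypothetical helper: inspects extracted text for Unicode Greek letters.
final class GreekCheck {

    /** True if any code point falls in the Unicode "Greek and Coptic" block. */
    static boolean containsGreek(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.GREEK) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    /** Lists the code points as U+XXXX so unexpected characters are easy to spot. */
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }
}
```

If describe() shows code points in the range U+F020 to U+F0FF instead of U+03xx, the letters were most likely entered in the Symbol font rather than as true Unicode Greek characters.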
>
> Also, I made another discovery tonight whilst playing with some code. If you
> remember my previous post, I got the CharacterRun(s) from the document's
> high-level Range object. This does not have to be the case. You can do
> something like this:
>
>
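> import java.io.File;
> import java.io.FileInputStream;
> import org.apache.poi.hwpf.HWPFDocument;
> import org.apache.poi.hwpf.usermodel.CharacterRun;
> import org.apache.poi.hwpf.usermodel.Paragraph;
> import org.apache.poi.hwpf.usermodel.Range;
>
> HWPFDocument doc = new HWPFDocument(new FileInputStream(new File("C:\\temp\\test.doc")));
> Range range = doc.getRange();
> // Walk the document paragraph by paragraph...
> int numParagraphs = range.numParagraphs();
> for(int i = 0; i < numParagraphs; i++) {
>   Paragraph para = range.getParagraph(i);
>   // ...and then through the character runs within each paragraph.
>   int numCharRuns = para.numCharacterRuns();
>   for(int j = 0; j < numCharRuns; j++) {
>      CharacterRun charRun = para.getCharacterRun(j);
>      // Inspect charRun here, e.g. charRun.text(), charRun.getFontName(),
>      // charRun.getFontSize().
>   }
> }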
>
> That would allow you to create new paragraphs in the PDF file when you need
> to - if I remember correctly, PDF files contain marked-up text organised into
> paragraphs with the /par tag - and build each from the contents of the
> character runs.
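The idea of building each output paragraph from its character runs can be sketched without any POI or iText classes. This minimal sketch assumes the text of each run has already been collected into one list per Word paragraph (for example from charRun.text() in the loop above); ParagraphBuilder is a hypothetical name. It also assumes, as far as I know is the case for .doc files, that each paragraph ends with a carriage return (the Word paragraph mark), which is stripped before the text is handed to the PDF library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI or iText. Each inner list holds the
// text of the character runs of one Word paragraph, in document order.
final class ParagraphBuilder {

    /** Joins the runs of each paragraph and strips the trailing paragraph mark. */
    static List<String> build(List<List<String>> paragraphs) {
        List<String> out = new ArrayList<>();
        for (List<String> runs : paragraphs) {
            StringBuilder sb = new StringBuilder();
            for (String runText : runs) {
                sb.append(runText);
            }
            // Assumed: Word ends each paragraph with '\r'; drop trailing line breaks.
            int end = sb.length();
            while (end > 0 && (sb.charAt(end - 1) == '\r' || sb.charAt(end - 1) == '\n')) {
                end--;
            }
            out.add(sb.substring(0, end));
        }
        return out;
    }
}
```

Each resulting string would then become one paragraph in the PDF; per-run formatting is deliberately ignored here to keep the sketch short.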
>
>
> nikhil n-2 wrote:
> >
> > Thanks a lot, sir, for all the information. The characters that may be
> > present in an equation in a research paper are Greek letters like pi,
> > sigma, epsilon, etc. They can be created in a Microsoft Word document, as it
> > provides options to insert such characters. But my question is: how can I
> > retrieve those characters from the doc file using HWPF? Even if I am
> > successful in retrieving them, I should be able to write them to a PDF file
> > using iText. Once again, thank you.
> >
> > On Wed, Apr 8, 2009 at 9:01 PM, MSB <[email protected]> wrote:
> >
> >>
> >> Thanks for the reply; I understand what you are after a little better now.
> >>
> >> As far as I am aware, character formatting information is not exposed by
> >> the Paragraph class but by the CharacterRun -
> >> org.apache.poi.hwpf.usermodel.CharacterRun - class. By no means am I an
> >> expert, but I think that as the Word document is parsed by HWPF, if and
> >> when the formatting applied to a piece of text changes then it - the text -
> >> will be encapsulated within an instance of the CharacterRun class. That
> >> class provides methods that allow you to get at the colour of the text, the
> >> name and size of the font used, and so on. To get at the CharacterRun(s) in
> >> the document you would do something like this:
> >>
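> >> import java.io.File;
> >> import java.io.FileInputStream;
> >> import org.apache.poi.hwpf.HWPFDocument;
> >> import org.apache.poi.hwpf.usermodel.CharacterRun;
> >> import org.apache.poi.hwpf.usermodel.Range;
> >>
> >> HWPFDocument doc = new HWPFDocument(new FileInputStream(new File("C:\\temp\\test.doc")));
> >> Range range = doc.getRange();
> >> // Character runs are counted and fetched through the Range, not the document.
> >> int numCharRuns = range.numCharacterRuns();
> >> for(int i = 0; i < numCharRuns; i++) {
> >>   CharacterRun charRun = range.getCharacterRun(i);
> >>   // Interrogate charRun here (text, font name, font size, colour, ...).
> >> }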
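Since the original question was about font size variations, here is a small stdlib-only sketch of how the sizes read from consecutive runs might be compared. It assumes, as far as I know, that CharacterRun.getFontSize() reports sizes in half-points (so 24 means 12pt); FontSizeChanges and its methods are hypothetical names, and the int array stands in for the values collected in the loop above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: works on font sizes collected from consecutive runs,
// assumed to be in half-points as returned by CharacterRun.getFontSize().
final class FontSizeChanges {

    /** Converts half-points to points (24 -> 12.0). */
    static double toPoints(int halfPoints) {
        return halfPoints / 2.0;
    }

    /** Returns the indices of runs whose size differs from the previous run. */
    static List<Integer> changes(int[] halfPointSizes) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 1; i < halfPointSizes.length; i++) {
            if (halfPointSizes[i] != halfPointSizes[i - 1]) {
                idx.add(i);
            }
        }
        return idx;
    }
}
```

A change flagged this way might mark a heading or a caption. Superscripts and subscripts in equations are usually set through a separate character property, so they may not show up as a size change at all.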
> >>
> >> Then once you have the CharacterRun, you should be able to interrogate
> >> that object for lots of information - have a look at the javadoc to see
> >> all of the available methods. After obtaining the info, you ought to be
> >> able to use iText to create the PDF file for you. My only concern is
> >> whether working through the document in this manner will allow you to
> >> accurately re-create it using iText; I guess that only a test will tell us
> >> this.
> >>
> >> The reason I asked about the nature of the research paper was that I
> >> wanted to get some idea of the sort of characters that are included.
> >> Forgive me, please, as I am 'mathematically challenged' and do not know the
> >> terms to describe the sort of operators found in mathematical expressions,
> >> but I feared that we may be dealing with those - knowing that the research
> >> paper is plain text removes that fear.
> >>
> >> Have a run with this and see how it works for you - I hope it may be able
> >> to return some of the characters you were not seeing before. If not, we
> >> may need to look at other options. Should this fail again, is it possible
> >> for you to let me have a copy - assuming there is no proprietary
> >> information contained within it that should not be seen by anyone outside
> >> of your institution - of the sort of document you are working with? That
> >> way, I can experiment with it myself; for example, I have OpenOffice on my
> >> PC and NetBeans configured so that I can create and run applications that
> >> use Universal Network Objects (OpenOffice's API).
> >>
> >>
> >> nikhil n-2 wrote:
> >> >
> >> > Hi,
> >> >
> >> > I am new to HWPF. I am working on a project where I am supposed to read
> >> > a research paper in IEEE format from a doc file and convert it into a
> >> > PDF file in a customized format.
> >> > To do that I need to know the font size variations in the text. I am
> >> > unable to read characters like pi, sigma, etc. present in equations.
> >> >
> >> > Thank you.
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22953001.html
> >> Sent from the POI - User mailing list archive at Nabble.com.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22957496.html
> Sent from the POI - User mailing list archive at Nabble.com.
