Hello sir,

With the help of the piece of code you gave me, I have been working on a doc
file to get the font size etc.

HWPFDocument doc = new HWPFDocument(new FileInputStream(new
File("C:\\temp\\test.doc")));
Range range = doc.getRange();
int numCharRuns = doc.numCharacterRuns();
CharacterRun charRun = null;
for(int i = 0; i < numCharRuns; i++) {
  charRun = doc.getCharacterRun(i);
}

As I parse the doc file and store each paragraph in an array of strings, I
would like to know the font size of each paragraph, so that I can tell
whether it is a heading or some text under a heading, etc. With the help of
the methods present in the API, I was able to find the different font sizes
present in the doc file, but I am unable to associate them with the text. In
the piece of code given by you, I was unable to find the method
numCharacterRuns(), so please mention the class in which it is present.
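
A minimal sketch of the association being asked about, under stated
assumptions. As far as the POI javadoc goes, numCharacterRuns() and
getCharacterRun(int) are declared on org.apache.poi.hwpf.usermodel.Range (which
Paragraph extends) rather than on HWPFDocument, and CharacterRun.getFontSize()
is believed to report half-points. The class and method names below
(ParagraphFontSizes, Run, dominantHalfPoints, looksLikeHeading) are
hypothetical and not part of POI; the sketch uses only the JDK and expects each
Run to be filled from CharacterRun.text() and CharacterRun.getFontSize() while
looping over range.getParagraph(i).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of POI: picks one font size per paragraph. */
final class ParagraphFontSizes {

    /** One formatting run of a paragraph: its text and font size in half-points. */
    static final class Run {
        final String text;
        final int halfPoints;

        Run(String text, int halfPoints) {
            this.text = text;
            this.halfPoints = halfPoints;
        }
    }

    /**
     * Returns the font size (in half-points) that covers the most characters
     * of the paragraph, or 0 when the paragraph has no runs. On a tie the
     * larger size wins.
     */
    static int dominantHalfPoints(List<Run> runs) {
        Map<Integer, Integer> charsPerSize = new HashMap<>();
        for (Run run : runs) {
            charsPerSize.merge(run.halfPoints, run.text.length(), Integer::sum);
        }
        int bestSize = 0;
        int bestChars = -1;
        for (Map.Entry<Integer, Integer> entry : charsPerSize.entrySet()) {
            int size = entry.getKey();
            int chars = entry.getValue();
            if (chars > bestChars || (chars == bestChars && size > bestSize)) {
                bestSize = size;
                bestChars = chars;
            }
        }
        return bestSize;
    }

    /**
     * Assumption for illustration only: a paragraph is treated as a heading
     * when its dominant size is strictly larger than the body text size.
     */
    static boolean looksLikeHeading(List<Run> runs, int bodyHalfPoints) {
        return dominantHalfPoints(runs) > bodyHalfPoints;
    }
}
```

With a body size of 10pt (20 half-points), a call such as
ParagraphFontSizes.looksLikeHeading(runsOfParagraph, 20) would flag a paragraph
set mostly in 14pt. The threshold is a guess and would need tuning against real
IEEE-format papers.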

Thank you.

On Mon, Apr 13, 2009 at 12:26 PM, MSB <[email protected]> wrote:

>
> I must admit that I always thought images would be the best - maybe the
> only - way to deal with complex mathematical/scientific formulae, and that
> is why I asked what sort of formulae you were dealing with and whether Word
> had been modified to include add-ons such as Rapid-Pi, MS Equation Editor
> or something similar. I harboured doubts about HWPF's ability to handle
> files produced by a modified version of Word - even though it could extract
> the text and any images from an OLE2CDF file for you - as I could not see
> how they would have been modified to accommodate formulae. I am guessing
> the formula add-ons produce images that can be inserted into a Word
> document and which they - the add-ons - can retrieve and edit.
>
> If you are dealing with simple symbols - by this I mean symbols that occupy
> just a single line as does Pi for example - then it seems to be a question
> of Font with iText. For example, I found the following two lines of code;
>
> bfArial = BaseFont.createFont("c:\\windows\\fonts\\times.ttf",
>     BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
> font = new com.lowagie.text.Font(bfArial, 20);
>
> gave me a Font that would happily render the pi r squared formula to a pdf
> document.
>
> Whereas, I could not find the correct encoding to make this line of code do
> the same;
>
> font = FontFactory.getFont(FontFactory.TIMES_ROMAN , "UTF-8",
>     BaseFont.EMBEDDED);
>
> Thankfully, your tutor has offered you the flexibility to examine other
> approaches to solving the problem. All the best.
>
>
> nikhil n-2 wrote:
> >
> > Yeah, we explored iText and found that there aren't any classes to handle
> > such chars. The user can input his research paper in two ways. One is the
> > way that I told you about earlier, i.e. uploading a doc file. The other
> > way is to copy and paste every paragraph of the paper into the text boxes
> > provided by us. If there are any equations, then we ask the user to
> > extract the entire equation from the paper as an image and upload it in
> > the same way he does with any image. So, if I want to write equations
> > onto a pdf file using iText, I have to upload them as images. Thanks for
> > all the information provided to me.
> >
> > On Sat, Apr 11, 2009 at 6:44 PM, MSB <[email protected]> wrote:
> >
> >>
> >> Now it's getting interesting!!!
> >>
> >> HWPF is entirely blameless in all of this, it seems to me. It is
> >> recovering all of the characters from the Word document, as I was able
> >> to demonstrate when I successfully created an rtf document. The problem
> >> comes when trying to create the pdf document from the Java String that
> >> we get through HWPF. It seems that iText cannot use such Strings to
> >> create pdf files and as yet I do not know why. If there is an iText
> >> forum - and I think that there is - it may be worthwhile posting there
> >> to see if anyone has any suggestions. In the meantime, I am going to
> >> have a dig around on the internet to see if there are any suggestions.
> >>
> >> PS OpenOffice will successfully convert Word documents into pdf files -
> >> special characters included. That could offer an alternative approach
> >> for you if the HWPF/iText combination cannot be persuaded to work -
> >> though I cannot think why it would not.
> >>
> >>
> >>
> >> nikhil n-2 wrote:
> >> >
> >> > So, are there any classes which can retrieve these types of chars from
> >> > the doc file? Sorry for the late reply.
> >> >
> >> > On Thu, Apr 9, 2009 at 12:47 AM, MSB <[email protected]> wrote:
> >> >
> >> >>
> >> >> Yes, I know the sort of thing you mean now - when using Word I
> >> >> remember having the option to open a complicated-looking dialog box
> >> >> that allowed me to insert characters like the copyright and trademark
> >> >> symbols. I would have expected that if they could be placed into a
> >> >> Word document then they are encoded somewhere and available to us. My
> >> >> only doubts here surround Word's use of Unicode - if it uses Unicode
> >> >> then everything should be OK.
> >> >>
> >> >> Also, I made another discovery tonight whilst playing with some code.
> >> >> If you remember my previous post, I got the CharacterRun(s) from the
> >> >> document's high-level Range object. This does not have to be the
> >> >> case. You can do something like this;
> >> >>
> >> >>
> >> >> HWPFDocument doc = new HWPFDocument(new FileInputStream(new
> >> >> File("C:\\temp\\test.doc")));
> >> >> Range range = doc.getRange();
> >> >> int numParagraphs = range.numParagraphs();
> >> >> for(int i = 0; i < numParagraphs; i++) {
> >> >>   Paragraph para = range.getParagraph(i);
> >> >>   int numCharRuns = para.numCharacterRuns();
> >> >>   for(int j = 0; j < numCharRuns; j++) {
> >> >>      CharacterRun charRun = para.getCharacterRun(j);
> >> >>      ..........
> >> >>   }
> >> >> }
> >> >>
> >> >> That would allow you to create new paragraphs in the pdf file when
> >> >> you need to - if I remember correctly, pdf files contain marked-up
> >> >> text organised into paragraphs with the /par tag - and build each
> >> >> from the contents of the character runs.
> >> >>
> >> >>
> >> >> nikhil n-2 wrote:
> >> >> >
> >> >> > Thanks a lot, sir, for all the information. Chars that may be
> >> >> > present in an equation in a research paper are Greek letters like
> >> >> > pi, sigma, epsilon etc. They can be created in a Microsoft Word
> >> >> > document, as it provides options to insert such chars. But my doubt
> >> >> > is how I can retrieve those chars from the doc file by using HWPF.
> >> >> > Even if I am successful in retrieving them, I should be able to
> >> >> > write them to a pdf file using iText. Once again, thank you.
> >> >> >
> >> >> > On Wed, Apr 8, 2009 at 9:01 PM, MSB <[email protected]> wrote:
> >> >> >
> >> >> >>
> >> >> >> Thanks for the reply, I understand what you are after a little
> >> >> >> better now.
> >> >> >>
> >> >> >> As far as I am aware, formatting information is not exposed by the
> >> >> >> Paragraph class but by the CharacterRun -
> >> >> >> org.apache.poi.hwpf.usermodel.CharacterRun - class. By no means am
> >> >> >> I an expert, but I think that as the Word document is parsed by
> >> >> >> HWPF, if and when the formatting applied to a piece of text
> >> >> >> changes then it - the text - will be encapsulated within an
> >> >> >> instance of the CharacterRun class. That class provides methods
> >> >> >> that allow you to get at the colour of the text, the name and size
> >> >> >> of the font used, and so on. To get at the CharacterRun(s) in the
> >> >> >> document you would do something like this;
> >> >> >>
> >> >> >> HWPFDocument doc = new HWPFDocument(new FileInputStream(new
> >> >> >> File("C:\\temp\\test.doc")));
> >> >> >> Range range = doc.getRange();
> >> >> >> int numCharRuns = doc.numCharacterRuns();
> >> >> >> CharacterRun charRun = null;
> >> >> >> for(int i = 0; i < numCharRuns; i++) {
> >> >> >>   charRun = doc.getCharacterRun(i);
> >> >> >> }
> >> >> >>
> >> >> >> Then once you have the CharacterRun, you should be able to
> >> >> >> interrogate that object for lots of information - have a look at
> >> >> >> the javadoc to see all of the available methods. After obtaining
> >> >> >> the info, you ought to be able to use iText to create the pdf file
> >> >> >> for you. My only concern is whether working through the document
> >> >> >> in this manner will allow you to accurately re-create it using
> >> >> >> iText; I guess that only a test will tell us this.
> >> >> >>
> >> >> >> The reason I asked about the nature of the research paper was that
> >> >> >> I wanted to get some idea of the sort of characters that are
> >> >> >> included. Forgive me please as I am 'mathematically challenged' and
> >> >> >> do not know the terms to describe the sort of operators found in
> >> >> >> mathematical expressions, but I feared that we may be dealing with
> >> >> >> those - knowing that the research paper is plain text removes that
> >> >> >> fear.
> >> >> >>
> >> >> >> Have a run with this and see how it works for you - I hope it may
> >> >> >> be able to return some of the characters you were not seeing
> >> >> >> before. If not, we may need to look at other options. Should this
> >> >> >> fail again, is it possible for you to let me have a copy - assuming
> >> >> >> there is no proprietary information contained within it that should
> >> >> >> not be seen by anyone outside of your institution - of the sort of
> >> >> >> document you are working with? That way, I can experiment with it
> >> >> >> myself; for example, I have OpenOffice on my PC and NetBeans
> >> >> >> configured so that I can create and run applications that use
> >> >> >> Universal Network Objects (OpenOffice's API).
> >> >> >>
> >> >> >>
> >> >> >> nikhil n-2 wrote:
> >> >> >> >
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > I am new to HWPF. I am working on a project where I am supposed
> >> >> >> > to read a research paper in IEEE format from a doc file and
> >> >> >> > convert it into a pdf file in a customized format. To do that, I
> >> >> >> > need to know the font size variations in the text. I am unable
> >> >> >> > to read chars like pi, sigma etc. present in equations.
> >> >> >> >
> >> >> >> > Thank you.
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >> >> --
> >> >> >> View this message in context:
> >> >> >> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22953001.html
> >> >> >> Sent from the POI - User mailing list archive at Nabble.com.
> >> >> >>
> >> >> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: [email protected]
> >> >> >> For additional commands, e-mail: [email protected]
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22957496.html
> >> >> Sent from the POI - User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p23001069.html
> >> Sent from the POI - User mailing list archive at Nabble.com.
> >>
> >>
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p23018668.html
> Sent from the POI - User mailing list archive at Nabble.com.
>
>
>
>
