Sorry sir, I overlooked the other mail you sent me, in which the code

HWPFDocument doc = new HWPFDocument(new FileInputStream(new
File("C:\\temp\\test.doc")));
Range range = doc.getRange();
int numParagraphs = range.numParagraphs();
for(int i = 0; i < numParagraphs; i++) {
  Paragraph para = range.getParagraph(i);
  int numCharRuns = para.numCharacterRuns();
  for(int j = 0; j < numCharRuns; j++) {
     CharacterRun charRun = para.getCharacterRun(j);
     ..........
  }
}

is given. Using it, I was able to find the font size of each paragraph. Half
of my job is done.
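
For anyone following the thread, here is a minimal sketch of how the
per-run font sizes could be reduced to one size per paragraph, so that a
paragraph can be labelled as a heading or as body text. It deliberately
avoids POI so that it runs on its own: RunInfo is a hypothetical stand-in
for the two values read from each CharacterRun (its text and its font
size), and ParagraphFontSizes, dominantHalfPoints and looksLikeHeading are
names made up for illustration. It assumes the size is reported in
half-points, which is my understanding of CharacterRun.getFontSize(), and
it assumes the body size of the document is already known. In real code
the runs would come from para.numCharacterRuns() and
para.getCharacterRun(j); both methods are declared on Range, which
Paragraph extends.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the text and font size of one CharacterRun. */
final class RunInfo {
    final String text;
    final int halfPoints; // assumed unit: half-points (24 means 12pt)

    RunInfo(String text, int halfPoints) {
        this.text = text;
        this.halfPoints = halfPoints;
    }
}

final class ParagraphFontSizes {

    /**
     * Returns the font size (in half-points) that covers the most
     * characters in the paragraph, or -1 if the paragraph has no runs.
     */
    static int dominantHalfPoints(List<RunInfo> runs) {
        // Count how many characters are set in each font size.
        Map<Integer, Integer> charsPerSize = new LinkedHashMap<>();
        for (RunInfo run : runs) {
            charsPerSize.merge(run.halfPoints, run.text.length(), Integer::sum);
        }
        int best = -1;
        int bestChars = -1;
        for (Map.Entry<Integer, Integer> entry : charsPerSize.entrySet()) {
            // Strict comparison: on a tie the size seen first wins.
            if (entry.getValue() > bestChars) {
                best = entry.getKey();
                bestChars = entry.getValue();
            }
        }
        return best;
    }

    /** Treats a paragraph as a heading when its dominant size exceeds the body size. */
    static boolean looksLikeHeading(List<RunInfo> runs, int bodyHalfPoints) {
        return dominantHalfPoints(runs) > bodyHalfPoints;
    }
}
```

Weighting by character count rather than taking the first run's size is a
design choice: a body paragraph that contains one large symbol is then
still treated as body text.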

thank you.

On Tue, Apr 14, 2009 at 11:24 PM, nikhil n <[email protected]> wrote:

> Hello sir,
>
> With the help of the piece of code you gave me, I have been working on a
> doc file to get the font size etc.
>
> HWPFDocument doc = new HWPFDocument(new FileInputStream(new
> File("C:\\temp\\test.doc")));
> Range range = doc.getRange();
> int numCharRuns = doc.numCharacterRuns();
> CharacterRun charRun = null;
> for(int i = 0; i < numCharRuns; i++) {
>   charRun = doc.getCharacterRun(i);
> }
>
> As I parse through the doc file and store each paragraph in an array of
> strings, I would like to know the font size of each paragraph, so that I
> can tell whether it is a heading or some text under a heading, etc. With
> the help of the methods present in the API, I was able to find the
> different font sizes present in the doc file, but I am unable to associate
> them with the text. In the piece of code you gave me, I was unable to find
> the method numCharacterRuns(), so please mention the class in which it is
> present.
>
> thank you.
>
>
> On Mon, Apr 13, 2009 at 12:26 PM, MSB <[email protected]> wrote:
>
>>
>> I must admit that I always thought images would be the best - maybe the
>> only - way to deal with complex mathematical/scientific formulae, and that
>> is why I asked what sort of formulae you were dealing with and whether
>> Word had been modified to include add-ons such as Rapid-Pi, MS Equation
>> Editor or something similar. I harboured doubts about HWPF's ability to
>> handle files produced by a modified version of Word - even though it could
>> extract the text and any images from an OLE2CDF file for you - as I could
>> not see how they would have been modified to accommodate formulae. I am
>> guessing the formula add-ons produce images that can be inserted into a
>> Word document and which they - the add-ons - can retrieve and edit.
>>
>> If you are dealing with simple symbols - by this I mean symbols that
>> occupy just a single line, as Pi does for example - then it seems to be a
>> question of Font with iText. For example, I found the following two lines
>> of code;
>>
>> bfArial = BaseFont.createFont("c:\\windows\\fonts\\times.ttf",
>>     BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
>> font = new com.lowagie.text.Font(bfArial, 20);
>>
>> gave me a Font that would happily render the pi r squared formula to a
>> pdf document.
>>
>> Whereas, I could not find the correct encoding to make this line of code
>> do the same;
>>
>> font = FontFactory.getFont(FontFactory.TIMES_ROMAN , "UTF-8",
>>     BaseFont.EMBEDDED);
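
An aside on the encoding question above, offered as a guess rather than a
confirmed diagnosis: BaseFont.IDENTITY_H is a two-byte Unicode encoding,
whereas the built-in Times-Roman font is, as far as I know, normally used
with a single-byte encoding, and pi (U+03C0) has no slot in a Latin
single-byte code page. The sketch below uses only the JDK (no iText) to
show that a string containing pi cannot be encoded in ISO-8859-1.
EncodingCheck and canEncode are hypothetical names chosen for
illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class EncodingCheck {

    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean canEncode(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String formula = "\u03C0r\u00B2"; // pi, r, superscript two
        System.out.println("ISO-8859-1: " + canEncode(formula, "ISO-8859-1"));
        System.out.println("UTF-8:      " + canEncode(formula, "UTF-8"));
    }
}
```

If the same check fails for the encoding passed to a font, that would be
one plausible reason for the character going missing in the pdf, though
the font must also contain the glyph.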
>>
>> Thankfully, your tutor has offered you the flexibility to examine other
>> approaches to solving the problem. All the best.
>>
>>
>> nikhil n-2 wrote:
>> >
>> > Yeah, we explored iText and found that there aren't any classes to
>> > handle such chars. The user can input his research paper in two ways.
>> > One is the way that I told you about earlier, i.e. uploading a doc file.
>> > The other way is to copy and paste every paragraph of the paper into
>> > the text boxes provided by us. If there are any equations, then we ask
>> > the user to extract the entire equation from the paper as an image and
>> > upload it in the same way he does with any image. So, if at all I want
>> > to write equations onto a pdf file using iText, I have to upload them
>> > as images. Thanks for all the information provided to me.
>> >
>> > On Sat, Apr 11, 2009 at 6:44 PM, MSB <[email protected]> wrote:
>> >
>> >>
>> >> Now it's getting interesting!!!
>> >>
>> >> HWPF is entirely blameless in all of this, it seems to me. It is
>> >> recovering all of the characters from the Word document, as I was able
>> >> to demonstrate when I successfully created an rtf document. The problem
>> >> comes when trying to create the pdf document from the Java String that
>> >> we get through HWPF. It seems that iText cannot use such a String to
>> >> create pdf files and as yet I do not know why. If there is an iText
>> >> forum - and I think that there is - it may be worthwhile posting there
>> >> to see if anyone has any suggestions. In the meantime, I am going to
>> >> have a dig around on the internet to see if there are any suggestions.
>> >>
>> >> PS OpenOffice will successfully convert Word documents into pdf files -
>> >> special characters included. That could offer an alternative approach
>> >> for you if the HWPF/iText combination cannot be persuaded to work -
>> >> though I cannot think why it would not.
>> >>
>> >>
>> >>
>> >> nikhil n-2 wrote:
>> >> >
>> >> > So, are there any classes which can retrieve these types of chars
>> >> > from the doc file? Sorry for the late reply.
>> >> >
>> >> > On Thu, Apr 9, 2009 at 12:47 AM, MSB <[email protected]> wrote:
>> >> >
>> >> >>
>> >> >> Yes, I know the sort of thing you mean now - when using Word I
>> >> >> remember having the option to open a complicated looking dialog box
>> >> >> that allowed me to insert characters like the copyright and
>> >> >> trademark symbols. I would have expected that if they could be
>> >> >> placed into a Word document then they are encoded somewhere and
>> >> >> available to us. My only doubts here surround Word's use of Unicode -
>> >> >> if it uses Unicode then everything should be OK.
>> >> >>
>> >> >> Also, I made another discovery tonight whilst playing with some
>> >> >> code. If you remember my previous post, I got the CharacterRun(s)
>> >> >> from the document's high level Range object. This does not have to
>> >> >> be the case. You can do something like this;
>> >> >>
>> >> >>
>> >> >> HWPFDocument doc = new HWPFDocument(new FileInputStream(new
>> >> >> File("C:\\temp\\test.doc")));
>> >> >> Range range = doc.getRange();
>> >> >> int numParagraphs = range.numParagraphs();
>> >> >> for(int i = 0; i < numParagraphs; i++) {
>> >> >>   Paragraph para = range.getParagraph(i);
>> >> >>   int numCharRuns = para.numCharacterRuns();
>> >> >>   for(int j = 0; j < numCharRuns; j++) {
>> >> >>      CharacterRun charRun = para.getCharacterRun(j);
>> >> >>      ..........
>> >> >>   }
>> >> >> }
>> >> >>
>> >> >> That would allow you to create new paragraphs in the pdf file when
>> >> >> you need to - if I remember correctly, pdf files contain marked-up
>> >> >> text organised into paragraphs with the /par tag - and build each
>> >> >> from the contents of the character runs.
>> >> >>
>> >> >>
>> >> >> nikhil n-2 wrote:
>> >> >> >
>> >> >> > Thanks a lot sir for all the information. Chars that may be
>> >> >> > present in an equation in a research paper are Greek letters like
>> >> >> > pi, sigma, epsilon etc. They can be created in a Microsoft Word
>> >> >> > document, as it provides options to insert such chars. But my
>> >> >> > doubt is how I can retrieve those chars from the doc file using
>> >> >> > HWPF. Even if I am successful in retrieving them, I should be able
>> >> >> > to write them to a pdf file using iText. Once again, thank you.
>> >> >> >
>> >> >> > On Wed, Apr 8, 2009 at 9:01 PM, MSB <[email protected]> wrote:
>> >> >> >
>> >> >> >>
>> >> >> >> Thanks for the reply, I understand what you are after a little
>> >> >> >> better now.
>> >> >> >>
>> >> >> >> As far as I am aware, formatting information is not exposed by
>> >> >> >> the Paragraph class but by the CharacterRun -
>> >> >> >> org.apache.poi.hwpf.usermodel.CharacterRun - class. By no means
>> >> >> >> am I an expert but I think that as the Word document is parsed by
>> >> >> >> HWPF, if and when the formatting applied to a piece of text
>> >> >> >> changes then it - the text - will be encapsulated within an
>> >> >> >> instance of the CharacterRun class. That class provides methods
>> >> >> >> that allow you to get at the colour of the text, the name and
>> >> >> >> size of the font used, and so on. To get at the CharacterRun(s)
>> >> >> >> in the document you would do something like this;
>> >> >> >>
>> >> >> >> HWPFDocument doc = new HWPFDocument(new FileInputStream(new
>> >> >> >> File("C:\\temp\\test.doc")));
>> >> >> >> Range range = doc.getRange();
>> >> >> >> int numCharRuns = doc.numCharacterRuns();
>> >> >> >> CharacterRun charRun = null;
>> >> >> >> for(int i = 0; i < numCharRuns; i++) {
>> >> >> >>   charRun = doc.getCharacterRun(i);
>> >> >> >> }
>> >> >> >>
>> >> >> >> Then once you have the CharacterRun, you should be able to
>> >> >> >> interrogate that object for lots of information - have a look at
>> >> >> >> the javadoc to see all of the available methods. After obtaining
>> >> >> >> the info, you ought to be able to use iText to create the pdf
>> >> >> >> file for you. My only concern is whether working through the
>> >> >> >> document in this manner will allow you to accurately re-create it
>> >> >> >> using iText; I guess that only a test will tell us this.
>> >> >> >>
>> >> >> >> The reason I asked about the nature of the research paper was
>> >> >> >> that I wanted to get some idea of the sort of characters that are
>> >> >> >> included. Forgive me please as I am 'mathematically challenged'
>> >> >> >> and do not know the terms to describe the sort of operators found
>> >> >> >> in mathematical expressions, but I feared that we may be dealing
>> >> >> >> with those - knowing that the research paper is plain text
>> >> >> >> removes that fear.
>> >> >> >>
>> >> >> >> Have a run with this and see how it works for you - I hope it may
>> >> >> >> be able to return some of the characters you were not seeing
>> >> >> >> before. If not, we may need to look at other options. Should this
>> >> >> >> fail again, is it possible for you to let me have a copy -
>> >> >> >> assuming there is no proprietary information contained within it
>> >> >> >> that should not be seen by anyone outside of your institution -
>> >> >> >> of the sort of document you are working with? That way, I can
>> >> >> >> experiment with it myself; for example, I have OpenOffice on my
>> >> >> >> PC and NetBeans configured so that I can create and run
>> >> >> >> applications that use Universal Network Objects (OpenOffice's
>> >> >> >> API).
>> >> >> >>
>> >> >> >>
>> >> >> >> nikhil n-2 wrote:
>> >> >> >> >
>> >> >> >> > Hi,
>> >> >> >> >
>> >> >> >> > I am new to HWPF. I am working on a project where I am supposed
>> >> >> >> > to read a research paper in IEEE format from a doc file and
>> >> >> >> > convert it into a pdf file in a customized format.
>> >> >> >> > To do that I need to know the font size variations in the text.
>> >> >> >> > I am unable to read chars like pi, sigma etc. present in
>> >> >> >> > equations.
>> >> >> >> >
>> >> >> >> > Thank you.
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >> --
>> >> >> >> View this message in context:
>> >> >> >> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22953001.html
>> >> >> >> Sent from the POI - User mailing list archive at Nabble.com.
>> >> >> >>
>> >> >> >> ---------------------------------------------------------------------
>> >> >> >> To unsubscribe, e-mail: [email protected]
>> >> >> >> For additional commands, e-mail: [email protected]
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>
