Yeah, we explored iText and found that there aren't any classes to handle
such chars. The user can input his research paper in two ways. One is the way
that I told you about earlier, i.e. uploading a doc file. The other way is to
copy and paste every paragraph of the paper into the text boxes we provide. If
there are any equations, we ask the user to extract each entire equation from
the paper as an image and upload it in the same way he does with any other
image. So, if I want to write equations onto a pdf file using iText, I
have to upload them as images. Thanks for all the information provided to me.
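
For the copy-and-paste route, a rough sketch of how the pasted text could be
checked for the problem characters before deciding whether a paragraph needs
the image route. This uses only the standard Java library and does not touch
HWPF or iText. The class name SymbolScan and the method findSpecialSymbols are
made up for illustration, and the choice of the Greek and Mathematical
Operators Unicode blocks is an assumption about which characters cause
trouble, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of POI/HWPF or iText.
class SymbolScan {

    /**
     * Returns the code points in the text that belong to the Greek or
     * Mathematical Operators Unicode blocks (assumed to be the characters
     * that need special handling when the pdf is written).
     */
    public static List<Integer> findSpecialSymbols(String text) {
        List<Integer> found = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            if (block == Character.UnicodeBlock.GREEK
                    || block == Character.UnicodeBlock.MATHEMATICAL_OPERATORS) {
                found.add(cp);
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // Sample paragraph containing pi (U+03C0) and n-ary summation (U+2211)
        String paragraph = "Area = \u03C0 r^2, total = \u2211 x";
        for (int cp : findSpecialSymbols(paragraph)) {
            // Print each flagged code point with its Unicode name
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

A paragraph for which the returned list is empty could be written as ordinary
text, while the others could be flagged for the user to upload as images.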

On Sat, Apr 11, 2009 at 6:44 PM, MSB <[email protected]> wrote:

>
> Now it's getting interesting!!!
>
> HWPF is entirely blameless in all of this, it seems to me. It is recovering
> all of the characters from the Word document, as I was able to demonstrate
> when I successfully created an rtf document. The problem comes when trying
> to create the pdf document from the Java String that we get through HWPF.
> It seems that iText cannot use such a String to create pdf files and as yet
> I do not know why. If there is an iText forum - and I think that there is -
> it may be worthwhile posting there to see if anyone has any suggestions. In
> the meantime, I am going to have a dig around on the internet to see if
> there are any suggestions.
>
> PS OpenOffice will successfully convert Word documents into pdf files -
> special characters included. That could offer an alternative approach for
> you if the HWPF/iText combination cannot be persuaded to work - though I
> cannot think why it would not.
>
>
>
> nikhil n-2 wrote:
> >
> > So, are there any classes which can retrieve these types of chars from
> > the doc file? Sorry for the late reply.
> >
> > On Thu, Apr 9, 2009 at 12:47 AM, MSB <[email protected]> wrote:
> >
> >>
> >> Yes, I know the sort of thing you mean now - when using Word I remember
> >> having the option to open a complicated looking dialog box that allowed
> >> me to insert characters like the copyright and trademark symbols. I would
> >> have expected that if they could be placed into a Word document then they
> >> are encoded somewhere and available to us. My only doubts here surround
> >> Word's use of Unicode - if it uses Unicode then everything should be OK.
> >>
> >> Also, I made another discovery tonight whilst playing with some code. If
> >> you remember my previous post, I got the CharacterRun(s) from the
> >> document's high level Range object. This does not have to be the case.
> >> You can do something like this:
> >>
> >>
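> >> import java.io.File;
> >> import java.io.FileInputStream;
> >> import org.apache.poi.hwpf.HWPFDocument;
> >> import org.apache.poi.hwpf.usermodel.CharacterRun;
> >> import org.apache.poi.hwpf.usermodel.Paragraph;
> >> import org.apache.poi.hwpf.usermodel.Range;
> >>
> >> HWPFDocument doc = new HWPFDocument(new FileInputStream(new
> >> File("C:\\temp\\test.doc")));
> >> Range range = doc.getRange();
> >> int numParagraphs = range.numParagraphs();
> >> // Walk each paragraph, then each character run inside it
> >> for(int i = 0; i < numParagraphs; i++) {
> >>   Paragraph para = range.getParagraph(i);
> >>   int numCharRuns = para.numCharacterRuns();
> >>   for(int j = 0; j < numCharRuns; j++) {
> >>      CharacterRun charRun = para.getCharacterRun(j);
> >>      ..........
> >>   }
> >> }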
> >>
> >> That would allow you to create new paragraphs in the pdf file when you
> >> need to - if I remember correctly, pdf files contain marked-up text
> >> organised into paragraphs with the /par tag - and build each from the
> >> contents of the character runs.
> >>
> >>
> >> nikhil n-2 wrote:
> >> >
> >> > Thanks a lot, sir, for all the information. Chars that may be present in
> >> > an equation in a research paper are Greek letters like pi, sigma, epsilon
> >> > etc. They can be created in a Microsoft Word document as it provides
> >> > options to insert such chars. But my doubt is how I can retrieve those
> >> > chars from the doc file by using HWPF. Even if I am successful in
> >> > retrieving them, I should be able to write them in a pdf file using
> >> > iText. Once again, thank you.
> >> >
> >> > On Wed, Apr 8, 2009 at 9:01 PM, MSB <[email protected]> wrote:
> >> >
> >> >>
> >> >> Thanks for the reply, I understand what you are after a little better
> >> >> now.
> >> >>
> >> >> As far as I am aware, formatting information is not exposed by the
> >> >> Paragraph class but by the CharacterRun -
> >> >> org.apache.poi.hwpf.usermodel.CharacterRun - class. By no means am I an
> >> >> expert but I think that as the Word document is parsed by HWPF, if and
> >> >> when the formatting applied to a piece of text changes then it - the
> >> >> text - will be encapsulated within an instance of the CharacterRun
> >> >> class. That class provides methods that allow you to get at the colour
> >> >> of the text, the name and size of the font used, and so on. To get at
> >> >> the CharacterRun(s) in the document you would do something like this:
> >> >>
> >> >> HWPFDocument doc = new HWPFDocument(new FileInputStream(new
> >> >> File("C:\\temp\\test.doc")));
> >> >> Range range = doc.getRange();
> >> >> // The character runs are counted and fetched through the Range
> >> >> int numCharRuns = range.numCharacterRuns();
> >> >> CharacterRun charRun = null;
> >> >> for(int i = 0; i < numCharRuns; i++) {
> >> >>   charRun = range.getCharacterRun(i);
> >> >> }
> >> >>
> >> >> Then once you have the CharacterRun, you should be able to interrogate
> >> >> that object for lots of information - have a look at the javadoc to see
> >> >> all of the available methods. After obtaining the info, you ought to be
> >> >> able to use iText to create the pdf file for you. My only concern is
> >> >> whether working through the document in this manner will allow you to
> >> >> accurately re-create it using iText; I guess that only a test will tell
> >> >> us this.
> >> >>
> >> >> The reason I asked about the nature of the research paper was that I
> >> >> wanted to get some idea of the sort of characters that are included.
> >> >> Forgive me please as I am 'mathematically challenged' and do not know
> >> >> the terms to describe the sort of operators found in mathematical
> >> >> expressions, but I feared that we may be dealing with those - knowing
> >> >> that the research paper is plain text removes that fear.
> >> >>
> >> >> Have a run with this and see how it works for you - I hope it may be
> >> >> able to return some of the characters you were not seeing before. If
> >> >> not, we may need to look at other options. Should this fail again, is
> >> >> it possible for you to let me have a copy - assuming there is no
> >> >> proprietary information contained within it that should not be seen by
> >> >> anyone outside of your institution - of the sort of document you are
> >> >> working with? That way, I can experiment with it myself; for example, I
> >> >> have OpenOffice on my PC and NetBeans configured so that I can create
> >> >> and run applications that use Universal Network Objects (OpenOffice's
> >> >> API).
> >> >>
> >> >>
> >> >> nikhil n-2 wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I am new to HWPF. I am working on a project where I am supposed to
> >> >> > read a research paper in IEEE format from a doc file and convert it
> >> >> > into a pdf file in a customized format.
> >> >> > To do that I need to know the font size variations in the text. I am
> >> >> > unable to read chars like pi, sigma etc. present in equations.
> >> >> >
> >> >> > Thank you.
> >> >> >
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22953001.html
> >> >> Sent from the POI - User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: [email protected]
> >> >> For additional commands, e-mail: [email protected]
> >> >>
> >> >>
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22957496.html
> >> Sent from the POI - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p23001069.html
> Sent from the POI - User mailing list archive at Nabble.com.
>
>