Yes, I know the sort of thing you mean now - when using Word I remember
having the option to open a complicated-looking dialog box that allowed me
to insert characters like the copyright and trademark symbols. I would have
expected that if they can be placed into a Word document then they are
encoded somewhere and available to us. My only doubts here surround Word's
use of Unicode - if it uses Unicode then everything should be OK.
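
To make the Unicode point a little more concrete, here is a minimal sketch
that uses nothing but the standard Java library. It assumes that the text
HWPF hands back (for example from CharacterRun.text()) is an ordinary Java
String, and it simply reports every non-ASCII code point it finds. The class
and method names (SymbolScan, describeNonAscii) are my own invention for
illustration. I believe, but have not verified against your documents, that
symbols inserted using Word's Symbol font can turn up in the Unicode private
use area rather than as real Greek letters, so the sketch flags those
separately.

```java
import java.util.ArrayList;
import java.util.List;

class SymbolScan {

    /** Returns one entry, such as "U+03C0 GREEK", per non-ASCII code point. */
    static List<String> describeNonAscii(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                String kind;
                if (Character.getType(cp) == Character.PRIVATE_USE) {
                    // Assumption: private-use values may come from a symbol
                    // font and would need mapping before being written out.
                    kind = "PRIVATE_USE";
                } else {
                    kind = String.valueOf(Character.UnicodeBlock.of(cp));
                }
                found.add(String.format("U+%04X %s", cp, kind));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // Stand-in for text read from a document: pi, superscript two, sigma.
        String sample = "Area = \u03C0r\u00B2, total = \u03A3x";
        for (String entry : describeNonAscii(sample)) {
            System.out.println(entry);
        }
    }
}
```

If the Greek letters show up in the GREEK block then they are plain Unicode
and should survive the trip to the pdf file, provided the font chosen in
iText contains those glyphs.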

Also, I made another discovery tonight whilst playing with some code. If you
remember my previous post, I got the CharacterRun(s) from the document's
top-level Range object. This does not have to be the case. You can do
something like this:


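import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

HWPFDocument doc = new HWPFDocument(new FileInputStream(new
File("C:\\temp\\test.doc")));
Range range = doc.getRange();
int numParagraphs = range.numParagraphs();
// Walk the document a paragraph at a time, then run by run within each.
for(int i = 0; i < numParagraphs; i++) {
   Paragraph para = range.getParagraph(i);
   int numCharRuns = para.numCharacterRuns();
   for(int j = 0; j < numCharRuns; j++) {
      CharacterRun charRun = para.getCharacterRun(j);
      ..........
   }
}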

That would allow you to create new paragraphs in the pdf file when you need
to, and build each one from the contents of its character runs. If I
remember correctly, iText has a Paragraph class of its own that text can be
added to, so one iText paragraph per Word paragraph seems a natural fit.
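
As a rough sketch of the "build each paragraph from its character runs"
step, the code below uses a small hypothetical Run class as a stand-in for
the values you would read from each CharacterRun (its text, font name and
font size), so that it runs without POI or iText on the classpath. It merges
neighbouring runs that share the same font and size, on the assumption that
HWPF may split text into more runs than there are visible formatting
changes. RunGrouping, Run and mergeRuns are names I have made up, and the
size unit used in the sample values is likewise only an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class RunGrouping {

    /** Hypothetical stand-in for the values read from one CharacterRun. */
    static final class Run {
        final String text;
        final String font;
        final int size;

        Run(String text, String font, int size) {
            this.text = text;
            this.font = font;
            this.size = size;
        }
    }

    /** Merges adjacent runs of one paragraph that share font and size. */
    static List<Run> mergeRuns(List<Run> runs) {
        List<Run> merged = new ArrayList<>();
        for (Run run : runs) {
            int last = merged.size() - 1;
            if (last >= 0
                    && merged.get(last).font.equals(run.font)
                    && merged.get(last).size == run.size) {
                Run prev = merged.get(last);
                merged.set(last, new Run(prev.text + run.text, prev.font, prev.size));
            } else {
                merged.add(run);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Run> paragraph = new ArrayList<>();
        paragraph.add(new Run("Area = ", "Times New Roman", 20));
        paragraph.add(new Run("r squared times ", "Times New Roman", 20));
        paragraph.add(new Run("\u03C0", "Symbol", 20));
        for (Run run : mergeRuns(paragraph)) {
            System.out.println(run.font + " / " + run.size + " : " + run.text);
        }
    }
}
```

Each merged run could then become one piece of styled text added to the
iText paragraph, with a new iText paragraph started for every Word
paragraph.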


nikhil n-2 wrote:
> 
> Thanks a lot sir for all the information. Chars that may be present in an
> equation in a research paper are Greek letters like pi, sigma, epsilon
> etc. They can be created in a Microsoft Word document as it provides
> options to insert such chars. But my doubt is how can I retrieve those
> chars from the doc file by using HWPF. Even if I am successful in
> retrieving them, I should be able to write them in a pdf file using iText.
> Once again, thank you.
> 
> On Wed, Apr 8, 2009 at 9:01 PM, MSB <[email protected]> wrote:
> 
>>
>> Thanks for the reply, I understand what you are after a little better
>> now.
>>
>> As far as I am aware, formatting information is not exposed by the
>> Paragraph class but by the CharacterRun -
>> org.apache.poi.hwpf.usermodel.CharacterRun - class. By no means am I an
>> expert but I think that as the Word document is parsed by HWPF, if and
>> when the formatting applied to a piece of text changes then it - the
>> text - will be encapsulated within an instance of the CharacterRun
>> class. That class provides methods that allow you to get at the colour
>> of the text, the name and size of the font used, and so on. To get at
>> the CharacterRun(s) in the document you would do something like this:
>>
>> HWPFDocument doc = new HWPFDocument(new FileInputStream(new
>> File("C:\\temp\\test.doc")));
>> Range range = doc.getRange();
>> // The run accessors live on Range, not on HWPFDocument.
>> int numCharRuns = range.numCharacterRuns();
>> CharacterRun charRun = null;
>> for(int i = 0; i < numCharRuns; i++) {
>>   charRun = range.getCharacterRun(i);
>> }
>>
>> Then once you have the CharacterRun, you should be able to interrogate
>> that object for lots of information - have a look at the javadoc to see
>> all of the available methods. After obtaining the info, you ought to be
>> able to use iText to create the pdf file for you. My only concern is
>> whether working through the document in this manner will allow you to
>> accurately re-create it using iText; I guess that only a test will tell
>> us this.
>>
>> The reason I asked about the nature of the research paper was that I
>> wanted to get some idea of the sort of characters that are included.
>> Forgive me please as I am 'mathematically challenged' and do not know
>> the terms to describe the sort of operators found in mathematical
>> expressions, but I feared that we may be dealing with those - knowing
>> that the research paper is plain text removes that fear.
>>
>> Have a run with this and see how it works for you - I hope it may be able
>> to return some of the characters you were not seeing before. If not, we
>> may need to look at other options. Should this fail again, is it possible
>> for you to let me have a copy of the sort of document you are working
>> with, assuming there is no proprietary information contained within it
>> that should not be seen by anyone outside of your institution? That way,
>> I can experiment with it myself; for example, I have OpenOffice on my PC
>> and NetBeans configured so that I can create and run applications that
>> use Universal Network Objects (OpenOffice's API).
>>
>>
>> nikhil n-2 wrote:
>> >
>> > Hi,
>> >
>> > I am new to HWPF. I am working on a project where I am supposed to read
>> > a research paper in IEEE format from a doc file and convert it into a
>> > pdf file in a customized format.
>> > To do that I need to know the font size variations in the text. I am
>> > unable to read chars like pi, sigma etc. present in equations.
>> >
>> > thank u.
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22953001.html
>> Sent from the POI - User mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/font-styles-and-equations-in-word-doc-tp22927872p22957496.html
Sent from the POI - User mailing list archive at Nabble.com.


