Hi,

I'm trying to convert PDF to XML, and I'm using the PDFText2HTML class in
tools as inspiration. I noticed that
PDFText2HTML extends PDFTextStripper, which extends LegacyPDFStreamEngine.
The comment at the top of LegacyPDFStreamEngine says something
peculiar:
  * This class exists only so that we don't break the code of users who have their own subclasses
  * of PDFTextStripper. It replaces the good implementation of showGlyph in PDFStreamEngine, with
  * a bad implementation which is backwards compatible.

I looked at the comment on the "good implementation" of showGlyph in
PDFStreamEngine, and it says:
  * Called when a glyph is to be processed.This method is intended for overriding in subclasses,
  * the default implementation does nothing.

PDFStreamEngine.showGlyph is supposed to be the "good implementation", but
it does nothing and expects a subclass to override it. How does this make sense?
If LegacyPDFStreamEngine's showGlyph implementation is incorrect but
PDFTextStripper relies on that incorrect behavior, how does
PDFTextStripper compensate for it?
Does it even matter in PDFTextStripper's case, since this class only
extracts text and doesn't render anything?
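
To make the question concrete, this is roughly how I picture the structure.
It is only a simplified, self-contained sketch: the class names BaseEngine,
LegacyEngine and TextCollector are hypothetical stand-ins that I made up, the
glyph is reduced to a single char, and none of this is the actual PDFBox code.

```java
// Hypothetical model of the hierarchy; not PDFBox source.
class BaseEngine {
    // Walks the input and calls the hook once per glyph.
    void processText(String text) {
        for (int i = 0; i < text.length(); i++) {
            showGlyph(text.charAt(i), i);
        }
    }

    // Hook method: deliberately a no-op, meant to be overridden.
    protected void showGlyph(char glyph, int index) {
    }
}

// Stand-in for the legacy layer: overrides the hook and forwards
// each glyph to the callback that existing subclasses rely on.
class LegacyEngine extends BaseEngine {
    @Override
    protected void showGlyph(char glyph, int index) {
        processTextPosition(glyph, index);
    }

    // Assumed callback name for this sketch; also a no-op by default.
    protected void processTextPosition(char glyph, int index) {
    }
}

// Stand-in for a text extractor: only collects characters, renders nothing.
class TextCollector extends LegacyEngine {
    private final StringBuilder out = new StringBuilder();

    @Override
    protected void processTextPosition(char glyph, int index) {
        out.append(glyph);
    }

    String getText() {
        return out.toString();
    }
}

class GlyphHookSketch {
    public static void main(String[] args) {
        TextCollector collector = new TextCollector();
        collector.processText("Hi");
        System.out.println(collector.getText());
    }
}
```

In this picture the base class produces no output by itself, and all of the
visible behavior comes from whatever the subclasses put into the hook.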

Any clarification would be helpful.
Thank you.

-Justinus
