Hi everybody

It seems to me that the getLineSeparator method of PDF2XHTML
(package org.apache.tika.parser.pdf) could be improved.

I changed it
from:
    public String getLineSeparator()
    {
        try
        {
            handler.characters("\n");
        } catch(SAXException e) {

        }
        return super.getLineSeparator();
    }


to:
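    public String getLineSeparator()
    {
        try
        {
            // Emit an empty <br> element instead of a "\n" text node.
            handler.element("br", "");
        } catch(SAXException e) {
            // Ignored here, exactly as in the original method.
        }
        return super.getLineSeparator();
    }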

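The resulting HTML looks nicer: a browser collapses a bare newline
into ordinary whitespace, but renders a br element as a visible line
break.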
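
To show the difference between the two variants outside of Tika, here is
a minimal sketch that uses only the JDK's SAX serializer. It does not use
Tika's handler at all; the class name LineBreakDemo and the helpers
render and emit are made up for this illustration. It writes the same two
lines of text once with a newline character between them and once with
an empty br element.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class: stands in for the PDF-to-XHTML handler chain.
class LineBreakDemo {

    // Serializes <p>first line ... second line</p>, separating the two
    // lines either with an empty <br> element or with a "\n" text node.
    static String render(boolean useBr) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler out = factory.newTransformerHandler();
        out.getTransformer().setOutputProperty(
                OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter buffer = new StringWriter();
        out.setResult(new StreamResult(buffer));

        AttributesImpl noAttrs = new AttributesImpl();
        out.startDocument();
        out.startElement("", "p", "p", noAttrs);
        emit(out, "first line");
        if (useBr) {
            // Variant from the modified method: an element, not text.
            out.startElement("", "br", "br", noAttrs);
            out.endElement("", "br", "br");
        } else {
            // Variant from the original method: a newline character.
            emit(out, "\n");
        }
        emit(out, "second line");
        out.endElement("", "p", "p");
        out.endDocument();
        return buffer.toString();
    }

    // Sends a string to the handler as SAX character data.
    private static void emit(TransformerHandler out, String text)
            throws SAXException {
        out.characters(text.toCharArray(), 0, text.length());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(false));
        System.out.println(render(true));
    }
}
```

Opening both outputs in a browser should show the first one on a single
line and the second one on two lines.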

I hope this helps someone.

See you,
Giunad.

-- 
If we have learned one thing from the history of invention and discovery,
it is that in the long run - and often in the short one - the most
daring prophecies seem laughably conservative.
Arthur C. Clarke, The Exploration of Space
