I answered, asked to take a look at your file (please upload it to a file hoster), and mentioned that your config file looks suspicious.

Tilman

On 20.12.2019 at 19:06, Lu Sun wrote:
Dear PDFBox Dev Team,

Hope this message finds you well.

I just wanted to bring this to your attention. Could you please suggest a solution to the parsing order issue? Attached are my config file, an example PDF file, and my parsing results.

Thanks so much in advance. Wishing you and your team a Merry Christmas and a Happy New Year.

Regards,
Luke

On Tue, 17 Dec 2019 at 12:34, Tim Allison <talli...@apache.org> wrote:

    PDFBox Colleagues,
      Any recommendations?

    On Mon, Dec 16, 2019 at 7:05 AM Lu Sun <vistax...@gmail.com> wrote:

        Dear Tika Dev Team,

        Hope this email finds you well.

        I have been actively using Tika to read PDF files. One
        issue I have found is the parsing order: as shown in the
        attached image, the text of the PDF file is not extracted
        in the order of its position on the page.

        As suggested in this GitHub issue
        <https://github.com/chrismattmann/tika-python/issues/266>, I
        used a customized config file (see attached) in the hope of
        solving the issue, but it did not work. If possible, could you
        please review this issue and provide any insights or solutions?

        Thanks so much in advance.

        Regards,

        Luke



