Thanks Josh. I have actually been researching this quite heavily, and
ended up on the #ghostscript channel on freenode.

They pointed me to MuPDF (one of their projects), and it seems like
the "pdfdraw" example project is something to work from, either
directly or by parsing its XML output.

However, if this doesn't suit your needs, please tell me why, as I
might have the same problem, and then I'll join forces! :]
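
To make the cross-page comparison discussed in this thread a bit more
concrete, here is a minimal sketch in Python. It assumes each page has
already been reduced to a list of (text, y) pairs by whatever extractor
is used (MuPDF, Poppler, or something else); that input shape, the name
find_repeated_lines, and both default parameters are made up for
illustration and do not come from any of those libraries. Lines whose
text (with digits masked) recurs at roughly the same height on most
pages are reported as header, footer, or page-number candidates.

```python
import re
from collections import defaultdict


def find_repeated_lines(pages, min_fraction=0.6, y_tolerance=2.0):
    """Return (normalized_text, y_bucket) keys seen on most pages.

    pages: list of pages, each a list of (text, y) tuples, where y is the
    vertical position of the line (hypothetical input format).
    """
    seen_on = defaultdict(set)
    for page_index, lines in enumerate(pages):
        for text, y in lines:
            # Mask digit runs so "Page 3" and "Page 4" compare equal.
            key_text = re.sub(r"\d+", "#", text.strip())
            # Coarse bucketing of y; values near a bucket edge may split.
            key_y = round(y / y_tolerance)
            seen_on[(key_text, key_y)].add(page_index)

    threshold = min_fraction * len(pages)
    return sorted(key for key, hits in seen_on.items() if len(hits) >= threshold)


if __name__ == "__main__":
    sample = [
        [("My Report", 20.0), ("Intro text", 100.0), ("Page 1", 780.0)],
        [("My Report", 20.4), ("More text", 100.0), ("Page 2", 780.0)],
        [("My Report", 19.8), ("The end", 100.0), ("Page 3", 780.0)],
    ]
    for text, bucket in find_repeated_lines(sample):
        print(text, bucket)
```

Bucketing on y alone keeps the sketch short; a real version would
probably also compare x positions and font sizes, and treat the first
page separately since titles often appear only there.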

On Wed, Oct 12, 2011 at 3:44 AM, Josh Richardson <[email protected]> wrote:
> Thanks for the pointer, Glad.
>
> FYI, I am also interested in being able to analyze document structure.
> Our first step is to put the text back together, since in many PDFs it is
> not logically organized.  pdf2html has a "coalesce"
> function which is the starting point for us.  We have made some
> improvements on it which are not yet contributed back -- so let me know if
> you want the source and/or if you want to join forces.
>
> --josh
>
> On 10/11/11 12:31 AM, "Glad Deschrijver" <[email protected]>
> wrote:
>
>>On Tuesday 11 October 2011, Alec Taylor wrote:
>>> Good afternoon,
>>>
>>> Do you have any recommendations and/or sample code for comparing textual
>>> and geometric layout information across pages?
>>>
>>> Basically I'm trying to recognise patterns within documents, e.g., page
>>> numbers, headers and footers, titles, column information, etc., using the
>>> capabilities of the Poppler PDF library.
>>
>>Not sure that it will help you much, but you can have a look at DiffPDF,
>>which uses poppler to compare two PDF files page by page (both textually
>>and visually):
>>http://www.qtrac.eu/diffpdf.html
>>
>>Best regards,
>>Glad
>>
>>--
>> Everything that is really great and inspiring is created by
>> the individual who can labor in freedom.
>>      -- Albert Einstein, Out of My Later Years (1950)
>>
>>_______________________________________________
>>poppler mailing list
>>[email protected]
>>http://lists.freedesktop.org/mailman/listinfo/poppler
>>
>
>