> what are the downsides/upsides of following the content stream order?
>
It depends on what you know about the PDF producers that generated the
content you are processing.

If all the PDFs you are trying to process come from modern, well-written
products, then you are probably fine.  However, poorly made PDF creators
often write text into the content stream in an order that does not match
the reading order, and following that order will then give you garbage
from your extraction process.
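
One way to find out which kind of producer you are dealing with is to
extract a sample file in more than one mode and compare the results. The
following is only a rough sketch, not a recommended workflow: it assumes
`pdftotext` is on your PATH, and the helper names (`build_command`,
`extract`, `shared_line_ratio`) are made up for this illustration. Only
the `-raw` and `-layout` flags and the `-` output argument come from
pdftotext itself.

```python
import shutil
import subprocess

# Extraction modes mapped to pdftotext flags; the default mode takes no flag.
MODES = {"default": [], "layout": ["-layout"], "raw": ["-raw"]}


def build_command(pdf_path, mode="default"):
    """Return the pdftotext argument list for one mode (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # A trailing "-" asks pdftotext to write the text to stdout.
    return ["pdftotext", *MODES[mode], pdf_path, "-"]


def extract(pdf_path, mode="default"):
    """Run pdftotext in the given mode and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_command(pdf_path, mode), check=True, capture_output=True
    )
    # Assumes the default UTF-8 output encoding.
    return result.stdout.decode("utf-8", errors="replace")


def shared_line_ratio(text_a, text_b):
    """Fraction of distinct non-blank lines of text_a that also occur in text_b."""
    lines_a = {line.strip() for line in text_a.splitlines() if line.strip()}
    lines_b = {line.strip() for line in text_b.splitlines() if line.strip()}
    if not lines_a:
        return 1.0
    return len(lines_a & lines_b) / len(lines_a)
```

For a hypothetical `sample.pdf`, something like
`shared_line_ratio(extract("sample.pdf", "raw"), extract("sample.pdf"))`
gives a crude signal of how far the two modes diverge. A low ratio does not
say which mode is right, only that the file deserves a look by eye.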

Leonard


On 5/17/19, 3:14 AM, "poppler on behalf of Massimo Redaelli" 
<[email protected] on behalf of [email protected]> 
wrote:

    On Thu, May 16, 2019, 8:08 PM Albert Astals Cid <[email protected]> wrote:
    
    > > Are there reasons not to use it?
    >
    > The man page explains the reason not to use it.
    
    
    Yes, I should have asked: what are the downsides/upsides of following
    the content stream order?
    
    But I guess I'm mainly asking:
    
    > Is the option going to be deprecated, or can we count on it being
    > there for the foreseeable future?
    
    
    M.
    
    On Thu, May 16, 2019 at 6:08 PM Albert Astals Cid <[email protected]> wrote:
    >
    > On Thursday, 16 May 2019, at 17:00:27 CEST, Massimo Redaelli wrote:
    > > Hey all.
    > >
    > > Question regarding pdftotext.
    > >
    > > The help says that `raw` is not recommended anymore, but for all PDFs
    > > I tried it actually gives better results than the default mode, by
    > > which I mean that paragraphs are not interrupted by extraneous text,
    > > like headers or boxes.
    > > (I do have to handle hyphenated words, but that looks easy.)
    > >
    > > Is the option going to be deprecated, or can we count on it being
    > > there for the foreseeable future?
    > > Are there reasons not to use it?
    >
    > The man page explains the reason not to use it.
    >
    > Cheers,
    >   Albert
    >
    > >
    > > Thanks!
    > >
    > >
    >
    >
    >
    >
    > _______________________________________________
    > poppler mailing list
    > [email protected]
    > https://lists.freedesktop.org/mailman/listinfo/poppler
    
    
    
    -- 
    M.

_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
