It seems we're going to follow this up in https://gitlab.freedesktop.org/poppler/poppler/-/issues/1076
On Monday, 3 May 2021, at 15:22:19 (CEST), Jeroen Ooms wrote:
> I maintain R bindings called pdftools, mostly used for extracting text
> from scientific documents. The bindings wrap the C++ API; in
> particular, we convert PDF to text using poppler::page::text() with
> physical_layout.
>
> Recently users have started to report changes in behaviour with newer
> versions of poppler, in particular wrt whitespace. For example, all
> pages are now terminated with an '\f' symbol, which was not the
> case before. On Windows, linebreaks are now converted to '\r\n'
> instead of just '\n' as before (we use mingw-w64 compilers). Also,
> some documents that would contain a single linebreak in e.g. poppler
> 0.73 now have 4 or 5 linebreaks in the same place with the latest
> poppler.
>
> I had a look at the changelog but I couldn't find any notes on this.
> Are these expected changes? The new behavior is causing some existing
> pipelines to break, where people were using e.g. line offsets to
> extract fragments of the text.
> _______________________________________________
> poppler mailing list
> poppler@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler