We are using poppler for parsing and indexing scientific articles. For
this purpose I wrote some bindings to poppler-cpp for the R
programming language. A few questions:
- Many of our pdf files give parsing errors, such as "Failed to get
object num from hint tables" or "Expected the optional cont
I am trying to get the same (or similar) text output from the c++ interface
as when using the 'pdftotext' utility without the -layout option.
However raw_order_layout gives malformed output (no text at all for most
pages):
ustring str = p->text(p->page_rect(), page::raw_order_layout);
An exampl
The pkgconfig file for poppler does not contain the configured dependencies
required for static linking:
> pkg-config --libs --static poppler-cpp
-L/usr/local/Cellar/poppler/0.41.0/lib -lpoppler-cpp -lpoppler
This is certainly incomplete. Correct output (in my case) would be
something alo
On Wed, Mar 2, 2016 at 10:15 PM, Albert Astals Cid wrote:
> Maybe you can have a look? The code of pdftotext is pretty small so looking at
> the cpp frontend and looking what's wrong should not be very hard.
I don't quite understand what is going on in TextOutputDev(), but one
thing that stands o
When extracting text from a landscape pdf file using the cpp
interface, text at the far right of the page does not get extracted. I
think the problem is that page.text() always assumes portrait
orientation and hence underestimates the width of the page:
p->text()
p->text(p->page_rect())
Is th
On Tue, Mar 8, 2016 at 2:34 PM, Jeroen Ooms wrote:
> When extracting text from a landscape pdf file using the cpp
> interface, text at the far right of the page does not get extracted. I
> think the problem is that page.text() always assumes portrait
> orientation and hence undere
Is there an option in poppler-cpp 'page_renderer' to output RGBA
instead of BGRA?
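If the renderer cannot be asked for RGBA directly, the channels can be swapped after rendering. A minimal sketch, assuming a tightly packed buffer of 4 bytes per pixel in B, G, R, A order; the helper name bgra_to_rgba is hypothetical and not part of poppler:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swap the blue and red channels in place, turning BGRA pixels into RGBA.
// Assumes rows carry no padding; if the image reports a bytes-per-row value
// larger than width * 4, process each row separately instead.
inline void bgra_to_rgba(std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 4 == 0);  // whole pixels only
    for (std::size_t i = 0; i + 3 < pixels.size(); i += 4) {
        std::swap(pixels[i], pixels[i + 2]);  // B <-> R; G and A stay put
    }
}
```

Applying the function twice restores the original buffer, so the same helper also converts RGBA back to BGRA.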
___
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler
When rendering a black and white pdf file using
poppler::page_renderer, the image always comes out as
image::format_argb32 rather than image::format_mono.
Therefore the png is 4x larger than expected. Is this expected or do I
manually need to set the image format somewhere in the renderer?
My ful
If the user enters an incorrect password when reading a protected pdf via
document::load_from_raw_data(), an error is printed to stdout:
error: Incorrect password
However the load_from_raw_data() does not raise an exception and returns a
valid *document. However this document segfaults when we cal
Some more debugging and a bug report regarding this issue:
https://bugs.freedesktop.org/show_bug.cgi?id=101385
On Sun, Jun 11, 2017 at 1:51 PM, Jeroen Ooms wrote:
> If the user enters an incorrect password when reading a protected pdf via
> document::load_from_raw_data() an error is prin
Some users of the R bindings have requested a way to extract
hyperlinks from a pdf file. However it seems that currently this
functionality is only available in the qt api, but not in the cpp api?
A user has reported an issue [1] with a pdf incorrectly rendering on
windows. Poppler gives the following which is probably the reason that
the dots do not render correctly:
> Warning: error: Couldn't find a font for 'ZapfDingbats'
I have tried building poppler --with-font-configuration=win32 a
On Wed, Sep 6, 2017 at 9:10 AM, Jonathan Kew wrote:
> On 05/09/2017 21:03, Albert Astals Cid wrote:
>>
>> ./GlobalParamsWin.cc:102:{"ZapfDingbats", "d05l.pfb",
>> "wingding.ttf", gTrue},
>>
>> This is the substitution table, i guess those files are not available on a
>> modern Win
On Wed, Sep 6, 2017 at 7:38 PM, Albert Astals Cid wrote:
>> The solution would be to ensure d05l.pfb is available, but whose
>> responsibility should that be -- the poppler library, the client
>> application that uses poppler, or the individual end user?
>
> Not poppler, we're not in the font
On Tue, Oct 3, 2017 at 12:00 AM, Albert Astals Cid wrote:
>
> * cmake is now the default build system
> * autotools based build system has been removed
>
After upgrading, homebrew no longer ships static libs :( Is there a way to
make cmake produce both static and shared libs?
On Tue, Oct 3, 2017 at 11:40 PM, Albert Astals Cid wrote:
>
> You should probably open a bug since I don't think this is something that is
> going to get fixed fast, unless maybe you can just work around it by building
> it twice? (I know it sucks to build stuff twice)
>
OK I have prepared and te
Several projects use static builds of poppler-cpp to ship standalone
pdf applications, but since the switch to cmake it is no longer
possible to build static libs.
Setting -DBUILD_SHARED_LIBS=OFF in cmake only builds a static
libpoppler.a, however libpoppler-cpp still gets built as a dynamic
libra
I maintain the poppler bindings for R which work on Windows, macOS and
Linux. However Chinese users on Windows/Mac have reported that
poppler doesn't find the share data files:
error: Missing language pack for 'Adobe-CNS1' mapping
Where exactly does poppler look for the 'share' directory? Is t
On Wed, Nov 1, 2017 at 3:20 PM, Jason Crain wrote:
> On Mac and Linux the path is hardcoded at compilation time. It's
> generally in /usr/share/poppler on Linux. Not sure what the standard is
> on Mac. On Windows it looks in \share\poppler relative to the
> installation directory.
OK that is
On Wed, Nov 1, 2017 at 3:51 PM, Jason Crain wrote:
>
> I don't know how you use poppler in your project, but you may also have
> the option of passing in the path when you construct the GlobalParams
> object.
Hmm that may be the function I am looking for but I don't understand
where I should pass
On Thu, Nov 2, 2017 at 12:23 AM, Albert Astals Cid wrote:
> Email here or use bugzilla.
OK, attached is the patch.
staticlib.patch
Description: Binary data
On Wed, Nov 1, 2017 at 9:31 PM, Jeroen Ooms wrote:
> Hmm that may be the function I am looking for but I don't understand
> where I should pass a GlobalParams object when reading a pdf file. I
> tried setting the 'globalParams' global when loading the R package but
>
Would it be possible to mention the R package 'pdftools' [1] on the
poppler website [2] under programs using poppler? The R package is quite
popular among (data) scientists to extract text and data from pdf
documents such as scientific papers or public records.
[1] https://cran.r-project.org/package=pdf
On Mon, Nov 13, 2017 at 12:10 AM, Albert Astals Cid wrote:
>
> On Tuesday, 7 November 2017, at 11:29:25 CET, Jeroen Ooms wrote:
> > Would it be possible to mention the R package 'pdftools' [1] on the
> > poppler website [2] under programs usi
Is there a method in poppler-cpp to extract text from a pdf document,
including the position of each text box? Currently we use page->text()
with page::physical_layout which gives all text per page, but I need
more detailed information about each text box per page.
After recent changes (after 0.62) the master branch no longer builds
on macos. The issue is that the statbuf struct does not have an
"st_mtim" field on macos:
[ 0%] Building CXX object CMakeFiles/poppler.dir/goo/gfile.cc.o
/Users/jeroen/Desktop/popplergit/goo/gfile.cc:690:34: error: no member
nam
On Sun, Feb 11, 2018 at 12:11 PM, Albert Astals Cid wrote:
> You're never assigning to tv_nsec in there but still use it in a comparison,
> that needs fixing.
You are right. I think we should compare modification time only by
seconds. The standard definition of 'struct stat' only specifies
st_cti
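A minimal sketch of a seconds-only comparison, assuming a POSIX system where struct stat exposes st_mtime; the helper name mtime_changed is hypothetical and this is not the code from the patch:

```cpp
#include <sys/stat.h>

// Report whether the modification time differs between two stat results,
// comparing whole seconds only. Sub-second fields such as st_mtim.tv_nsec
// are deliberately ignored because they are not available under that name
// on every platform.
inline bool mtime_changed(const struct stat& previous, const struct stat& current) {
    return previous.st_mtime != current.st_mtime;
}
```

The trade-off is that two writes within the same second are not detected as a change.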
We have been working on a .travis configuration file to automatically
test poppler feature branches on various linux and osx configurations.
Perhaps this may be interesting to other poppler users as well.
The example ".travis.yml" file can be copied from:
https://github.com/jeroen/poppler/blob/tra
On Mon, Feb 12, 2018 at 3:04 PM, Ihar Filipau wrote:
> On 2/12/18, Jeroen Ooms wrote:
>> On Sun, Feb 11, 2018 at 12:11 PM, Albert Astals Cid wrote:
>>> You're never assigning to tv_nsec in there but still use it in a
>>> comparison,
>>> that needs fixin
I'm testing the new page::text_list() function but I run into an old
problem where the conversion of the ustring to UTF-8 doesn't do what I
expect:
byte_array buf = x.to_utf8();
std::string y(buf.begin(), buf.end());
const char * str = y.c_str();
The resulting char * is not UTF-8. It contai
wrong here?
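One way to narrow this down is to check whether the bytes that come out of to_utf8() are well-formed UTF-8 before passing them on. A minimal sketch using only the standard library; the helper name is_valid_utf8 is hypothetical and not part of poppler:

```cpp
#include <cstddef>
#include <string>

// Return true if `s` is well-formed UTF-8. Rejects stray continuation bytes,
// truncated sequences, overlong encodings, UTF-16 surrogates, and code
// points above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }  // plain ASCII
        std::size_t len = 0;
        unsigned long cp = 0;   // decoded code point
        unsigned long min = 0;  // smallest code point allowed for this length
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return false;  // continuation byte or invalid lead byte
        if (i + len > n) return false;  // sequence cut off at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF) return false;     // overlong or out of range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}
```

If the check fails on the std::string built from the byte_array, the problem is in the conversion itself rather than in how the string is displayed afterwards.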
On Mon, Mar 5, 2018 at 3:10 PM, Jeroen Ooms wrote:
> I'm testing the new page::text_list() function but I run into an old
> problem where the conversion of the ustring to UTF-8 doesn't do what I
> expect:
>
> byte_array buf = x.to_utf8();
>
On Tue, Mar 6, 2018 at 10:31 AM, Adam Reichold wrote:
> Hello mpsuzuki,
>
> from a glance at the code, it seems page::text uses ustring::from_utf8
> to convert Poppler's GooString into ustring which seems correct if
> GlobalParams::textEncoding has its default value of "UTF-8" .
I don't understan
Currently pkg-config does not correctly list the dependency libs for
static linking when running with --static:
pkg-config --libs --static poppler-cpp
-lpoppler-cpp -lpoppler
The output of --static should also include the recursive dependencies
such as -lcairo -llcms2 -lopenjp2 -ltiff. For
Thanks everyone for the work on this issue, really appreciate the
input. Also excited about mpsuzuki's suggestion to include font data
with the text_list, this will be super helpful.
I have updated and cleaned my example code a little bit to make it
easier to test these issues. The updated test pr
On Thu, Mar 22, 2018 at 8:53 AM, suzuki toshiya wrote:
> Dear Jeroen,
>
> Please check https://github.com/mpsuzuki/poppler/tree/for-travis whether it
> can serve for you.
Yes this works! I now get:
pkg-config --libs-only-l poppler-cpp
-lpoppler-cpp
pkg-config --libs-only-l --static poppler-cp
On Sun, Mar 25, 2018 at 5:39 AM, suzuki toshiya wrote:
> My fix consists of 2 parts.
>
> part 1)
> I replaced all detail::unicode_GooString_to_ustring() by ustring::from_utf8(),
> this was suggested by Adam.
>
> https://github.com/mpsuzuki/poppler/commit/7404f5effa8e303399e5101d54ff954ee5153e44
On Mon, Mar 26, 2018 at 10:06 PM, Albert Astals Cid wrote:
> On Sunday, 25 March 2018, at 5:39:18 CEST, suzuki toshiya wrote:
>> Hi all,
>>
>> Finally I think I found the root of issue and I can propose a fix.
>> pre-patch situation is like this:
>> https://travis-ci.org/mpsuzuki
FYI the encoding problems still exist in the master branch today. I am
very interested in this patch by mpsuzuki, what can we do to move this
forward?
On Wed, Mar 28, 2018 at 2:26 PM, suzuki toshiya wrote:
> Dear Adam,
>
> Adam Reichold wrote:
>>> I see. where is the appropriate place to a
On Sun, Aug 19, 2018 at 11:23 PM, Albert Astals Cid wrote:
> On Monday, 12 February 2018, at 16:03:35 CEST, Jeroen Ooms wrote:
>> We have been working on a .travis configuration file to automatically
>> test poppler feature branches on various linux and
> On Monday, 2 April 2018, at 10:22:51 CEST, suzuki toshiya wrote:
> > if it is not the time to put "image_list" into cpp frontend
> It is ok, actually i know someone else that wanted to do that.
FWIW, if Suzuki is still interested, I would be very happy to extract
images via the
I maintain the poppler bindings for the R programming language and get
a lot of bug reports about corrupted text extracted with poppler.
Below a minimal example that illustrates the problem:
git clone https://github.com/jeroen/popplertest
cd popplertest
g++ -std=c++11 encoding.cpp -o encodin
On Sun, Dec 2, 2018 at 12:51 PM Adam Reichold wrote:
>
> Hello,
>
> On 02.12.18 at 00:06, Albert Astals Cid wrote:
> > On Saturday, 1 December 2018, at 23:20:46 CET, Jeroen Ooms wrote:
> >> I maintain the poppler bindings for the R programmi
On Tue, Dec 4, 2018 at 4:44 PM Ranjan Ghosh wrote:
>
> Hi all,
>
> I'm desperately trying to create a fully static build without any
> dependencies. I already got pretty far (IMHO) and build lots and lots of
> other dependent libraries statically (cairo, freetype etc.) without
> encountering any ma
On Wed, Dec 5, 2018 at 5:12 PM Ranjan Ghosh wrote:
>
> Hmm. I think it doesn't work that easily. Actually, I'm trying to build a
> static pdf2svg which uses poppler in turn. I tried to follow your
> advice and installed libcairo-dev, libopenjp2-7-dev, libjpeg-dev, etc.
> and then simply compiled p
A researcher who is using the R bindings to analyze large numbers of
scientific papers has asked me for advice on the following:
When extracting results from scientific pdfs, sometimes math symbols
cannot be extracted because symbols are encoded with a custom font
called Mathematical-Pi [1]. An example
I maintain R bindings called pdftools, mostly used for extracting text
from scientific documents. The bindings wrap the C++ API, in
particular we convert pdf to text using poppler::page::text() with
physical_layout.
Recently users have started to report changes in behaviour with newer
versions of