PDFBox Colleagues,
Any recommendations?
Best,
Tim
-----Original Message-----
From: Andisa Dewi [mailto:[email protected]]
Sent: Monday, February 27, 2017 5:32 AM
To: [email protected]
Subject: Extracting vector graphics from pdf
Hello guys,
I'm currently extracting images from a large number of PDF files, but some of
the images (or figures) are not extracted. I suspect this is because those
images are vector graphics (as is usually the case in scientific papers). My
question: is it possible to extract vector graphics from PDFs using Tika?
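To illustrate my suspicion, here is a minimal sketch in plain Java (no Tika or PDFBox). It assumes an already decoded page content stream and splits it naively on whitespace. The class name, the method names, and the two sample streams are made up for illustration. A raster image is normally painted by invoking a named XObject with the Do operator, whereas a vector figure is just a run of path operators in the page content, so there may be no embedded image object to hand to an extractor:

```java
import java.util.Set;

public class ContentStreamSketch {
    // Path-painting operators (stroke and fill variants) from the PDF content stream syntax.
    private static final Set<String> PAINT_OPS =
            Set.of("S", "s", "f", "F", "f*", "B", "B*", "b", "b*");

    // "Do" paints a named XObject. That may be an image, but it may also be a form XObject.
    private static final Set<String> XOBJECT_OPS = Set.of("Do");

    /** Counts path-painting operators in a decoded content stream. */
    public static int countPathPaints(String stream) {
        return count(stream, PAINT_OPS);
    }

    /** Counts XObject invocations in a decoded content stream. */
    public static int countXObjectDraws(String stream) {
        return count(stream, XOBJECT_OPS);
    }

    // Naive tokenizer: splits on whitespace only, so it can be fooled by
    // string literals and inline image data. Good enough for a sketch.
    private static int count(String stream, Set<String> ops) {
        int n = 0;
        for (String token : stream.trim().split("\\s+")) {
            if (ops.contains(token)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical sample streams, not taken from the attached PDF.
        String rasterFigure = "q 200 0 0 150 72 500 cm /Im1 Do Q";
        String vectorFigure = "0 0 m 100 100 l S 10 10 50 50 re f";

        System.out.println("raster-like: paints=" + countPathPaints(rasterFigure)
                + ", Do=" + countXObjectDraws(rasterFigure));
        System.out.println("vector-like: paints=" + countPathPaints(vectorFigure)
                + ", Do=" + countXObjectDraws(vectorFigure));
    }
}
```

If Figure 2 looks like the second stream, with path operators and no Do, that would be consistent with it being skipped by image extraction.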
I've attached an example PDF; in it, all images are extracted except
Figure 2.
The way I'm extracting the images is the same as in the example code:
Parser parser = new AutoDetectParser();
Metadata m = new Metadata();
ParseContext c = new ParseContext();
ContentHandler h = new BodyContentHandler(-1);
PDFParserConfig pdfConfig = new PDFParserConfig();
pdfConfig.setExtractInlineImages(true);
c.set(PDFParserConfig.class, pdfConfig);
c.set(Parser.class, parser);
EmbeddedDocumentExtractor ex = new MyEmbeddedDocumentExtractor(c);
c.set(EmbeddedDocumentExtractor.class, ex);
parser.parse(inputstream, h, m, c);
Thanks!
Regards,
Eli