PDFBox Colleagues,
  Any recommendations?

          Best,

                 Tim

-----Original Message-----
From: Andisa Dewi [mailto:[email protected]] 
Sent: Monday, February 27, 2017 5:32 AM
To: [email protected]
Subject: Extracting vector graphics from pdf

Hello guys,

I'm currently extracting images from a large number of PDF files; however, some
of the images (or figures) are not extracted. I suspect this is because those
images are vector graphics (as is usually the case in scientific papers). My
question is: is it possible to extract vector graphics from PDFs using Tika?

I have attached an example PDF (in it, all images are extracted except
Figure 2).

The way I'm extracting the images is the same as in the example code:

Parser parser = new AutoDetectParser();
Metadata m = new Metadata();
ParseContext c = new ParseContext();
ContentHandler h = new BodyContentHandler(-1);
PDFParserConfig pdfConfig = new PDFParserConfig();
pdfConfig.setExtractInlineImages(true);
c.set(PDFParserConfig.class, pdfConfig);
c.set(Parser.class, parser);
EmbeddedDocumentExtractor ex = new MyEmbeddedDocumentExtractor(c);
c.set(EmbeddedDocumentExtractor.class, ex);
parser.parse(inputstream, h, m, c);


Thanks!

Regards,

Eli

