Yes, PDFBox can do this.

-- John
> On 14 Jan 2015, at 23:48, Stefan Falk <[email protected]> wrote:
>
> Hi John!
>
> Yes, clipping the PDF is basically what I would like to do! So would pdfbox
> be the best choice for this? I have looked a lot for a library but it does
> not seem that there are many open source tools out there.
>
> My target is a program that allows clipping PDFs in order to create a
> composed PDF out of all the clips, and maybe you could tell me if pdfbox
> would be the best choice for such a task.
>
> @fairly difficult: Well yes, I was quite astonished to find out that
> extracting content from a PDF is actually a scientific topic :D
>
> Best regards,
> Stefan
>
>> On 2015-01-15 03:21, John Hewson wrote:
>> Hi Stefan
>>
>> What you’re describing is actually fairly difficult due to the complexity of
>> the PDF operators. We have a special processor for text in PDFBox, but it is
>> not necessarily accurate.
>>
>> If you’re just trying to embed pages from existing PDFs into new PDFs then
>> the SuperimposePage example which comes with PDFBox might already serve your
>> needs. If you specify a custom BBox for the FormXObject, then you can use
>> that to clip the page - which sounds like what you want. Please note that
>> this technique still embeds all of the original page contents, so it’s not
>> suitable for removing private or sensitive data, but otherwise it’s fine.
>>
>> If you have PDFs which PDFReader can’t render, please try using the 2.0
>> trunk version of PDFBox, where we have fixed many bugs.
>>
>> Thanks
>>
>> -- John
>>
>>> On 14 Jan 2015, at 15:14, Stefan Falk <[email protected]> wrote:
>>>
>>> Well, basically just extract it to load it into another PDF, but it should
>>> be possible e.g. with the mouse.
>>>
>>>
>>>> On 2015-01-14 22:52, Maruan Sahyoun wrote:
>>>> what would you like to do with that content?
>>>>
>>>> BR
>>>> Maruan
>>>>
>>>>> On 14.01.2015 at 21:42, Stefan Falk <[email protected]> wrote:
>>>>>
>>>>> Hello pdfbox people!
>>>>>
>>>>> I was wondering if anybody can help me with my needs. What I am looking
>>>>> for is a possibility to extract the underlying PDF code from a PDF file
>>>>> by simply selecting an area with your mouse.
>>>>>
>>>>> After reading a few things about PDFs I have learned that anything that
>>>>> has to do with extracting anything from a PDF can be quite a hard task.
>>>>>
>>>>> So I was wondering if pdfbox could do that somehow. I've taken a rough
>>>>> look at the PDFReader and I noticed that there is e.g.
>>>>> processTextPosition from the class PageDrawer that seems to allow me to
>>>>> get at least the position of text - am I right in assuming that?
>>>>>
>>>>> My concrete question would be: what is possible with pdfbox regarding
>>>>> this matter? E.g. I have a PDF on my drive whose text seems to be
>>>>> "extractable" by pdfbox on the one hand, but on the other hand the
>>>>> PDFReader is not able to render any of it. It just renders the images
>>>>> (see attachment).
>>>>>
>>>>> Thank you for your help in advance!
>>>>>
>>>>> Best regards,
>>>>> Stefan

