You're welcome, and yes, we are always interested in getting our hands on files that cannot be rendered correctly. Please also try opening them in Adobe Reader/Acrobat to get an idea of how they are handled there. Sometimes we get PDFs that are so corrupted that there is not much we can do about them.
For all usage questions, the users mailing list is fine. If you are sure, or think, that you have found a bug, please open an issue at https://issues.apache.org/jira/browse/PDFBOX with a test case that reproduces it and the PDF in question attached. If you have an idea how to overcome the issue, you can also attach a patch for us to review.

Good luck with your project, and feel free to ask additional questions as they arise.

BR
Maruan

On 15.01.2015 at 09:18, Stefan Falk <[email protected]> wrote:

> This is awesome! Thank you!
>
> I will take a close look at it and update to the trunk version too.
>
> Do you want me to report PDFs that could not be displayed correctly in the
> future?
>
> Best regards,
> Stefan
>
> On 2015-01-15 09:03, Maruan Sahyoun wrote:
>> Hi Stefan,
>>
>> yes, PDFBox is capable of doing this. To crop the page to the dimensions you
>> need you can use
>>
>> PDPage.setCropBox
>> [http://pdfbox.apache.org/docs/1.8.8/javadocs/org/apache/pdfbox/pdmodel/PDPage.html#setCropBox(org.apache.pdfbox.pdmodel.common.PDRectangle)]
>>
>> As John pointed out, the SuperimposePage example will give you the basics to
>> import and 'mount' the page into a new or existing PDF.
>>
>> The only thing left is to get the coordinates from the mouse and translate
>> them to the dimensions of the rectangle in PDF.
>>
>> BR
>> Maruan
>>
>> On 15.01.2015 at 08:48, Stefan Falk <[email protected]> wrote:
>>
>>> Hi John!
>>>
>>> Yes, clipping the PDF is basically what I would like to do! So would pdfbox
>>> be the best choice for this? I have looked a lot for a library but it does
>>> not seem that there are many open source tools out there.
>>>
>>> My target is a program that allows clipping PDFs in order to create a
>>> composed PDF out of all the clips, and maybe you could tell me if pdfbox
>>> would be the best choice for such a task.
>>>
>>> @fairly difficult: Well yes, I was quite astonished to find out that
>>> extracting content from a PDF is actually a scientific topic :D
>>>
>>> Best regards,
>>> Stefan
>>>
>>> On 2015-01-15 03:21, John Hewson wrote:
>>>> Hi Stefan
>>>>
>>>> What you’re describing is actually fairly difficult due to the complexity
>>>> of the PDF operators. We have a special processor for text in PDFBox, but
>>>> it is not necessarily accurate.
>>>>
>>>> If you’re just trying to embed pages from existing PDFs into new PDFs then
>>>> the SuperimposePage example which comes with PDFBox might already serve
>>>> your needs. If you specify a custom BBox for the FormXObject, then you can
>>>> use that to clip the page - which sounds like what you want. Please note
>>>> that this technique still embeds all of the original page contents, so it’s
>>>> not suitable for removing private or sensitive data, but otherwise it’s
>>>> fine.
>>>>
>>>> If you have PDFs which PDFReader can’t render, please try using the 2.0
>>>> trunk version of PDFBox, where we have fixed many bugs.
>>>>
>>>> Thanks
>>>>
>>>> -- John
>>>>
>>>>> On 14 Jan 2015, at 15:14, Stefan Falk <[email protected]> wrote:
>>>>>
>>>>> Well, basically just extract it to load it into another PDF, but it
>>>>> should be possible e.g. with the mouse.
>>>>>
>>>>>
>>>>> On 2015-01-14 22:52, Maruan Sahyoun wrote:
>>>>>> What would you like to do with that content?
>>>>>>
>>>>>> BR
>>>>>> Maruan
>>>>>>
>>>>>> On 14.01.2015 at 21:42, Stefan Falk <[email protected]> wrote:
>>>>>>
>>>>>>> Hello pdfbox people!
>>>>>>>
>>>>>>> I was wondering if anybody can help me with my needs. What I am looking
>>>>>>> for is a possibility to extract the underlying PDF code from a PDF file
>>>>>>> by simply selecting an area with your mouse.
>>>>>>>
>>>>>>> After reading a few things about PDFs I have learned that anything that
>>>>>>> has to do with extracting anything from a PDF can be quite a hard task.
>>>>>>>
>>>>>>> So I was wondering if pdfbox could do that somehow. I've taken a rough
>>>>>>> look at the PDFReader and I noticed that there is e.g.
>>>>>>> processTextPosition from the class PageDrawer that seems to allow me to
>>>>>>> get at least the position of text - am I right in assuming that?
>>>>>>>
>>>>>>> My concrete question would be: what is possible with pdfbox regarding
>>>>>>> this matter? E.g. I have a PDF on my drive whose text seems to be
>>>>>>> "extractable" by pdfbox on the one hand, but on the other hand the
>>>>>>> PDFReader is not able to render any of it. It just renders the images
>>>>>>> (see attachment).
>>>>>>>
>>>>>>> Thank you for your help in advance!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Stefan
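
PS: The coordinate translation mentioned in the thread (mouse position to a rectangle in PDF) could look roughly like the sketch below. It is only an illustration under some assumptions: the page is shown unrotated, its lower-left corner is at the PDF origin (0,0), and the view is rendered at a uniform scale in pixels per PDF point. The names CropRect, PdfRect and fromMouse are made up for this example and are not part of PDFBox.

```java
/**
 * Sketch only: converts a mouse selection in a rendered page view into a
 * rectangle in PDF user space. All names are hypothetical, not PDFBox API.
 * Assumes an unrotated page whose lower-left corner is at (0,0).
 */
public final class CropRect {

    /** Lower-left corner plus size, in PDF points (1/72 inch). */
    public static final class PdfRect {
        public final float x;
        public final float y;
        public final float width;
        public final float height;

        PdfRect(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    private CropRect() {
    }

    /**
     * @param x1         mouse-press x in view pixels (origin top-left)
     * @param y1         mouse-press y in view pixels (y grows downwards)
     * @param x2         mouse-release x in view pixels
     * @param y2         mouse-release y in view pixels
     * @param scale      view pixels per PDF point (assumed uniform)
     * @param pageHeight page height in PDF points
     */
    public static PdfRect fromMouse(int x1, int y1, int x2, int y2,
                                    float scale, float pageHeight) {
        if (scale <= 0f) {
            throw new IllegalArgumentException("scale must be positive");
        }
        // Normalise the drag so it works in any direction, then go from
        // pixels to points.
        float left = Math.min(x1, x2) / scale;
        float right = Math.max(x1, x2) / scale;
        float top = Math.min(y1, y2) / scale;       // distance from page top
        float bottom = Math.max(y1, y2) / scale;    // distance from page top

        // PDF user space has its origin at the bottom-left with y growing
        // upwards, so the y axis has to be flipped.
        return new PdfRect(left, pageHeight - bottom, right - left, bottom - top);
    }
}
```

The resulting values would then be used to fill the PDRectangle that is handed to PDPage.setCropBox. Pages with a rotation or a MediaBox that does not start at (0,0) would need an extra offset or transform, which this sketch leaves out.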

