Re: Regarding pdf data extraction
I don't think that class can help you. All you need is the PDFTextStripper class. On Mon, Mar 3, 2014 at 7:15 PM, Divya Muttineni wrote: > I am trying to convert the tabular data from a PDF file to a text (.txt) file. > In one article I came across > org.apache.pdfbox.pdfviewer.PDFPageDrawer. > > Can you please help me with how to extend it and override the strokepath() > method? > > > Thank you, > Divya >
Re: Need JBIG2 test image
I have some scanned accident police reports that contain people's names, addresses and phone numbers. I had a problem printing these files with PDFBox and had to improvise by running a command-prompt print utility as a Process. I could perhaps give you one if you agree not to release it to the public. Alin On Wed, Mar 12, 2014 at 1:19 PM, Tilman Hausherr wrote: > Hello all, > > I need a PDF with JBIG2 encoding that can be distributed. So it should > not have anything on it that is copyrighted, i.e. artwork or a real text. > Just some random lines or a lorem ipsum text. The image should be black & > white, i.e. not have other elements in it that have a color like a > watermark. Some unserviced Xerox copiers might produce such images, or some > software from Adobe, IRIS etc. If you have such a file, send it to me, > tilman at snafu dot de, not to the list. > > I want to use this PDF for a unit test that checks whether the PDF is > decoded with the JBIG2 plugin. A fail would be an empty image. This way we > check that the JBIG2 plugin is properly attached. > > Tilman > >
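Tilman's test idea (a failed JBIG2 decode shows up as an empty image) can be checked with plain java.awt once you have the decoded or rendered image in hand. The sketch below treats an image as empty if it is null or if every pixel has the same RGB value. That definition and the class name BlankImageCheck are assumptions. Getting the BufferedImage out of PDFBox is not shown.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper for a "did JBIG2 decoding produce anything?" unit test. */
final class BlankImageCheck {

    private BlankImageCheck() {}

    /**
     * Returns true if the image is missing or every pixel has the same RGB value,
     * i.e. it carries no visible content (all white, all black, ...).
     */
    static boolean isBlank(BufferedImage image) {
        if (image == null) {
            return true; // the decoder produced nothing at all
        }
        int first = image.getRGB(0, 0) & 0xFFFFFF; // compare RGB only, ignore alpha
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if ((image.getRGB(x, y) & 0xFFFFFF) != first) {
                    return false; // a differing pixel means there is content
                }
            }
        }
        return true;
    }
}
```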
Problem With MergeUtility
Hello guys, Has anyone had any problem with this? Any idea why it happens? What would be a good value for pushBackSize so this does not happen? Thanks! Partial stack trace:
org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 72940 bytes in order to reparse stream. Try increasing push back buffer using system property org.apache.pdfbox.baseParser.pushBackSize
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
    at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:186)
Re: Problem With MergeUtility
Where would I call it? The loading happens inside PDFMergerUtility. Here's the code that causes the error:
    PDFMergerUtility util = new PDFMergerUtility();
    for (File file : set) {
        try {
            if (file.exists()) {
                util.addSource(file);
            }
        } catch (Exception e) {
            // log e
        }
    }
    util.setDestinationFileName(...);
    util.mergeDocuments();
On Thu, Mar 13, 2014 at 11:27 AM, Maruan Sahyoun wrote: > Hi, > > not a direct answer to your question but could you try > PDDocument.loadNonSeq instead? > > BR > Maruan Sahyoun
Re: Problem With MergeUtility
Ok, I will try. In my opinion it would be useful if PDFMergerUtility had protected instance variables rather than private ones, so that the class could be extended as needed, like PDFTextStripper. In my situation I would only have to override mergeDocuments(). Thank you, Alin On Thu, Mar 13, 2014 at 12:52 PM, Timo Boehme wrote: > Hi, > > as far as I remember PDFMergerUtility is one of the last utilities not > supporting loadNonSeq currently. > > As a workaround get the source of PDFMergerUtility, change PDDocument.load > to PDDocument.loadNonSeq (you may provide null as buffer parameter). > > > Best, > Timo > > -- > > Timo Boehme > OntoChem GmbH > H.-Damerow-Str. 4 > 06120 Halle/Saale > T: +49 345 4780474 > F: +49 345 4780471 > timo.boe...@ontochem.com > > _ > > OntoChem GmbH > Managing Director: Dr. Lutz Weber > Registered office: Halle / Saale > Register court: Stendal > Register number: HRB 215461 > _ > >
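On the pushBackSize question itself: the message says the parser could not push back 72940 bytes, so the buffer has to be at least that large. A minimal sketch, assuming you round the reported size up to the next power of two and set the system property named in the exception. The power-of-two rounding is an arbitrary choice that leaves some headroom. The class PushBackSize is hypothetical. PDFBox may read the property only once, when the parser class is initialized, so the safest approach is to set it before any PDFBox class loads, or on the command line.

```java
/** Hypothetical helper: choose and apply a push-back buffer size for the PDFBox 1.x parser. */
final class PushBackSize {

    // Property name copied from the exception message.
    static final String PROPERTY = "org.apache.pdfbox.baseParser.pushBackSize";

    private PushBackSize() {}

    /** Smallest power of two that is >= the number of bytes the parser failed to push back. */
    static int suggestedSize(int failedBytes) {
        if (failedBytes > (1 << 30)) {
            throw new IllegalArgumentException("size too large: " + failedBytes);
        }
        if (failedBytes <= 1) {
            return 1;
        }
        // Largest power of two strictly below failedBytes, doubled.
        return Integer.highestOneBit(failedBytes - 1) << 1;
    }

    /** Sets the property; call this before any PDFBox parsing class is loaded. */
    static void apply(int failedBytes) {
        System.setProperty(PROPERTY, Integer.toString(suggestedSize(failedBytes)));
    }
}
```

For the trace above, suggestedSize(72940) is 131072. The command-line equivalent is to pass -Dorg.apache.pdfbox.baseParser.pushBackSize=131072 to the java command. The loadNonSeq workaround suggested in the replies uses a different parser and may avoid the problem altogether.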
Re: Problem With MergeUtility
I know that. No problem. On Thu, Mar 13, 2014 at 2:23 PM, John Hewson wrote: > Hi Alin > > Thanks for your fix. > > > it would be useful if it had the instance > > variables protected rather than private, that way the class could be > > extended as needed, like PDFTextStripper. > > The problem with making fields protected is that it exposes internal > implementation details, > making them part of the public API. This prevents us from making internal > changes in the > future without introducing breaking changes to the public API. > > In the case of PDFTextStripper, there is a strong use case for using a > protected field, > because overriding it is the primary mechanism for custom text extraction. > > Cheers > > -- John
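John's point can be illustrated with a small, library-free sketch. The base class keeps its state private and offers one deliberate protected hook, which is the role processTextPosition plays for PDFTextStripper. The class names DocumentWalker and UpperCaseCollector are hypothetical and not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical base class: state stays private, extension happens through one protected hook. */
abstract class DocumentWalker {

    private final List<String> visited = new ArrayList<>(); // internal detail, free to change later

    /** Fixed algorithm: subclasses cannot break the bookkeeping. */
    final void walk(List<String> items) {
        for (String item : items) {
            visited.add(item);
            onItem(item); // the supported extension point
        }
    }

    /** Read-only view, so subclasses can observe state without owning it. */
    final List<String> visited() {
        return Collections.unmodifiableList(visited);
    }

    /** Hook meant to be overridden, similar in spirit to PDFTextStripper.processTextPosition. */
    protected abstract void onItem(String item);
}

/** A subclass customizes behavior without touching the base class's fields. */
final class UpperCaseCollector extends DocumentWalker {
    final StringBuilder out = new StringBuilder();

    @Override
    protected void onItem(String item) {
        out.append(item.toUpperCase(Locale.ROOT));
    }
}
```

Because walk() is final and the list is private, the base class can later change how it stores visited items without breaking subclasses. Only the signature of onItem() becomes part of the public contract.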
Re: PDFTextPositions
You have to extend the PDFTextStripper class and override the processTextPosition(...) method. From there the logic is up to you. You can also override the writePage() method to grab the charactersByArticle Vector and then look for your words by iterating over it. In both cases you collect all the TextPosition objects and work out the position and height/width from there. ~Alin On Wed, Apr 2, 2014 at 6:32 PM, Sireesha Chilakamarri < sireesha.chary...@gmail.com> wrote: > Hi, > > I would like to search for text and obtain its position (X/Y/Width/Height). > > Suppose text "Hello_World" appears at different locations and on different > pages of the PDF document; I would like to see its X/Y/Width/Height for > every occurrence. > > How do I achieve this? > > Thank you, > Sireesha >
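Once you have collected the TextPosition objects for a word, the word's box is the union of the character boxes. The sketch below is library-free. Glyph is a hypothetical stand-in that holds the numbers you would copy out of each TextPosition. In PDFBox 1.8 these could be getXDirAdj(), getYDirAdj() - getHeightDir(), getWidthDirAdj() and getHeightDir(). Check those getters and their exact meaning against your version's javadoc.

```java
import java.util.List;

/** Hypothetical stand-in for the numbers copied out of each PDFBox TextPosition. */
final class Glyph {
    final float x;      // left edge
    final float top;    // top edge, measured downwards from the top of the page
    final float width;
    final float height;

    Glyph(float x, float top, float width, float height) {
        this.x = x;
        this.top = top;
        this.width = width;
        this.height = height;
    }
}

final class WordBox {
    private WordBox() {}

    /** Returns {x, top, width, height} of the smallest box enclosing all glyphs. */
    static float[] union(List<Glyph> glyphs) {
        if (glyphs.isEmpty()) {
            throw new IllegalArgumentException("no glyphs");
        }
        float minX = Float.MAX_VALUE;
        float minTop = Float.MAX_VALUE;
        float maxRight = -Float.MAX_VALUE;
        float maxBottom = -Float.MAX_VALUE;
        for (Glyph g : glyphs) {
            minX = Math.min(minX, g.x);
            minTop = Math.min(minTop, g.top);
            maxRight = Math.max(maxRight, g.x + g.width);
            maxBottom = Math.max(maxBottom, g.top + g.height);
        }
        return new float[] {minX, minTop, maxRight - minX, maxBottom - minTop};
    }
}
```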
Re: PDFTextPositions
Not that I know of. PDFBox mostly provides low-level access to the PDF format. The only relatively easy way to do it would be to keep the TextPosition objects and also grab the text output of the PDFTextStripper. Then you can search the output (a String) for the word you are looking for and get its position on the PDF page from the corresponding TextPosition objects. I can think of other ways, but they would take longer to implement. Sorry, I would write a sample, but I'm not at my desk right now. Alin On Wed, Apr 2, 2014 at 7:01 PM, Sireesha Chilakamarri < sireesha.chary...@gmail.com> wrote: > Hi Alin, > > I am able to run the PrintTextLocations example. This gives me the > location details for every character. > > Is there an easier way to get coordinates for a word as a whole, instead of > all its characters? > > To search for text, I used a method described in > > http://www.programming-free.com/2012/11/simple-word-search-in-pdf-files-using.html > . > > Is there an easier way to search for text as well? > > Are there no direct APIs? > > Thank you, > Sireesha
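Alin's suggestion, searching the text and mapping each hit back to the TextPosition objects, could be sketched as follows. The stripper's output can contain word and line separators that have no TextPosition, so it may not line up with the positions. This sketch therefore rebuilds the searchable string from the per-glyph text, so that a hit's character offsets map straight to glyph indices. The input is assumed to be the text of each collected TextPosition in order (getCharacter() in PDFBox 1.8). GlyphSearch is a hypothetical name. If the PDF draws no space glyphs, a multi-word phrase will not match this way.

```java
import java.util.ArrayList;
import java.util.List;

/** Finds a word in text assembled from per-glyph strings and maps hits back to glyph indices. */
final class GlyphSearch {
    private GlyphSearch() {}

    /**
     * glyphTexts.get(i) is the text of glyph i; one glyph may yield several chars
     * (for example a ligature). Returns one {firstGlyph, endGlyphExclusive} per occurrence.
     */
    static List<int[]> find(List<String> glyphTexts, String word) {
        StringBuilder text = new StringBuilder();
        List<Integer> owner = new ArrayList<>(); // owner.get(c) = glyph that produced char c
        for (int g = 0; g < glyphTexts.size(); g++) {
            String s = glyphTexts.get(g);
            text.append(s);
            for (int k = 0; k < s.length(); k++) {
                owner.add(g);
            }
        }
        List<int[]> hits = new ArrayList<>();
        if (word.isEmpty()) {
            return hits;
        }
        int from = 0;
        int at;
        while ((at = text.indexOf(word, from)) >= 0) {
            int first = owner.get(at);
            int last = owner.get(at + word.length() - 1);
            hits.add(new int[] {first, last + 1});
            from = at + 1; // continue after this start, allowing overlapping matches
        }
        return hits;
    }
}
```

Each returned pair [first, end) selects the glyphs of one occurrence. The union of their boxes gives X/Y/Width/Height for that hit. Running it once per page keeps the page number known.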
Re: PDF file characters x and y coordinates
I process about 2000 PDF files daily and I have never had an issue with the coordinates. One piece of advice, though: write your own TextPositionComparator. ~Alin On Fri, May 16, 2014 at 8:39 AM, Simer P wrote: > I just needed to confirm this with you guys. > > Can the X and Y coordinates returned in > processTextPosition(TextPosition text) ever be incorrect? > > Because it doesn't really matter in what order the text is extracted ... if > the X and Y coordinates are accurate then I can rearrange the characters > based on the application's requirements. > > So can the X and Y coordinates ever be wrong? > > Cheers >
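Here is one way to write your own ordering. It assumes a hypothetical CharPos record where x is measured from the left and y is the baseline measured from the top. Folding a tolerance such as "baselines within 2 points are on the same line" directly into a Comparator makes it non-transitive, and Java's sort may then throw an IllegalArgumentException ("Comparison method violates its general contract!"). To avoid that, this sketch sorts with a strict comparator first and then groups lines in a separate pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical character record: x from the left, y = baseline measured from the top. */
final class CharPos {
    final char ch;
    final float x;
    final float y;

    CharPos(char ch, float x, float y) {
        this.ch = ch;
        this.x = x;
        this.y = y;
    }
}

final class ReadingOrder {
    private ReadingOrder() {}

    /** Top-to-bottom lines, left-to-right within a line; baselines within tolerance share a line. */
    static List<CharPos> sort(List<CharPos> input, float tolerance) {
        List<CharPos> byY = new ArrayList<>(input);
        // Strict (transitive) comparator: safe to hand to List.sort.
        byY.sort(Comparator.comparingDouble((CharPos c) -> c.y).thenComparingDouble(c -> c.x));

        List<CharPos> result = new ArrayList<>();
        List<CharPos> line = new ArrayList<>();
        float lineY = 0f;
        for (CharPos c : byY) {
            if (!line.isEmpty() && c.y - lineY > tolerance) {
                flush(line, result); // baseline jumped: the previous line is complete
            }
            if (line.isEmpty()) {
                lineY = c.y; // the first character defines the line's baseline
            }
            line.add(c);
        }
        flush(line, result);
        return result;
    }

    private static void flush(List<CharPos> line, List<CharPos> out) {
        line.sort(Comparator.comparingDouble((CharPos c) -> c.x));
        out.addAll(line);
        line.clear();
    }
}
```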
Re: Problem with processTextPosition
What are the x and y coordinates of H and W? Alin Mazilu SKE GlobalTech, LLC 3250 West Market St. Suite 307D Fairlawn, OH 44333 Sent from my Galaxy S3 On May 17, 2014 2:42 AM, "DImuthu Upeksha" wrote: > Hi all, > > I was trying to manually feed TextPosition objects to the > processTextPosition method in the PDFTextStripper class. I created a > subclass of PDFTextStripper and overrode the processStream method. In > processStream I manually created two TextPosition objects for the > letters "W" and "H". At the end I passed them to processTextPosition: > > processTextPosition(textPosition1); > processTextPosition(textPosition2); > > Then I tested it using > > PDFTextStripper ocrStripper = new PDFOCRTextStripper(); > PDDocument document = PDDocument.load("some pdf file"); > String data = ocrStripper.getText(document); > System.out.println(data); > > Output was: H W > > Then I changed the order of the TextPosition objects passed in [1] > > processTextPosition(textPosition2); > processTextPosition(textPosition1); > > Output was: WH > > -- > > As far as I understood, processTextPosition works with the text > position metadata such as the x and y coordinates of the input text, so it > should not depend on the order of the input sequence. But it > seems that processTextPosition works according to the order of the > input. > E.g. if I input W first, it prints W first without considering its > actual position. > > Is this the normal behaviour, or am I missing something here? > > [1] https://gist.github.com/DImuthuUpe/5dcfa9758f017794c649 > -- > Regards > > W.Dimuthu Upeksha > Undergraduate > > Department of Computer Science And Engineering > > University of Moratuwa, Sri Lanka >
Re: Problem with processTextPosition
Hello, I commented on the gist. You have to call setSortByPosition(true) in the constructor right after super(). Be careful with your coordinate system: when you call textPosition1.getY() you get 792, not 0. The text matrix you set is in PDF user space, whose origin (0,0) is the lower-left corner of the page, while TextPosition.getY() is measured from the upper-left corner, which feels more natural. I hope that helps. Alin PS Is the OCR going to be pure Java, or will you write it in another language and use native calls? On Sat, May 17, 2014 at 8:13 AM, DImuthu Upeksha wrote: > Hi Alin, > > You can find my source code here > https://gist.github.com/DImuthuUpe/5dcfa9758f017794c649 > As you can see I set > X-offset: 0 and Y-offset: 0 for "H" > X-offset: 32 and Y-offset: 0 for "W" > in the text matrices. Is that enough? Is there another way to set X,Y > coordinates? > > > On Sat, May 17, 2014 at 12:18 PM, Alin Mazilu wrote: > > What are the x and y coordinates of H and W? >
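The 792 mentioned above comes purely from the change of origin. PDF user space, where the text matrix offsets live, has its origin at the bottom-left with y pointing up. TextPosition.getY() measures from the top, so y_top = pageHeight - y_pdf. On a US Letter page (612 x 792 points), a glyph placed at y = 0 therefore reports 792. A minimal sketch, assuming an unrotated page whose media box starts at (0,0). PageCoords is a hypothetical name.

```java
/**
 * Converts a y coordinate between PDF user space (origin bottom-left, y up)
 * and a top-left origin (y down), for an unrotated page whose media box starts at (0,0).
 */
final class PageCoords {
    private PageCoords() {}

    static float toTopLeft(float yPdf, float pageHeight) {
        return pageHeight - yPdf;
    }

    // The mapping is its own inverse, so the same formula converts back.
    static float toPdf(float yTopLeft, float pageHeight) {
        return pageHeight - yTopLeft;
    }
}
```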
Re: [DISCUSS] Switch to java 1.6
Hello, I got one: JavaFX. I use PDFBox in projects that use JavaFX 1.7/1.8. Alin On Sun, Apr 28, 2013 at 1:35 PM, Andreas Lehmkuehler wrote: > Hi, > > there was already a discussion about switching to java 1.6. As this is a > very > important topic I'd like to move the discussion to a separate thread. > > There are a lot of good reasons to switch to java 1.6 and until now > everybody > agrees to do the switch. > > Is there anybody who has at least one good reason not to go on and switch > to > java 1.6? > > BR > Andreas Lehmkühler >
Re: [DISCUSS] Switch to java 1.6
JavaFX became part of the main Java download in version 1.7, and it will carry the Java version number. I am using PDFBox 1.7.1 in all my projects at the moment. My initial response was because I misread the "switching" to Java 1.6 part and thought that future versions of PDFBox would not work on any other versions of Java. I am going to have to build something like a PDF plugin for the JavaFX WebView control and a PDF viewer based on JavaFX, and I got scared, because I really like PDFBox and don't want to change to another library. It turns out that I can breathe normally now... :)) On Tue, Apr 30, 2013 at 1:03 PM, Thomas Chojecki wrote: > > Quoting Alin Mazilu: > > Hello, >> > Hi, > > > I got one: JavaFX. I use PDFBox in projects that use JavaFX 1.7/1.8. >> > I tried to find this JavaFX version to see what Java version it needs, but I > can't figure out where to download it. > Wikipedia [1] does not list such a version. Can you please provide more > detailed information or test your project with a JRE 1.6 or higher? > > The next big question is, do you use the latest PDFBox version in your > project? If there are no problems you can stay at the 1.8.1 version. > > So please give us more details. > > > Best Regards > Thomas > > > [1] http://en.wikipedia.org/wiki/JavaFX > >
PDF Text Highlight
Hello all, I have a bit of a situation on my hands. Here it is: I have a bunch of PDF files sitting in a folder somewhere. What I have to do is search all of them for certain names and highlight those names with a yellow marker-like background and then I have to send all PDFs to a printer. I have done the searching and text extraction and the printing, but for the life of me, I can't figure out how to do the highlighting. What makes it even harder is that I have hundreds of these PDFs per day and human interaction is out of the question. It has to be a push of a button. Any ideas? I appreciate it. Alin Mazilu
Re: PDF Text Highlight
Thank you very much! It does work. The only thing is that you have to use yellowStream.getCOSObject() instead of yellowStream in your last line. Also, the PDPageContentStream.fillRect(x, y, w, h) method uses the bottom-left corner of the page as the origin (0,0), which is PDF's native user-space convention, whereas the TextPosition coordinates from text extraction are measured from the upper-left corner. That's not a problem, because it's fixable with simple arithmetic. Thank you so much for your help. It would have taken me a long time to figure it out on my own, if ever. Alin Mazilu
On Fri, Jul 26, 2013 at 6:19 PM, Fred Hansen wrote:
> Caveat: I've not tried this; nor anything like it. I am answering because
> figuring out how to do it was a challenge.
>
> Presumably your program has variables 'page' and 'document' where the
> rectangle goes and variables llx, lly, w, and h delimiting the rectangle.
>
> Here's some code that might work. (UNTESTED)
>
>     // first construct a stream that draws a yellow rectangle
>     // at the desired coordinates, but on a temporary page
>     PDPage tempPage = new PDPage();
>     PDPageContentStream tempStream = new PDPageContentStream(document, tempPage);
>     tempStream.setNonStrokingColor(255, 255, 0); // yellow (R, G, B)
>     tempStream.fillRect(llx, lly, w, h); // where to put rect
>     tempStream.close();
>
>     // now get a handle on the stream (I hope it is not an array)
>     PDStream yellowStream = tempPage.getContents();
>
>     // get the contents of the page
>     COSDictionary dict = page.getCOSDictionary();
>     COSBase pageStream = dict.getDictionaryObject("Contents");
>
>     // make sure the contents are a COSArray
>     COSArray pageStreamArray;
>     if (pageStream instanceof COSStream) {
>         pageStreamArray = new COSArray();
>         pageStreamArray.add(pageStream);
>         dict.setItem("Contents", pageStreamArray);
>     }
>     else pageStreamArray = (COSArray) pageStream;
>
>     // now we add yellowStream at the front of page.getContents()
>     // (in front so text is later drawn on top of it)
>     pageStreamArray.add(0, yellowStream);
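In Fred's snippet, fillRect ends up as PDF path operators in the temporary content stream. The sketch below builds an equivalent operator sequence as text, to show what the prepended stream contains. It also includes the "simple arithmetic" for converting a box measured from the top of the page into fillRect's bottom-left coordinates. It is an illustration and not necessarily byte-for-byte what PDFBox writes. HighlightOps is a hypothetical name. In the operators, q/Q save and restore the graphics state, rg sets the fill (non-stroking) RGB color with components from 0 to 1 (so yellow is 1 1 0), re appends a rectangle, and f fills it.

```java
import java.util.Locale;

/** Builds the PDF operators for a filled yellow rectangle (hypothetical helper). */
final class HighlightOps {
    private HighlightOps() {}

    /** Lower-left y in PDF user space for a box whose top edge is measured from the page top. */
    static float lowerLeftY(float top, float height, float pageHeight) {
        return pageHeight - top - height;
    }

    /** q/Q isolate the color change; "1 1 0 rg" = yellow fill; "re" adds the path; "f" fills it. */
    static String yellowRect(float llx, float lly, float w, float h) {
        return String.format(Locale.ROOT, "q 1 1 0 rg %.2f %.2f %.2f %.2f re f Q", llx, lly, w, h) + "\n";
    }
}
```

Consider a name whose box starts 80 points from the top and is 12 points tall, on a 792-point page. lowerLeftY(80, 12, 792) is 700, and yellowRect(72, 700, 50, 12) produces q 1 1 0 rg 72.00 700.00 50.00 12.00 re f Q. Prepending it, as Fred does, lets the page's own text paint over the yellow.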
Re: PDFTextStripper's writeLine() must be protected!
Hello, I would venture to guess that if you need to override that method, you probably need to do something more complicated than just finding out where a line starts and where it ends. If you only need the beginning and end of each line, you can call setLineSeparator() and the various setXxxStart() and setXxxEnd() methods with marker strings, and then read the "output" field, which is protected and accessible from your subclass. Once you have set the line separator, the paragraph start and end, the page start and end, etc., you can easily tell where the lines start and end. If you gave a little more detail about what you are trying to accomplish, my help could be more meaningful. I've been using PDFBox for a long time in quite a few projects and I have never needed to override writeLine. The library is quite well thought out. Regards, Alin On Fri, Nov 15, 2013 at 9:14 PM, Edson Alves Pereira wrote: > Hello guys, I was just trying to extend PDFTextStripper to capture each > whole line of a page from a simple PDF, and I hit a problem: the > method writeLine() is private, which makes it impossible for me to tell when the > line ends without going down to TextPosition and PDF objects. > > Could it be protected? > > Regards, > Edson >
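The splitting step described above can be sketched in plain Java. It assumes the stripper was configured beforehand with something like stripper.setLineSeparator(MarkerLines.EOL), and that stripper.getText(document) was then called. The marker string <<EOL>> and the class MarkerLines are hypothetical. Pick a marker that cannot occur in your documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Splits extracted text on a marker previously passed to PDFTextStripper.setLineSeparator. */
final class MarkerLines {
    // Hypothetical marker; choose something that never appears in real text.
    static final String EOL = "<<EOL>>";

    private MarkerLines() {}

    static List<String> lines(String extracted) {
        List<String> out = new ArrayList<>();
        // Pattern.quote treats the marker literally instead of as a regex.
        for (String part : extracted.split(Pattern.quote(EOL))) {
            if (!part.isEmpty()) {
                out.add(part); // skip empty pieces, e.g. between two consecutive markers
            }
        }
        return out;
    }
}
```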
Error printing...
Hello all, I am printing some PDFs and I am getting this:
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.filter.JBIG2Filter decode
SEVERE: Can't find an ImageIO plugin to decode the JBIG2 encoded datastream.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap getRGBImage
SEVERE: Something went wrong ... the pixelmap doesn't contain any data.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.filter.JBIG2Filter decode
SEVERE: Can't find an ImageIO plugin to decode the JBIG2 encoded datastream.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap getRGBImage
SEVERE: Something went wrong ... the pixelmap doesn't contain any data.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.filter.JBIG2Filter decode
SEVERE: Can't find an ImageIO plugin to decode the JBIG2 encoded datastream.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap getRGBImage
SEVERE: Something went wrong ... the pixelmap doesn't contain any data.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL
Is there a quick way to fix this? Is there a JBIG2 plugin? I really need to fix it today or I'm in trouble. :) Thank you, Alin
Re: Error printing...
Thank you for your quick responses, but the application is a self-contained JavaFX application packaged with its own JRE, independent of the JRE installed on the OS. So I think I need to package the JAI libraries, but I have no idea how :D Any thoughts? Thank you, Alin On Wed, Jan 22, 2014 at 1:48 PM, John Hewson wrote: > Yes, there is. Simply Google "JBIG2 plugin" and follow the first link; it > will be called "jbig2-imageio". > > -- John
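A quick diagnostic for the packaged application is to ask ImageIO, from inside the app, whether any reader is registered for a given format name. If the jbig2-imageio jar did not end up on the application class path, the answer will be false. The format name "jbig2" is an assumption, so check what the plugin actually registers. ImageIoCheck is a hypothetical name. ImageIO.scanForPlugins() rescans the application class path for plugin jars.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

/** Diagnostic: is an ImageIO reader registered for a given format name? */
final class ImageIoCheck {
    private ImageIoCheck() {}

    static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        ImageIO.scanForPlugins(); // pick up plugin jars on the application class path
        // "jbig2" is an assumed format name; check what your plugin registers.
        System.out.println("JBIG2 reader available: " + hasReader("jbig2"));
    }
}
```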
Re: Html to Pdf
Since we are suggesting alternatives: I use iText for converting HTML into PDF. Here is an example: http://www.rgagnon.com/javadetails/java-html-to-pdf-using-itext.html Hope that helps, Alin On Fri, Sep 5, 2014 at 1:50 PM, John Hewson wrote: > Rendering HTML is very complex; you basically need to use a modified web > browser. > > You might want to try PhantomJS http://phantomjs.org/screen-capture.html > which can produce PDFs. > > -- John > > On 5 Sep 2014, at 01:08, Emre Türker wrote: > > > Hi, > > > > I want to export HTML text to PDF using PDFBox. How can I do it? > > > > Please help me. > > > > Emre. >