Re: Html to Pdf

2014-09-05 Thread Alin Mazilu
Since we are suggesting alternatives, I use iText for converting HTML into
PDF. Here is an example:
http://www.rgagnon.com/javadetails/java-html-to-pdf-using-itext.html

Hope that helps,

Alin


On Fri, Sep 5, 2014 at 1:50 PM, John Hewson j...@jahewson.com wrote:

 Rendering HTML is very complex; you basically need to use a modified web
 browser.

 You might want to try PhantomJS http://phantomjs.org/screen-capture.html
 which can produce PDFs.

 -- John

 On 5 Sep 2014, at 01:08, Emre Türker emre.tur...@coretech.com.tr wrote:

  Hi,
 
 
 
  I want to export HTML text to PDF using PDFBox. How can I do it?
 
  Please help me.
 
 
 
  Emre.
 




Re: Problem with processTextPosition

2014-05-17 Thread Alin Mazilu
What are the x and y coordinates of H and W?

Alin Mazilu
SKE GlobalTech, LLC
3250 West Market St. Suite 307D
Fairlawn, OH 44333

Sent from my Galaxy S3
On May 17, 2014 2:42 AM, DImuthu Upeksha dimuthu.upeks...@gmail.com
wrote:

 Hi all,

 I was trying to manually feed text position objects to the
 processTextPosition method in the PDFTextStripper class. I created a
 subclass of PDFTextStripper and overrode the processStream method. In
 processStream I manually created two text position objects for
 W and H. At the end I passed them to processTextPosition:

 processTextPosition(textPosition1);
 processTextPosition(textPosition2);

 Then I tested it using

 PDFTextStripper ocrStripper = new PDFOCRTextStripper();
 PDDocument document = PDDocument.load("some pdf file");
 String data = ocrStripper.getText(document);
 System.out.println(data);

 Output was : H W

 Then I changed the sequence of passing TextPosition objects in [1]

 processTextPosition(textPosition2);
 processTextPosition(textPosition1);

 Output was : WH

 --

 As far as I understand, processTextPosition works with the text
 position metadata, such as the x and y coordinates of the input text, so
 it should not depend on the order of the input sequence. But in this case
 it seems that processTextPosition works according to the order of
 input.
 E.g., if I input W first, it prints W first without considering its
 actual position.

 Is this the normal behaviour? Or am I missing something here?

 [1] https://gist.github.com/DImuthuUpe/5dcfa9758f017794c649
 --
 Regards

 W.Dimuthu Upeksha
 Undergraduate

 Department of Computer Science And Engineering

 University of Moratuwa, Sri Lanka



Re: Problem with processTextPosition

2014-05-17 Thread Alin Mazilu
Hello,

I commented on the gist. You have to use setSortByPosition(true) in the
constructor right after super(). Be careful with your coordinate system.
When you do textPosition1.getY() you get 792 not 0. I don't remember
exactly where, but there is a class that uses the lower left corner of the
page as the origin (0,0), not the upper left corner as it is natural.
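
To illustrate the coordinate point: a Y value measured from the bottom edge of the page and the same point measured from the top edge differ only by the page height. The helper below is a hypothetical sketch, not part of PDFBox, and it assumes a page 792 points tall, matching the 792 mentioned above.

```java
// Sketch only: PageCoords is a hypothetical helper, not a PDFBox class.
final class PageCoords {
    private PageCoords() {}

    /**
     * Converts a Y value measured from one horizontal edge of the page
     * (bottom or top) into the same point measured from the opposite edge.
     */
    static float flipY(float y, float pageHeight) {
        return pageHeight - y;
    }
}
```

With a 792-point page, PageCoords.flipY(0f, 792f) gives 792, which would explain seeing 792 where 0 was expected.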

I hope that helps.

Alin

PS: Is the OCR going to be pure Java, or will you write it in another
language and use native calls?


On Sat, May 17, 2014 at 8:13 AM, DImuthu Upeksha dimuthu.upeks...@gmail.com
 wrote:

 Hi Alin,

 You can find my source code here:
 https://gist.github.com/DImuthuUpe/5dcfa9758f017794c649
 As you can see, I set
 X-offset: 0 and Y-offset: 0 for H
 X-offset: 32 and Y-offset: 0 for W
 in the text matrices. Is that enough? Is there another way to set the X,Y
 coordinates?





 --
 Regards

 W.Dimuthu Upeksha
 Undergraduate

 Department of Computer Science And Engineering

 University of Moratuwa, Sri Lanka



Re: PDF file characters x and y coordinates

2014-05-16 Thread Alin Mazilu
I process about 2000 PDF files daily and I have never had an issue with the
coordinates. One piece of advice though: write your own
TextPositionComparator.
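
A custom comparator of this kind might look roughly like the sketch below. It does not use PDFBox types: Glyph is a hypothetical stand-in for a text position with top-left based coordinates, and the line tolerance is an assumed tuning value. Glyphs are grouped into horizontal bands by Y and ordered by X within a band. Glyphs that sit close to a band boundary can end up in different bands, so the tolerance needs tuning per document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one text position; not a PDFBox class.
final class Glyph {
    final String text;
    final float x;
    final float y;

    Glyph(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class ReadingOrder {
    private ReadingOrder() {}

    /** Orders glyphs by Y band (band height = lineTolerance), then by X. */
    static Comparator<Glyph> byLineThenX(float lineTolerance) {
        return Comparator
                .comparingInt((Glyph g) -> Math.round(g.y / lineTolerance))
                .thenComparingDouble(g -> g.x);
    }

    /** Sorts a copy of the glyphs and concatenates their text. */
    static String join(List<Glyph> glyphs, float lineTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(byLineThenX(lineTolerance));
        StringBuilder sb = new StringBuilder();
        for (Glyph g : sorted) {
            sb.append(g.text);
        }
        return sb.toString();
    }
}
```

Because the band index is an integer, the comparator defines a consistent total order, which a pairwise "close enough in Y" test would not.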

~Alin


On Fri, May 16, 2014 at 8:39 AM, Simer P sime...@gmail.com wrote:

 I just needed to confirm this with you guys.

 Can the X and Y coordinates returned in
 processTextPosition(TextPosition text) ever be incorrect?

 It doesn't really matter in what order the text is extracted: if
 the x and y coordinates are accurate, then I can rearrange the characters
 based on the application's requirements.

 So can the X and Y coordinates ever be wrong?

 Cheers



Re: PDFTextPositions

2014-04-02 Thread Alin Mazilu
You have to extend the PDFTextStripper class and override the
processTextPosition(...) method. From there the logic is up to you. You
can also override the writePage() method to grab the charactersByArticle
Vector and then look for your words in there by iterating over
it. In both cases you grab all the TextPosition objects and
work out the position and height/width from there.
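
The second half of that approach, turning collected character positions into word boxes, can be sketched without PDFBox. PlacedChar is a hypothetical stand-in for whatever you collect per character; the sketch assumes the characters are already in reading order and that each match sits on a single line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one extracted character and its box; not a PDFBox class.
final class PlacedChar {
    final char ch;
    final float x;
    final float y;
    final float width;
    final float height;

    PlacedChar(char ch, float x, float y, float width, float height) {
        this.ch = ch;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

final class WordLocator {
    private WordLocator() {}

    /** Returns one {x, y, width, height} array per occurrence of word. */
    static List<float[]> find(List<PlacedChar> chars, String word) {
        List<float[]> boxes = new ArrayList<>();
        if (word.isEmpty()) {
            return boxes;
        }
        // Build the plain text so that string index i maps to chars.get(i).
        StringBuilder text = new StringBuilder();
        for (PlacedChar c : chars) {
            text.append(c.ch);
        }
        int from = text.indexOf(word);
        while (from >= 0) {
            PlacedChar first = chars.get(from);
            PlacedChar last = chars.get(from + word.length() - 1);
            float height = 0f;
            for (int i = from; i < from + word.length(); i++) {
                height = Math.max(height, chars.get(i).height);
            }
            boxes.add(new float[] {first.x, first.y, last.x + last.width - first.x, height});
            from = text.indexOf(word, from + 1);
        }
        return boxes;
    }
}
```

Running this once per page and recording the page number next to each box would cover occurrences spread over several pages.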

~Alin


On Wed, Apr 2, 2014 at 6:32 PM, Sireesha Chilakamarri 
sireesha.chary...@gmail.com wrote:

 Hi,

 I would like to search for text and obtain the position (X/Y/width/height)
 of each match.

 Suppose the text Hello_World appears at different locations and on different
 pages of the PDF document. I would like to see its X/Y/width/height for
 every occurrence.

 How do I achieve this?

 Thank you,
 Sireesha



Problem With MergeUtility

2014-03-13 Thread Alin Mazilu
Hello guys,


Has anyone had any problem with this? Any idea why it happens? What would
be a good value for pushBackSize so this does not happen? Thanks!


Partial stack trace:


org.apache.pdfbox.exceptions.WrappedIOException: Could not push back 72940
bytes in order to reparse stream. Try increasing push back buffer using
system property org.apache.pdfbox.baseParser.pushBackSize
    at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:546)
    at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:566)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:187)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
    at org.apache.pdfbox.util.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:186)
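
One way to act on the hint in the exception message is to set the system property before any PDF is parsed. The property name below is taken from the message; the helper class and the value 131072 are assumptions, chosen only because the value is comfortably above the 72940 bytes reported. I am also assuming the parser reads the property when it starts up, so the call should come early.

```java
// Sketch only: PushBackConfig is a hypothetical helper, not a PDFBox class.
final class PushBackConfig {
    // Property name as printed in the exception message.
    static final String PROPERTY = "org.apache.pdfbox.baseParser.pushBackSize";

    private PushBackConfig() {}

    /**
     * Makes sure the property is at least minBytes, keeping any larger
     * value that is already configured. Returns the value now in effect.
     */
    static int ensureAtLeast(int minBytes) {
        int current = Integer.getInteger(PROPERTY, 0);
        int chosen = Math.max(current, minBytes);
        System.setProperty(PROPERTY, Integer.toString(chosen));
        return chosen;
    }
}
```

Calling PushBackConfig.ensureAtLeast(131072) at the start of main, or passing -Dorg.apache.pdfbox.baseParser.pushBackSize=131072 on the java command line, would be two ways to apply it.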


Re: Problem With MergeUtility

2014-03-13 Thread Alin Mazilu
Where? Here's the code that causes that:

PDFMergerUtility util = new PDFMergerUtility();

for (File file : set) {
    try {
        if (file.exists()) {
            util.addSource(file);
        }
    } catch (Exception e) {
        // log e
    }
}
util.setDestinationFileName(...);

util.mergeDocuments();


On Thu, Mar 13, 2014 at 11:27 AM, Maruan Sahyoun sahy...@fileaffairs.de wrote:

 Hi,

 not a direct answer to your question but could you try
 PDDocument.loadNonSeq instead?

 BR
 Maruan Sahyoun




Re: Problem With MergeUtility

2014-03-13 Thread Alin Mazilu
OK, I will try. In my opinion it would be useful if it had the instance
variables protected rather than private; that way the class could be
extended as needed, like PDFTextStripper. In my situation I would only have
to override mergeDocuments(). Anyway, I will try it.

Thank you,

Alin


On Thu, Mar 13, 2014 at 12:52 PM, Timo Boehme timo.boe...@ontochem.com wrote:

 Hi,

 As far as I remember, PDFMergerUtility is one of the last utilities that
 does not currently support loadNonSeq.

 As a workaround, get the source of PDFMergerUtility and change PDDocument.load
 to PDDocument.loadNonSeq (you may provide null as the buffer parameter).


 Best,
 Timo



 --

  Timo Boehme
  OntoChem GmbH
  H.-Damerow-Str. 4
  06120 Halle/Saale
  T: +49 345 4780474
  F: +49 345 4780471
  timo.boe...@ontochem.com

 _

  OntoChem GmbH
  Managing Director: Dr. Lutz Weber
  Registered office: Halle / Saale
  Registration court: Stendal
  Registration number: HRB 215461
 _




Re: Problem With MergeUtility

2014-03-13 Thread Alin Mazilu
I know that. No problem.


On Thu, Mar 13, 2014 at 2:23 PM, John Hewson j...@jahewson.com wrote:

 Hi Alin

 Thanks for your fix.

   it would be useful if it had the instance
  variables protected rather than private, that way the class could be
  extended as needed, like PDFTextStripper.

 The problem with making fields protected is that it exposes internal
 implementation details, making them part of the public API. This prevents
 us from making internal changes in the future without introducing breaking
 changes to the public API.

 In the case of PDFTextStripper, there is a strong use case for using a
 protected field, because overriding it is the primary mechanism for custom
 text extraction.

 Cheers

 -- John





Re: Need JBIG2 test image

2014-03-12 Thread Alin Mazilu
I have scanned police accident reports that contain people's names, addresses
and phone numbers. I had a problem printing these files with PDFBox
and had to improvise by running a command-line print utility as a
Process. I could maybe give you one if you agree not to release it to the
public.

Alin


On Wed, Mar 12, 2014 at 1:19 PM, Tilman Hausherr thaush...@t-online.de wrote:

 Hello all,

 I need a PDF with JBIG2 encoding that can be distributed, so it should
 not have anything on it that is copyrighted, i.e. artwork or real text.
 Just some random lines or a lorem ipsum text. The image should be black and
 white, i.e. it should not contain other elements that have a color, like a
 watermark. Some unserviced Xerox copiers might produce such images, or some
 software from Adobe, IRIS etc. If you have such a file, send it to me,
 tilman at snafu dot de, not to the list.

 I want to use this PDF for a unit test that checks whether the PDF is
 decoded with the JBIG2 plugin. A fail would be an empty image. This way we
 check that the JBIG2 plugin is properly attached.

 Tilman




Re: Regarding pdf data extraction

2014-03-03 Thread Alin Mazilu
I don't think that class can help you... All you need is the
PDFTextStripper class...


On Mon, Mar 3, 2014 at 7:15 PM, Divya Muttineni divyamuttin...@gmail.com wrote:

 I am trying to convert tabular data from a PDF file to a text (.txt) file.
 In one article I came across
 org.apache.pdfbox.pdfviewer.PDFPageDrawer.

 Can you please help me extend this class and override the strokepath()
 method?


 Thank you,
 Divya



Error printing...

2014-01-22 Thread Alin Mazilu
Hello all,

I am printing some PDFs and I am getting this:

Jan 22, 2014 12:07:47 PM org.apache.pdfbox.filter.JBIG2Filter decode
SEVERE: Can't find an ImageIO plugin to decode the JBIG2 encoded datastream.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap getRGBImage
SEVERE: Something went wrong ... the pixelmap doesn't contain any data.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.filter.JBIG2Filter decode
SEVERE: Can't find an ImageIO plugin to decode the JBIG2 encoded datastream.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap getRGBImage
SEVERE: Something went wrong ... the pixelmap doesn't contain any data.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.filter.JBIG2Filter decode
SEVERE: Can't find an ImageIO plugin to decode the JBIG2 encoded datastream.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap getRGBImage
SEVERE: Something went wrong ... the pixelmap doesn't contain any data.
Jan 22, 2014 12:07:47 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL

Is there a quick way to fix this? Is there a JBIG2 plugin? I really need to
fix it today or I'm in trouble. :)

Thank you,

Alin


Re: Error printing...

2014-01-22 Thread Alin Mazilu
Thank you for your quick responses, but the application is a self-contained
JavaFX application packaged with its own JRE, independent of the JRE
installed on the OS. So I think I need to package the JAI libraries, but I
have no idea how :D Any thoughts?
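
For a bundled application, a quick check is to ask ImageIO at startup whether it can see a reader for the format. As far as I know, ImageIO discovers plugins from jars on the application's own classpath, so shipping the plugin jar next to the other application jars may be enough. The sketch below uses only the standard javax.imageio API. The helper class is hypothetical, and the format name "JBIG2" is an assumption based on the wording of the log message.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Sketch only: ImageIoPluginCheck is a hypothetical helper.
final class ImageIoPluginCheck {
    private ImageIoPluginCheck() {}

    /** Returns true if ImageIO currently has a reader for formatName. */
    static boolean hasReader(String formatName) {
        // Pick up plugin jars that became visible after ImageIO was first used.
        ImageIO.scanForPlugins();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }
}
```

Logging ImageIoPluginCheck.hasReader("JBIG2") once at startup would show whether the packaged runtime actually sees the plugin.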

Thank you,

Alin


On Wed, Jan 22, 2014 at 1:48 PM, John Hewson j...@jahewson.com wrote:

 Yes, there is. Simply Google "JBIG2 plugin" and follow the first link; it
 will be called jbig2-imageio.

 -- John





Re: PDFTextStripper's writeLine() must be protected!

2013-11-15 Thread Alin Mazilu
Hello,

I would venture to guess that if you need to override that method, you
probably need to do something more complicated than just finding out where
a line starts and where it ends. If you just need the beginning and end of
each line, you can call setLineSeparator() and all the setXxxStart() and
setXxxEnd() methods and then grab the output, which is protected and
therefore accessible to your subclass. If you set the line separator, the
paragraph start and end, the page start and end, etc., you can easily make
out where the lines start and end.

Perhaps if you gave a little more detail about what it is you are trying to
accomplish, my help could be a little more meaningful. I've been using
pdfbox for a long time in quite a few projects and I have never had the
need to override writeLine. The library is quite well thought out.

Regards,

Alin


On Fri, Nov 15, 2013 at 9:14 PM, Edson Alves Pereira lottal...@gmail.com wrote:

 Hello guys, I was just trying to extend PDFTextStripper to capture each
 whole line of a page from a simple PDF, and I ran into a problem: the
 method writeLine() is private, which makes it impossible for me to tell
 where a line ends without going down to the TextPosition and PDF objects.

 Could it be made protected?

 Regards,
 Edson



Re: PDF Text Highlight

2013-07-27 Thread Alin Mazilu
Thank you very much! It does work. The only thing is that you have to
use yellowStream.getCOSObject() instead of yellowStream in your last line.
Also, the PDPageContentStream.fillRect(x, y, w, h) method uses the bottom
left corner of the page as the origin (0,0), which is different from the
upper-left origin I expected. But that's not a problem, as it's fixable
with simple arithmetic.
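
The arithmetic can be sketched as follows, assuming the rectangle is known by the distance of its top edge from the top of the page. The helper is hypothetical and not part of PDFBox. It returns the lower-left Y that a bottom-left based call such as fillRect would need.

```java
// Sketch only: RectCoords is a hypothetical helper, not a PDFBox class.
final class RectCoords {
    private RectCoords() {}

    /**
     * Lower-left Y (measured up from the bottom of the page) of a rectangle
     * whose top edge lies `top` points below the top of the page.
     */
    static float lowerLeftY(float top, float height, float pageHeight) {
        return pageHeight - top - height;
    }
}
```

For example, a 12-point-high rectangle whose top edge is 100 points below the top of a 792-point page has its lower-left corner at y = 680. The X value needs no conversion.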

Thank you so much for your help. It would have taken me a long time to
figure it out on my own, if ever.

Alin Mazilu


On Fri, Jul 26, 2013 at 6:19 PM, Fred Hansen zweibie...@yahoo.com wrote:

 Caveat: I've not tried this; nor anything like it. I am answering because
 figuring out how to do it was a challenge.

 Presumably your program has variables 'page' and 'document' where the
 rectangle goes and variables llx, lly, w, and h delimiting the rectangle.

 Here's some code that might work.  (UNTESTED)

 // first construct a stream that draws a yellow rectangle
 //  at the desired coordinates, but on a temporary page
 PDPage tempPage = new PDPage();
 PDPageContentStream tempStream = new PDPageContentStream(document, tempPage);
 tempStream.setNonStrokingColor(255, 255, 0); // yellow (RGB)
 tempStream.fillRect(llx, lly, w, h);   //  where to put rect
 tempStream.close();

 // now get a handle on the stream (I hope it is not an array)
 PDStream yellowStream = tempPage.getContents();

 // get the contents of the page
 COSDictionary dict = page.getCOSDictionary();
 COSBase pageStream = dict.getDictionaryObject("Contents");

 // make sure the contents are a COSArray
 COSArray pageStreamArray;
 if (pageStream instanceof COSStream) {
     pageStreamArray = new COSArray();
     pageStreamArray.add(pageStream);
     dict.setItem("Contents", pageStreamArray);
 } else {
     pageStreamArray = (COSArray) pageStream;
 }

 // now we add yellowStream at the front of page.getContents()
 //   (in front so text is later drawn on top of it)
 pageStreamArray.add(0, yellowStream );

   --
  *From:* Alin Mazilu impet...@gmail.com
 *To:* dev@pdfbox.apache.org
 *Sent:* Friday, July 26, 2013 12:33 PM
 *Subject:* PDF Text Highlight

 Hello all,

 I have a bit of a situation on my hands. Here it is: I have a bunch of PDF
 files sitting in a folder somewhere. What I have to do is search all of
 them for certain names and highlight those names with a yellow marker-like
 background and then I have to send all PDFs to a printer.

 I have done the searching and text extraction and the printing, but for the
 life of me, I can't figure out how to do the highlighting. What makes it
 even harder is that I have hundreds of these PDFs per day and human
 interaction is out of the question. It has to be a push of a button.

 Any ideas? I appreciate it.

 Alin Mazilu





PDF Text Highlight

2013-07-26 Thread Alin Mazilu
Hello all,

I have a bit of a situation on my hands. Here it is: I have a bunch of PDF
files sitting in a folder somewhere. What I have to do is search all of
them for certain names and highlight those names with a yellow marker-like
background and then I have to send all PDFs to a printer.

I have done the searching and text extraction and the printing, but for the
life of me, I can't figure out how to do the highlighting. What makes it
even harder is that I have hundreds of these PDFs per day and human
interaction is out of the question. It has to be a push of a button.

Any ideas? I appreciate it.

Alin Mazilu


Re: [DISCUSS] Switch to java 1.6

2013-04-30 Thread Alin Mazilu
JavaFX became part of the main Java download in version 1.7, and it will
share Java's version number. I am using PDFBox 1.7.1 in all my projects
at the moment. My initial response came from misreading the "switching to
Java 1.6" part: I thought that future versions of PDFBox would not work
on any other version of Java. I am going to have to make something like a
PDF plugin for the JavaFX WebView controller and a PDF viewer built on JavaFX
technology, and I got scared, because I really like PDFBox and I don't want
to change to another library. It turns out that I can breathe normally
now... :))


On Tue, Apr 30, 2013 at 1:03 PM, Thomas Chojecki i...@rayman2200.de wrote:


 Quoting Alin Mazilu impet...@gmail.com:

  Hello,

 Hi,


  I got one: JavaFX. I use PDFBox in projects that use JavaFX 1.7/1.8.

 I tried to find this JavaFX version to see which Java version it needs, but I
 can't figure out where to download it.
 Wikipedia [1] does not list such a version. Can you please provide more
 detailed information or test your project with JRE 1.6 or higher?

 The next big question is: did you use the latest PDFBox version in your
 project? If there are no problems, you can stay at version 1.8.1.

 So please give us more details.


 Best Regards
 Thomas


 [1] http://en.wikipedia.org/wiki/JavaFX




Re: [DISCUSS] Switch to java 1.6

2013-04-28 Thread Alin Mazilu
Hello,

I got one: JavaFX. I use PDFBox in projects that use JavaFX 1.7/1.8.

Alin


On Sun, Apr 28, 2013 at 1:35 PM, Andreas Lehmkuehler andr...@lehmi.de wrote:

 Hi,

 There was already a discussion about switching to Java 1.6. As this is a
 very important topic, I'd like to move the discussion to a separate thread.

 There are a lot of good reasons to switch to Java 1.6, and so far everybody
 agrees with the switch.

 Is there anybody who has at least one good reason not to go ahead and switch
 to Java 1.6?

 BR
 Andreas Lehmkühler