Strange performance problem with certain PDF files

2016-03-19 Thread Stahle, Patrick
Hi all,

I am running into a lot of strange performance issues with certain PDF files.

Background info:
The strange thing is that I can't reproduce this consistently across
environments, although a given PDF on a given environment behaves
consistently. I do most of my development inside a VirtualBox virtual machine
running Fedora. The PDF files I am having problems with never show performance
issues when processed on my virtual machine's local drive, but if I use a
VirtualBox shared drive as the source / destination for the PDF, I see the
problem. A co-worker working in a pure Windows environment also experiences
the performance problem, and we see the same issue on our dev Solaris servers.
The performance range can be quite drastic. One of our 3D PDFs (12 MB) can be
opened, stamped with some text, encrypted, and saved in around 8 seconds in my
local environment. Doing the same job against a VirtualBox shared drive or on
our Solaris server takes minutes. In my co-worker's Windows environment it
takes around 30 seconds. We have only reproduced this consistently with the
12 MB 3D PDF. I have a much smaller PDF (non-3D, converted from MS Office) that
shows a similar performance issue, but the times range from 200 ms local to
8 seconds.

The one thing the two files have in common is that I see a lot of the
following messages on the console.
Output from the 12 MB 3D PDF file:
:
:
1787 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser  - parsed=COSObject{13166, 0}

These messages seem to be logged while the document is being opened, and from
what I can tell I get 13,166 of them for this example PDF.
The slowness does not happen until the following line:
document.save(outputPDFStream);

With other PDFs, including some quite large ones, I see neither this
performance issue nor those log messages.

I know this is not much to go on. I am working on isolating this down to
something more concrete and reproducible, but I thought I would send this out
to see whether anyone has any ideas or has seen similar issues. Suggestions?

Thanks,
Patrick
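
Since the slowdown shows up at document.save(outputPDFStream) and depends on
where the file lives, one thing that may be worth ruling out is many small,
unbuffered writes going to a slow (shared or network) drive. The following is
only a sketch under that assumption, not a diagnosis. It uses no PDFBox
classes: CountingOutputStream, BufferedSaveSketch and the "N 0 obj" chunks are
hypothetical stand-ins for a writer that emits one small chunk per object. It
counts how many write calls reach the underlying stream with and without a
BufferedOutputStream in between.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sink that counts the write calls that actually reach it. */
class CountingOutputStream extends OutputStream {
    long writeCalls;
    long bytesWritten;

    @Override
    public void write(int b) {
        writeCalls++;
        bytesWritten++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        writeCalls++;
        bytesWritten += len;
    }
}

class BufferedSaveSketch {
    /** Stand-in for a serializer that emits one small chunk per object. */
    static void emitSmallWrites(OutputStream out, int objects) throws IOException {
        for (int i = 1; i <= objects; i++) {
            out.write((i + " 0 obj\n").getBytes(StandardCharsets.US_ASCII));
        }
        out.flush();
    }

    /** Returns how many write calls reach the sink for the given object count. */
    static long callsReachingSink(boolean buffered, int objects) {
        CountingOutputStream sink = new CountingOutputStream();
        OutputStream out = buffered ? new BufferedOutputStream(sink, 64 * 1024) : sink;
        try {
            emitSmallWrites(out, objects);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        // 13166 mirrors the object count reported in the log output above.
        System.out.println("unbuffered: " + callsReachingSink(false, 13166));
        System.out.println("buffered:   " + callsReachingSink(true, 13166));
    }
}
```

In the real code the equivalent experiment would be to wrap outputPDFStream in
a BufferedOutputStream before passing it to document.save(...) and compare the
timings. Whether that helps for this particular file is untested.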



Re: Multiple instances of the same field name

2016-03-19 Thread Gilad Denneboom
Yeah, getWidgets also works, in the versions it's implemented. Glad to hear
you got it to work!

On Fri, Mar 18, 2016 at 3:40 PM, Kevin Ternes 
wrote:

> Thank you!  Here is an implementation of Gilad's advice:
>
>   PDField pdField = pdAcroForm.getField("EffectiveDate");
>   List<PDAnnotationWidget> widgetList = pdField.getWidgets();
>   for (PDAnnotationWidget widget : widgetList) {
>     PDRectangle r = widget.getRectangle();
>     log.info(" - rectangle: llx={}, lly={}, w={}, h={}",
>         r.getLowerLeftX(), r.getLowerLeftY(), r.getWidth(), r.getHeight());
>   }
>
>
> -Original Message-
> From: Gilad Denneboom [mailto:gilad.denneb...@gmail.com]
> Sent: Thursday, March 17, 2016 1:19 PM
> To: users@pdfbox.apache.org
> Subject: Re: Multiple instances of the same field name
>
> The "#0" is not a part of the actual field name, it's just a convention
> used by Acrobat to show that the field has more than one widget. To access
> these widgets using PDFBox you can use the getKids method of PDField.
>
> On Thu, Mar 17, 2016 at 5:26 PM, Kevin Ternes 
> wrote:
>
> > How do I deal with incoming PDFs that appear to have more than one
> > field with the same name?
> > When I open the doc with Acrobat, I see fields "EffectiveDate#0" and
> > "EffectiveDate#1".
> > I am trying to manipulate the position/width/height of the two fields,
> > #0 and #1, independently.
> >
> > But when I use PDFBox,
> >   PDField pdField0 = pdAcroForm.getField("EffectiveDate");
> > I get an instance and I can get/set the field value.
> >
> > But...
> >   PDField pdField0 = pdAcroForm.getField("EffectiveDate#0");
> > returns NULL.
> >
> > I am not allowed to change the names of the fields.  Otherwise, I
> > would change them to EffectiveDate_0 and EffectiveDate_1.
>
> -
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: users-h...@pdfbox.apache.org
>
>


Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread 风云天空
Can anyone help me?
I get this error when rendering from multiple threads:
java.lang.NullPointerException
    at java.awt.color.ICC_Profile.activateDeferredProfile(ICC_Profile.java:1086)
    at java.awt.color.ICC_Profile$1.activate(ICC_Profile.java:742)
    at sun.java2d.cmm.ProfileDeferralMgr.activateProfiles(ProfileDeferralMgr.java:95)
    at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:775)
    at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:1013)
    at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.loadICCProfile(PDICCBased.java:119)
    at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.<init>(PDICCBased.java:89)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:182)
    at org.apache.pdfbox.pdmodel.PDResources.getColorSpace(PDResources.java:172)
    at org.apache.pdfbox.pdmodel.PDResources.getColorSpace(PDResources.java:142)
    at org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorSpace.process(SetNonStrokingColorSpace.java:41)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:814)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:471)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:445)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:187)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:80)
    at com.liaoyoujin.pdfbox.doc.PdfExtractor.getFirstImage(PdfExtractor.java:109)
    at com.liaoyoujin.pdfbox.doc.PdfExtractor$Job.run(PdfExtractor.java:178)
    at com.liaoyoujin.thread.pool.BlockThreadPool$Worker.run(BlockThreadPool.java:53)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
java.util.ConcurrentModificationException
    at java.util.Vector$Itr.checkForComodification(Vector.java:1156)
    at java.util.Vector$Itr.next(Vector.java:1133)


-- Original Message --
From: "Hesham G."
Sent: Friday, March 18, 2016, 4:44 PM
To: "users"

Subject: Re: Spaces are ignored when reading a PDF file



   John,
  
 I think I have got the idea ... Thumbs up
  
  
 Best regards ,
 Hesham 
  
 
 Included message :
  
 I’m rather confused by this thread: inferring spaces is one of the main
features of PDFTextStripper. I’m not sure why anyone is suggesting processing
the text manually. There’s no need to do that; we do it already!
  
 Looking at the original code the problem is right here:
  
 > public class PDFTextStripperProcessor extends PDFTextStripper {
 >     @Override
 >     public void processTextPosition( TextPosition text ) {
 >         System.out.println( text.getCharacter() );
 >     }
 > }
  
 The processTextPosition method is used to pass an unprocessed TextPosition
*in* to PDFTextStripper, but this override prevents that from happening and
just prints the unprocessed token before PDFTextStripper has had a chance to
do its job, such as inferring the missing spaces.
  
 You should follow our PrintTextLocations.java example, which shows how to
get the processed TextPositions from PDFTextStripper. It’s really easy to do.
  
 — John
  
 > On 17 Mar 2016, at 04:44, Hesham G.   wrote:
 > 
 > Andreas,
 > 
 > You're absolutely right. I am testing it now, but it seems very  
 > complicated. I hope there might be another easier solution.
 > 
 > 
 > Best regards ,
 > Hesham
 > 
 >  
 > Included message :
 > 
 >> "Hesham G."  hat am 17. März 2016 um  11:20
 >> geschrieben:
 >> 
 >> 
 >> Andreas,
 >> 
 >> That is very helpful.
 >> 
 >> I can get the x location of each character using  TextPosition.getX(), ex:
 >> W: 102.88399
 >> i: 114.18165
 >> t: 117.660614
 >> h: 121.55801
 >> d: 133.09477
 >> u: 140.3994
 >> e: 147.60838
 >> 
 >> So to detect the space between the 2 words "With" & "due", should I
 >> subtract the X of the last letter (h) from the X of the first letter (d),
 >> and if the result is larger than normal then this is a space? I think
 >> this way might be risky in the detection, or what?
 > That's the short story. To decide what is normal could be quite tricky. You
 > have to take the following facts into account:
 

Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread clovis
Just an idea from someone who is fluent in neither PDFBox nor PDF:
if you just want to know whether there is a space between the letters, and not
how many spaces, you can use your code to get the character details and
then use extractText to get the words.
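
The gap check discussed in this thread can be sketched in plain Java. Note
that the X values alone are ambiguous: W to i is 11.30 apart and h to d is
11.54 apart, yet only the second is a word break, so the glyph widths have to
be taken into account. The X positions below are the ones quoted in the
thread; the widths, the Glyph holder, the class name and the 2.0 threshold are
all made-up assumptions for illustration. As noted elsewhere in the thread,
PDFTextStripper already infers spaces, so this only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.List;

class SpaceInferenceSketch {
    /** Hypothetical holder: a character, its left edge X, and its advance width. */
    static final class Glyph {
        final char ch;
        final double x;
        final double width;

        Glyph(char ch, double x, double width) {
            this.ch = ch;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Appends a space whenever the gap between the previous glyph's right edge
     * and the current glyph's left edge is larger than minGap.
     */
    static String inferText(List<Glyph> glyphs, double minGap) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null && g.x - (prev.x + prev.width) > minGap) {
                sb.append(' ');
            }
            sb.append(g.ch);
            prev = g;
        }
        return sb.toString();
    }

    /** X positions from the thread; the widths are invented for this sketch. */
    static List<Glyph> sample() {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph('W', 102.88399, 11.9));
        glyphs.add(new Glyph('i', 114.18165, 3.5));
        glyphs.add(new Glyph('t', 117.660614, 3.9));
        glyphs.add(new Glyph('h', 121.55801, 7.0));
        glyphs.add(new Glyph('d', 133.09477, 7.3));
        glyphs.add(new Glyph('u', 140.3994, 7.2));
        glyphs.add(new Glyph('e', 147.60838, 6.0));
        return glyphs;
    }

    public static void main(String[] args) {
        System.out.println(inferText(sample(), 2.0));
    }
}
```

Choosing the threshold is the tricky part: a fixed value like the one above
will not hold across fonts and sizes.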

2016-03-17 7:20 GMT-03:00 Hesham G. :

> Andreas,
>
> That is very helpful.
>
> I can get the x location of each character using TextPosition.getX(), ex:
> W: 102.88399
> i: 114.18165
> t: 117.660614
> h: 121.55801
> d: 133.09477
> u: 140.3994
> e: 147.60838
>
> So to detect the space between the 2 words "With" & "due", should I
> subtract the X of the last letter (h) from the X of the first letter (d),
> and if the result is larger than normal then this is a space? I think
> this way might be risky in the detection, or what?
>
>
> Best regards ,
> Hesham
>
> 
> Included message :
>
> Hi,
>
> Frank van der Hulst  hat am 17. März 2016 um
>> 08:34
>> geschrieben:
>>
>>
>> Spaces don't exist as characters in PDFs. To identify spaces, you have to
>> compare the X coordinates of adjacent characters against their widths.
>>
> That's not correct: spaces exist, but in most cases PDF engines omit them
> and replace them with split text that is positioned appropriately.
>
> BTW, LaTeX uses the same strategy. Here is an excerpt from your PDF:
>
>   [ (W) 55 (ith) -383 (due) -384 (r) 18 (egar) 18 (d) -383 (to) -383
>     (Article) -384 (\(219\),) -416 (the) -384 (competent) -383 (authority)
>     -383 (has) -384 (the) -383 (right) ] TJ
>
> The text is in between the parentheses and the numbers are used for
> horizontal positioning.
>
> BR
> Andreas
>
>
>> On Thu, Mar 17, 2016 at 7:12 PM, Hesham G. 
>> wrote:
>>
>> > Hello ,
>> >
>> > I have a PDF file created using LaTeX. I am trying to read and print all
>> > letters in that file using PDFBox, but when doing this all spaces in
>> > that file are ignored. Here is the code I am using:
>> > PDPage page = (PDPage)allPages.get( 0 );
>> > PDStream contents = page.getContents();
>> > if ( contents != null ) {
>> >     PDFTextStripperProcessor pdfTextStripperProcessor = new
>> >         PDFTextStripperProcessor();
>> >     pdfTextStripperProcessor.processStream( page, page.findResources(),
>> >         contents.getStream() );
>> > }
>> >
>> > public class PDFTextStripperProcessor extends PDFTextStripper {
>> >     @Override
>> >     public void processTextPosition( TextPosition text ) {
>> >         System.out.println( text.getCharacter() );
>> >     }
>> > }
>> >
>> > And you can check a one page file sample here to test it:
>> >
>> >
>> https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf
>> >
>> > What is the cause of this issue please?
>> >
>> >
>> > Best regards ,
>> > Hesham
>>
>


Issues while writing Chinese content using contentStream.drawString

2016-03-19 Thread Tiruppathi Rajan G
Hi,


Can anyone help me to write Chinese content into a PDF using PDFBox API?

I am referring to the suggestions mentioned in the JIRA

https://issues.apache.org/jira/browse/PDFBOX-1705

Kindly go through my comments (copy pasted again here) and advise on the
next steps;


Hi,

I am unable to write Chinese content into a PDF using your attached
EmbeddedFonts.java. Later I found a way to fix it by feeding in
"arialuni.ttf" instead of LiberationSans-Regular.ttf in your example. I
confirmed the Chinese PDF generation by writing hardcoded Chinese data in
my example. Later I extended this piece of logic to write the desired
PDF in my application but ran into "java.lang.IllegalArgumentException:
U+008A is not available in this font's encoding: WinAnsiEncoding". Please
advise.

1. From my application, I call an external web service that returns the
Chinese + English content in its response as a String.
2. I read this response string using UTF-8 as the charset and unmarshal it
to a Java bean using JAXRB.
3. I pass the string to contentStream.drawString() by calling the getter
method on the POJO, which has the Chinese data stored in a String variable.
This is where I get the above exception. Please help me to fix this.

I am using PDFBox version 2.0.0-RC3. Your help is much appreciated.

Regards,
Tiru


Regards,
Tiru


Re: NullPointerException in multithreading

2016-03-19 Thread Tilman Hausherr

Sorry, I meant that the line

ICC_Profile profile = ICC_Profile.getInstance(input);

be enclosed by the "synchronized".

Tilman
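
A minimal sketch of the kind of guard described above, written against the
JDK only. SynchronizedIccLoader and ICC_LOCK are hypothetical names; the
change discussed in this thread synchronizes on the class's existing LOG
object inside PDICCBased.java instead of a dedicated lock. The lock only
serializes callers that go through this helper, so treat it as a workaround
sketch and not a fix for the underlying JDK behavior.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

class SynchronizedIccLoader {
    /** Hypothetical shared lock; every profile load goes through it. */
    private static final Object ICC_LOCK = new Object();

    /** Loads an ICC profile while holding the shared lock. */
    static ICC_Profile load(InputStream input) {
        synchronized (ICC_LOCK) {
            try {
                return ICC_Profile.getInstance(input);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Use the built-in sRGB profile bytes as test input.
        byte[] srgb = ICC_Profile.getInstance(ColorSpace.CS_sRGB).getData();
        AtomicInteger loaded = new AtomicInteger();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                ICC_Profile p = load(new ByteArrayInputStream(srgb));
                if (p.getNumComponents() == 3) {
                    loaded.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("profiles loaded: " + loaded.get());
    }
}
```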

Am 18.03.2016 um 20:05 schrieb Tilman Hausherr:

Hello 风云天空,

This is obviously not related to "Spaces are ignored when reading a 
PDF file" so you should have created a new subject line instead of 
hijacking an existing thread by pressing "reply".


I did have the same problem while working on
https://issues.apache.org/jira/browse/PDFBOX-3267

What I did was to change the source code of PDICCBased.java, i.e. 
change this line

awtColorSpace = (ICC_ColorSpace)ColorSpace.getInstance(ColorSpace.CS_sRGB);

to

synchronized(LOG)
{
    awtColorSpace = (ICC_ColorSpace)ColorSpace.getInstance(ColorSpace.CS_sRGB);
}


This is a java bug. I'm undecided whether the change above should be 
committed. But try the change :-)


Tilman



Am 18.03.2016 um 12:02 schrieb 风云天空:

Can anyone help me?
I get this error when rendering from multiple threads:
java.lang.NullPointerException
    at java.awt.color.ICC_Profile.activateDeferredProfile(ICC_Profile.java:1086)
    at java.awt.color.ICC_Profile$1.activate(ICC_Profile.java:742)
    at sun.java2d.cmm.ProfileDeferralMgr.activateProfiles(ProfileDeferralMgr.java:95)
    at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:775)
    at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:1013)
    at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.loadICCProfile(PDICCBased.java:119)
    at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.<init>(PDICCBased.java:89)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:182)
    at org.apache.pdfbox.pdmodel.PDResources.getColorSpace(PDResources.java:172)
    at org.apache.pdfbox.pdmodel.PDResources.getColorSpace(PDResources.java:142)
    at org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorSpace.process(SetNonStrokingColorSpace.java:41)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:814)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:471)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:445)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:187)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:80)
    at com.liaoyoujin.pdfbox.doc.PdfExtractor.getFirstImage(PdfExtractor.java:109)
    at com.liaoyoujin.pdfbox.doc.PdfExtractor$Job.run(PdfExtractor.java:178)
    at com.liaoyoujin.thread.pool.BlockThreadPool$Worker.run(BlockThreadPool.java:53)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
java.util.ConcurrentModificationException
    at java.util.Vector$Itr.checkForComodification(Vector.java:1156)
    at java.util.Vector$Itr.next(Vector.java:1133)







RE: Multiple instances of the same field name

2016-03-19 Thread Kevin Ternes
Thank you!  Here is an implementation of Gilad's advice:

  PDField pdField = pdAcroForm.getField("EffectiveDate");
  List<PDAnnotationWidget> widgetList = pdField.getWidgets();
  for (PDAnnotationWidget widget : widgetList) {
    PDRectangle r = widget.getRectangle();
    log.info(" - rectangle: llx={}, lly={}, w={}, h={}",
        r.getLowerLeftX(), r.getLowerLeftY(), r.getWidth(), r.getHeight());
  }


-Original Message-
From: Gilad Denneboom [mailto:gilad.denneb...@gmail.com] 
Sent: Thursday, March 17, 2016 1:19 PM
To: users@pdfbox.apache.org
Subject: Re: Multiple instances of the same field name

The "#0" is not a part of the actual field name, it's just a convention used by 
Acrobat to show that the field has more than one widget. To access these 
widgets using PDFBox you can use the getKids method of PDField.

On Thu, Mar 17, 2016 at 5:26 PM, Kevin Ternes 
wrote:

> How do I deal with incoming PDFs that appear to have more than one 
> field with the same name?
> When I open the doc with Acrobat, I see fields "EffectiveDate#0" and 
> "EffectiveDate#1".
> I am trying to manipulate the position/width/height of the two fields, 
> #0 and #1, independently.
>
> But when I use PDFBox,
>   PDField pdField0 = pdAcroForm.getField("EffectiveDate");
> I get an instance and I can get/set the field value.
>
> But...
>   PDField pdField0 = pdAcroForm.getField("EffectiveDate#0");
> returns NULL.
>
> I am not allowed to change the names of the fields.  Otherwise, I 
> would change them to EffectiveDate_0 and EffectiveDate_1.




RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Stahle, Patrick
Hi Tilman,
Do you have an Apache email address or another email address I can send
the PDF to? This MoveIt site would require me to have a sending email
address regardless, so I might as well just send it directly.

 Thanks,
Patrick

-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Wednesday, March 16, 2016 3:38 PM
To: users@pdfbox.apache.org
Subject: Re: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Can you reopen the file you saved with PDFBox? If not, please open an issue in 
JIRA and attach your file.

If yes, just upload the file somewhere, I'd like to have a look at the 
encryption dictionaries.

Tilman

Am 16.03.2016 um 20:19 schrieb Stahle, Patrick:
> Hi,
>
> This is not a general problem; it only occurs with an original PDF generated
> with 3D content using Anark. The saved file seems to have been encrypted and
> loads just fine in Adobe Reader, but when we try to do a "Save As" we get the
> following error:
> "The document could not be saved. There was a problem reading this document 
> 21."
>
> If I do a control click on the "ok" button. I get the following message:
> "This direct object already has a container."
>
> Any ideas what might be causing this problem? We have tried the same thing 
> with iText and it does not experience this problem.
>
> Sample code that reproduces the problem:
>     PDDocument doc = null;
>
>     try {
>         doc = PDDocument.load(pdfIn);
>         PDPage page = null;
>         AccessPermission apermission = new AccessPermission();
>         apermission.setCanAssembleDocument(false);
>         apermission.setCanExtractContent(false);
>         apermission.setCanExtractForAccessibility(true);
>         apermission.setCanFillInForm(true);
>         apermission.setCanModifyAnnotations(true);
>         apermission.setCanPrint(true);
>         apermission.setReadOnly();
>         StandardProtectionPolicy spp = new StandardProtectionPolicy(
>                 UUID.randomUUID().toString(), "", apermission);
>         doc.protect(spp);
>
>         for (int i = 0; i < doc.getNumberOfPages(); i++) {
>             page = doc.getPage(i);
>             PDPageContentStream canvas = new PDPageContentStream(doc, page,
>                     PDPageContentStream.AppendMode.APPEND, true, true);
>             canvas.saveGraphicsState();
>             canvas.restoreGraphicsState();
>             canvas.close();
>         }
>         doc.save(pdfOut);
>         bRet = true;
>     }
>     finally {
>         if (doc != null) {
>             doc.close();
>         }
>     }
>
> Thanks,
> Patrick
>
>





Re: Data from PDF - > MS Access

2016-03-19 Thread John Hewson
Can we please *not* have this conversation on the PDFBox mailing list, thanks.

— John

> On 17 Mar 2016, at 07:01, Tres Finocchiaro  wrote:
> 
>> 
>> There is a good outlook connector called moyosoft but it does cost about
>> $200
> 
> 
> I'm not sure why you'd use outlook at all.  If the data is going to be
> available in email, use something that can fetch email silently and process
> the inbox as a batch process.
> 
> To have a desktop application interact with Outlook is likely a bad
> design.  Something like a silent IMAP processor would be much more scalable
> , in my opinion. :)
> 
> - tres.finocchi...@gmail.com
> 
> On Thu, Mar 17, 2016 at 12:45 AM, Al Grant  wrote:
> 
>> Thanks Ken.
>> 
>> Yes that's the only way I can see so far.
>> 
>> There is a good outlook connector called moyosoft but it does cost about
>> $200
>> 
>> The field itself might be called CompanyXYZ Ave the data type longstring.
>> 
>> I will also need record number in the email to update the correct record.
>> 
>> Cheers
>> 
>> Al
>> On 17/03/2016 5:28 pm, "Ken Bowen"  wrote:
>> 
>>> What is the nature of the feedback into the database?
>>> 
>>> If it amounts to more or less make entries in fields in the db,
>>> and you are stuck with email as a medium, you might hack up a
>>> convention like this:
>>> 1) Select a recognizable boundary line (begin & end), say a line of
>>> at least 10 + or *, or whatever.
>>> 2) Between the boundary lines, have your compatriots make entries like:
>>>[Field Name] = [Value to be input]
>>> with the restriction that no ‘=‘ sign occurs on either side (or replace
>>> the use of ‘=‘ by something else that would satisfy that restriction).
>>> 
>>> You can knock out a script to process each email to a csv file, and
>>> then import that to your Access db.
>>> 
>>> Regards,
>>> Ken Bowen
>>> 
>>> On Mar 16, 2016, at 9:23 PM, Al Grant  wrote:
>>> 
>>>> Hi All
>>>>
>>>> This might be slightly OT - but the list was so helpful in the past...
>>>>
>>>> I have a database in MS Access and a standalone Java applet that imports
>>>> data from a PDF form and scrapes data from the PDF form into corresponding
>>>> fields in the Access Database. (Thanks to the list for help on this!)
>>>>
>>>> A report from this Access database then goes out via email to a handful of
>>>> people in other companies and they need a way to provide feedback into the
>>>> database.
>>>>
>>>> The question is how to achieve this feedback?
>>>>
>>>> It is difficult because I am working within a number of constraints:
>>>>
>>>> 1. We are all working behind large corporate firewalls;
>>>> 2. I have Office 2007 installed;
>>>> 3. Our shared mailboxes are accessed only via a web interface (not OWA - I
>>>> think Lotus)
>>>>
>>>> Getting ports on firewall, vpns etc is not an option, nor are cloud
>>>> services like Dropbox or Amazon, so I think I am stuck with email as the
>>>> transport medium.
>>>>
>>>> Solutions?
>>>>
>>>> Cheers
>>>>
>>>> -AL
>>>>
>>>> --
>>>> "Beat it punk!"
>>>> - Clint Eastwood
>>> 
>>> 
>>> 
>>> 
>> 





Re: Orientation printout - used 2.0.0.RC3 release

2016-03-19 Thread John Hewson

> On 15 Mar 2016, at 12:01, Marco Di Girolami  wrote:
> 
> Hi all,
> I have a problem printing a PDF label (1,96x0,97 inches) to a label printer.
>  
> In the attachment you can see the input PDF and some output printouts
> printed via my Java application to PDFCreator (but I get the same results on
> the real printer).
>  
> You can see that the input is a PDF document with the correct size and
> orientation, but in the output I could not obtain the SAME document.
>  
> In attachment you can see my simple code.

Our mailing list removes attachments, can you send the code in an e-mail?

> I’ve also tried to:
> 1) manage orientation on PageFormat object;
> 2) setting size and imageablearea on Paper object,
> 3) rotating content using  COSBase and COSArray objects
> with many results but not the correct one.
>  
> Can you help me please??
>  
> Thanks very much in advance,
> Marco
>  
> --
> Marco Di Girolami
> Team Leader
> NoemaLife S.p.A.
> Via Gobetti, 52
> 40129 Bologna - Italy
> tel. 39.345.93.45.786 - 39.051.7098.285
> mdigirol...@noemalife.com  - 
> www.noemalife.com 
>  
> 
> Confidentiality Notice. This email is confidential and any attachment is 
> intended for the named addresses only, or person authorized to receive it on 
> their behalf. The content should be treated confidentially as it is protected 
> by laws about data protection, copyright and know-how defense. The recipient 
> may not disclose this message or any attachment to anyone else without 
> authorization of the sender. Unauthorized use, copying or disclosure may be 
> unlawful. If this transmission is received in error please notify the sender 
> immediately and delete this message from your email system.
> 
> 


Re: Log4j message PDFBox 2.0

2016-03-19 Thread Tilman Hausherr

Am 16.03.2016 um 19:16 schrieb Stahle, Patrick:

Hi,

I see the following debug log message:
493 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!

Is this something to be concerned about?



PDFBox noticed a flaw in PDFBox and did the closing itself :-)

It may have been solved recently. Wait a few days and use the 2.0 release.

Tilman




Re: Orientation printout - used 2.0.0.RC3 release

2016-03-19 Thread John Hewson

> On 16 Mar 2016, at 17:36, John Hewson  wrote:
> 
> 
>> On 15 Mar 2016, at 12:01, Marco Di Girolami > > wrote:
>> 
>> Hi all,
>> I have a problem printing a PDF label (1,96x0,97 inches) to a label
>> printer.
>>  
>> In the attachment you can see the input PDF and some output printouts
>> printed via my Java application to PDFCreator (but I get the same results
>> on the real printer).
>>  
>> You can see that the input is a PDF document with the correct size and
>> orientation, but in the output I could not obtain the SAME document.
>>  
>> In attachment you can see my simple code.
> 
> Our mailing list removes attachments, can you send the code in an e-mail?

Oh, and can you post the PDFs somewhere public? Thanks.

— John
> 
>> I’ve also tried to:
>> 1) manage orientation on PageFormat object;
>> 2) setting size and imageablearea on Paper object,
>> 3) rotating content using  COSBase and COSArray objects
>> with many results but not the correct one.
>>  
>> Can you help me please??
>>  
>> Thanks very much in advance,
>> Marco
>>  
>> --
>> Marco Di Girolami
>> Team Leader
>> NoemaLife S.p.A.
>> Via Gobetti, 52
>> 40129 Bologna - Italy
>> tel. 39.345.93.45.786 - 39.051.7098.285
>> mdigirol...@noemalife.com  - 
>> www.noemalife.com 
>>  
>> 



RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Stahle, Patrick
Would you like the original PDF, prior to my encrypting it, for comparison,
along with the simplest code sample I have? I can upload those too...

Also, while debugging through my code I noticed that if I didn't use the
initialization constructor, my setter calls didn't seem to change the value of
the actual permission bytes.

Ex.
AccessPermission apermission = new AccessPermission();
apermission.setCanPrint(true);
apermission.setCanModifyAnnotations(true);
apermission.setCanAssembleDocument(true);
apermission.setCanFillInForm(true);
apermission.setCanExtractForAccessibility(true);
apermission.setReadOnly();

Eclipse debugger shows: 
bytes= -4 
readOnly= true

while:
AccessPermission apermission = new AccessPermission(0);
apermission.setCanPrint(true);
apermission.setCanModifyAnnotations(true);
apermission.setCanAssembleDocument(true);
apermission.setCanFillInForm(true);
apermission.setCanExtractForAccessibility(true);
apermission.setReadOnly();

Eclipse debugger shows: 
bytes= 1828 
readOnly= true

Which looks correct to me, or at least matches what I see from iText (minus 
the readOnly bit).
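
The two debugger values can be reproduced with plain bit arithmetic, which may
explain the observation. This sketch assumes (not checked against the PDFBox
source) that the no-argument constructor starts with every permission bit
already set, which is what bytes = -4 suggests; setting bits that are already
set then changes nothing. The bit positions are the ones the PDF standard
security handler defines for the /P entry; the class and method names are
hypothetical and no PDFBox classes are used.

```java
class PermissionBitsSketch {
    // 1-based bit positions of the /P entry, expressed as masks.
    static final int PRINT = 1 << 2;                      // bit 3
    static final int EXTRACT_CONTENT = 1 << 4;            // bit 5
    static final int MODIFY_ANNOTATIONS = 1 << 5;         // bit 6
    static final int FILL_IN_FORM = 1 << 8;               // bit 9
    static final int EXTRACT_FOR_ACCESSIBILITY = 1 << 9;  // bit 10
    static final int ASSEMBLE_DOCUMENT = 1 << 10;         // bit 11

    /** Sets or clears a single permission bit. */
    static int set(int flags, int bit, boolean allowed) {
        return allowed ? (flags | bit) : (flags & ~bit);
    }

    /** Mirrors the five "set...(true)" calls from the examples above. */
    static int grantFive(int start) {
        int p = start;
        p = set(p, PRINT, true);
        p = set(p, MODIFY_ANNOTATIONS, true);
        p = set(p, ASSEMBLE_DOCUMENT, true);
        p = set(p, FILL_IN_FORM, true);
        p = set(p, EXTRACT_FOR_ACCESSIBILITY, true);
        return p;
    }

    /** Clears the assemble and extract bits, as the original sample code does. */
    static int denyAssembleAndExtract(int start) {
        int p = set(start, ASSEMBLE_DOCUMENT, false);
        return set(p, EXTRACT_CONTENT, false);
    }

    public static void main(String[] args) {
        System.out.println(grantFive(-4));               // all bits were already set
        System.out.println(grantFive(0));                // only the five granted bits
        System.out.println(denyAssembleAndExtract(-4));  // compare with /P in the dump
    }
}
```

Starting from 0, the five granted bits add up to 4 + 32 + 1024 + 256 + 512 =
1828. Starting from -4 and clearing the assemble and extract bits gives -1044,
which is the /P value in the encryption dictionary quoted further down.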

-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Thursday, March 17, 2016 2:55 PM
To: users@pdfbox.apache.org
Subject: Re: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Am 17.03.2016 um 14:21 schrieb Stahle, Patrick:
> Ok,  I think I have the  file uploaded to the following:
> http://wikisend.com/download/381906/tmp_10435-Technical  Data Package 
> BOMAnarkStampedPDFBox102259703.pdf
>
> The link is hard for me to test since I have to do this all from my phone. If 
> it says not found or expired try putting in 381906 from the download page...

Thanks, it worked. I did find some weirdness: parts of the encryption object 
exist twice.

659 0 obj
<<
/Filter /Standard
/V 1
/R 3
/Length 40
/P -1044
/O <92B3A580FEDD525873E5DEA425E75E1B74858FD5C6F5FED7E4C6C39C2E23D2DB>
/U <50D7EE978EFC3D29DAF239DA746CCC2228BF4E5E4E758A4164004E56FFFA0108>
 >>
endobj
660 0 obj
<<
/ID [<3DADA7608D955343B3F967EB90F6801F> <1C65E39CFBD4D44099223D10A9D542B5>]
/Info 13 0 R
/Root 1 0 R
/Encrypt <<
/Filter /Standard
/V 1
/R 3
/Length 40
/P -1044
/O <92B3A580FEDD525873E5DEA425E75E1B74858FD5C6F5FED7E4C6C39C2E23D2DB>
/U <50D7EE978EFC3D29DAF239DA746CCC2228BF4E5E4E758A4164004E56FFFA0108>
 >>
/Type /XRef
/Size 661
/Index [1 659]
/W [1 3 0]
/Filter /FlateDecode
/Length 1881
 >>






RE: ExtractText command line OPTIONS

2016-03-19 Thread Gordon Schneider
Tilman

My assumption was obviously wrong. It works now. It did make a big difference 
to the order of the text in the resulting file. It will be easier to work with.

Thanks

Gord
 

-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: March 18, 2016 12:40 PM
To: users@pdfbox.apache.org
Subject: Re: ExtractText command line OPTIONS

On 18.03.2016 at 17:38, Gordon Schneider wrote:
> What is the proper way to add the -sort option to the command? I have looked 
> at lots of different things to figure this out without any success.

just use

-sort






RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Stahle, Patrick
Also, other applications such as PDF Exchange do not exhibit the problem. It most 
likely has something to do with the 3D capabilities of Adobe Reader.

-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Wednesday, March 16, 2016 3:38 PM
To: users@pdfbox.apache.org
Subject: Re: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Can you reopen the file you saved with PDFBox? If not, please open an issue in 
JIRA and attach your file.

If yes, just upload the file somewhere, I'd like to have a look at the 
encryption dictionaries.

Tilman

On 16.03.2016 at 20:19, Stahle, Patrick wrote:
> Hi,
>
> This is not a general problem; it only occurs with original PDFs generated 
> with 3D content using Anark. The file seems to have been encrypted and 
> loads just fine in Adobe Reader, but when we try to do a "Save As" we get the 
> following error:
> "The document could not be saved. There was a problem reading this document 
> 21."
>
> If I do a control click on the "ok" button. I get the following message:
> "This direct object already has a container."
>
> Any ideas what might be causing this problem? We have tried the same thing 
> with iText and it does not experience this problem.
>
> Sample Code that reproduces the problem:
> PDDocument doc = null;
>
> try {
>     doc = PDDocument.load(pdfIn);
>     PDPage page = null;
>     AccessPermission apermission = new AccessPermission();
>     apermission.setCanAssembleDocument(false);
>     apermission.setCanExtractContent(false);
>     apermission.setCanExtractForAccessibility(true);
>     apermission.setCanFillInForm(true);
>     apermission.setCanModifyAnnotations(true);
>     apermission.setCanPrint(true);
>     apermission.setReadOnly();
>     StandardProtectionPolicy spp =
>         new StandardProtectionPolicy(UUID.randomUUID().toString(), "", apermission);
>     doc.protect(spp);
>
>     for (int i = 0; i < doc.getNumberOfPages(); i++) {
>         page = doc.getPage(i);
>         PDPageContentStream canvas = new PDPageContentStream(doc, page,
>             PDPageContentStream.AppendMode.APPEND, true, true);
>         canvas.saveGraphicsState();
>         canvas.restoreGraphicsState();
>         canvas.close();
>     }
>     doc.save(pdfOut);
>     bRet = true;
> }
> finally {
>     if (doc != null) {
>         doc.close();
>     }
> }
>
> Thanks,
> Patrick
>
>





Re: pdfrenderer unicode characters in latest 2.1.0-SNAPSHOT ?

2016-03-19 Thread Tilman Hausherr

On 16.03.2016 at 22:23, Jesse Kuhnert wrote:

It appears as if the java2d font rendering logic in new 2.x versions missed
something with unicode support. I have pdfs being output wonderfully with
my sample unicode text (bengali in this case) but when we try to produce
images any glyphs we have which aren't english appear to just be rendered
as "blank" somehow. (meaning the spacing and line heights look like there
is ghost text taking up space there but maybe that's just our logic for
laying out pdf)

Has anyone else tried to produce unicode based pdf images yet ?




I just tried to render the file created with the EmbeddedFonts example 
and it works fine.


Tilman




Re: Multiple instances of the same field name

2016-03-19 Thread Gilad Denneboom
The "#0" is not a part of the actual field name, it's just a convention
used by Acrobat to show that the field has more than one widget. To access
these widgets using PDFBox you can use the getKids method of PDField.

On Thu, Mar 17, 2016 at 5:26 PM, Kevin Ternes wrote:

> How do I deal with incoming PDFs that appear to have more than one field
> with the same name?
> When I open the doc with Acrobat, I see fields "EffectiveDate#0" and
> "EffectiveDate#1".
> I am trying to manipulate the position/width/height of the two fields, #0
> and #1, independently.
>
> But when I use PDFBox,
>   PDField pdField0 = pdAcroForm.getField("EffectiveDate");
> I get an instance and I can get/set the field value.
>
> But...
>   PDField pdField0 = pdAcroForm.getField("EffectiveDate#0");
> returns NULL.
>
> I am not allowed to change the names of the fields.  Otherwise, I would
> change them to EffectiveDate_0 and EffectiveDate_1.
>
>
>
>


Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Tilman Hausherr

On 17.03.2016 at 11:20, Hesham G. wrote:


So to detect the space between the 2 words "With" & "due" should I 
make subtraction calculations between X of the last letter (h) and the 
X of the first letter (d) and if the number is larger than normal then 
this is a space? I think this way might be risky in the detection, or 
what? 


What you're doing is to reinvent the PDFTextStripper code, which has 
some strategies to decide where there are spaces. That's not a bad idea 
(there are some weaknesses), however it is indeed... "tricky".


https://www.youtube.com/watch?v=cjEdxO91RWQ&feature=youtu.be&t=3m33s






Re: Issues while writing Chinese content using contentStream.drawString

2016-03-19 Thread Tilman Hausherr

On 17.03.2016 at 21:35, Tiruppathi Rajan G wrote:

Hi,


Can anyone help me to write Chinese content into a PDF using PDFBox API?

I am referring to the suggestions mentioned in the JIRA

https://issues.apache.org/jira/browse/PDFBOX-1705

Kindly go through my comments (copy pasted again here) and advise on the
next steps;


Hi,

I was unable to write Chinese content into a PDF using your attached
EmbeddedFonts.java. Later I found a way to fix it by feeding in
"arialuni.ttf" instead of LiberationSans-Regular.ttf in your example. I
confirmed the Chinese PDF generation by writing hardcoded Chinese data in
my example. I then extended this logic to write the desired
PDF in my application, but ran into "java.lang.IllegalArgumentException:
U+008A is not available in this font's encoding: WinAnsiEncoding". Please
advise.

1. From my application, I call an external web service that returns the
Chinese + English content in its response as a String.
2. I read this response string using UTF-8 as the charset and unmarshal it
to a Java bean using JAXB.
3. I pass contentStream.drawString() the string obtained by calling the getter
method on the POJO, which stores the Chinese data in a String variable. This
is where I get the above exception. Please help me to fix this.



I just did this: I took the EmbeddedFonts example and added "中華人民共和國" 
to one of the lines. It worked fine. You mentioned that when you 
hardcoded something, it worked too.
Maybe something is going wrong in the operations you're doing to 
create the string you are passing to drawString. That's what debugging 
and logging are for.
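
One way to do that debugging is to log the code points of the string right before it reaches drawString. The sketch below is plain Java with a made-up class name. The "misread" case only simulates one possible cause (UTF-8 bytes decoded with a single-byte charset); the thread does not confirm that this is what happens in the application.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical debugging helper, not part of PDFBox. */
final class CodePointDump {
    /** Formats every code point of the string as U+XXXX, separated by spaces. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // First two characters of the test string, written as escapes so the
        // source file encoding does not matter.
        String intended = "\u4E2D\u83EF";
        // Simulated bug: the UTF-8 bytes are decoded as ISO-8859-1.
        String misread = new String(intended.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(describe(intended));
        System.out.println(describe(misread));
    }
}
```

If the log shows code points in the U+0080 to U+00FF range where CJK characters were expected, the string was probably damaged before it reached PDFBox.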


Tilman



I am using PDFBox version 2.0.0-RC3. Your help is much appreciated.

Regards,
Tiru









Re: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Tilman Hausherr

On 17.03.2016 at 14:21, Stahle, Patrick wrote:

Ok,  I think I have the  file uploaded to the following:
http://wikisend.com/download/381906/tmp_10435-Technical  Data Package 
BOMAnarkStampedPDFBox102259703.pdf

The link is hard for me to test since I have to do this all from my phone. If 
it says not found or expired try putting in 381906 from the download page...


Thanks, it worked. I did find some weirdness: parts of the encryption 
object exist twice.


659 0 obj
<<
/Filter /Standard
/V 1
/R 3
/Length 40
/P -1044
/O <92B3A580FEDD525873E5DEA425E75E1B74858FD5C6F5FED7E4C6C39C2E23D2DB>
/U <50D7EE978EFC3D29DAF239DA746CCC2228BF4E5E4E758A4164004E56FFFA0108>
>>
endobj
660 0 obj
<<
/ID [<3DADA7608D955343B3F967EB90F6801F> <1C65E39CFBD4D44099223D10A9D542B5>]
/Info 13 0 R
/Root 1 0 R
/Encrypt <<
/Filter /Standard
/V 1
/R 3
/Length 40
/P -1044
/O <92B3A580FEDD525873E5DEA425E75E1B74858FD5C6F5FED7E4C6C39C2E23D2DB>
/U <50D7EE978EFC3D29DAF239DA746CCC2228BF4E5E4E758A4164004E56FFFA0108>
>>
/Type /XRef
/Size 661
/Index [1 659]
/W [1 3 0]
/Filter /FlateDecode
/Length 1881
>>






Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Andreas Lehmkühler
Hi,

> On 17 March 2016 at 08:34, Frank van der Hulst wrote:
> 
> 
> Spaces don't exist as characters in PDFs. To identify spaces, you have to
> compare the X coordinates of adjacent characters against their widths.
That's not correct. Spaces do exist, but in most cases PDF engines omit them and
replace them with split text that is positioned appropriately.

BTW, LaTeX uses the same strategy. Here is an excerpt from your PDF:

   [ (W) 55 (ith) -383 (due) -384 (r) 18 (egar) 18 (d) -383 (to) -383 (Article)
-384 (\(219\),) -416 (the) -384 (competent) -383 (authority) -383 (has) -384
(the) -383 (right) ] TJ

The text is in between the parentheses and the numbers are used for horizontal
positioning.
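
To make the excerpt above concrete, here is a minimal sketch in plain Java that rebuilds readable text from such a TJ array. It is not how PDFTextStripper works. The class name, the simplified string parsing (only backslash escapes are handled, no octal escapes or nested parentheses) and the threshold of -250 for "this adjustment is a word gap" are assumptions chosen to fit this one excerpt.

```java
/** Hypothetical illustration, not part of PDFBox. */
final class TjArrayReader {
    /**
     * Concatenates the strings of a TJ array and inserts a space wherever
     * a numeric adjustment is at or below spaceThreshold.
     */
    static String toText(String tj, double spaceThreshold) {
        StringBuilder out = new StringBuilder();
        int n = tj.length();
        int i = 0;
        while (i < n) {
            char c = tj.charAt(i);
            if (c == '(') {
                i++; // skip the opening parenthesis
                while (i < n && tj.charAt(i) != ')') {
                    if (tj.charAt(i) == '\\' && i + 1 < n) {
                        i++; // drop the backslash, keep the escaped character
                    }
                    out.append(tj.charAt(i));
                    i++;
                }
                i++; // skip the closing parenthesis
            } else if (Character.isDigit(c)
                    || (c == '-' && i + 1 < n && Character.isDigit(tj.charAt(i + 1)))) {
                int start = i;
                i++;
                while (i < n && (Character.isDigit(tj.charAt(i)) || tj.charAt(i) == '.')) {
                    i++;
                }
                double adjustment = Double.parseDouble(tj.substring(start, i));
                if (adjustment <= spaceThreshold) {
                    out.append(' '); // large shift to the right: treat as a word gap
                }
            } else {
                i++; // brackets, whitespace and the TJ operator itself
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String excerpt = "[ (W) 55 (ith) -383 (due) -384 (r) 18 (egar) 18 (d) ] TJ";
        System.out.println(toText(excerpt, -250));
    }
}
```

Small values such as 55 and 18 are kerning inside a word, while values around -383 move the next string far enough to the right to look like a space.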

BR
Andreas

> 
> On Thu, Mar 17, 2016 at 7:12 PM, Hesham G.  wrote:
> 
> > Hello ,
> >
> > I have a PDF file created using Latex. I am trying to read and print all
> > letters in that file using PDFBox, but when doing this all spaces in that
> > file are ignored. Here is the code I am using:
> > PDPage page = (PDPage)allPages.get( 0 );
> > PDStream contents = page.getContents();
> > if ( contents != null ) {
> > PDFTextStripperProcessor pdfTextStripperProcessor = new
> > PDFTextStripperProcessor();
> > pdfTextStripperProcessor.processStream( page, page.findResources(),
> > contents.getStream() );
> > }
> >
> > public class PDFTextStripperProcessor extends PDFTextStripper {
> > @Override
> > public void processTextPosition( TextPosition text )  {
> > System.out.println( text.getCharacter() );
> > }
> > }
> >
> > And you can check a one page file sample here to test it:
> >
> > https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf
> >
> > What is the cause of this issue please?
> >
> >
> > Best regards ,
> > Hesham




Re: pdfrenderer unicode characters in latest 2.1.0-SNAPSHOT ?

2016-03-19 Thread John Hewson

> On 16 Mar 2016, at 14:23, Jesse Kuhnert wrote:
> 
> It appears as if the java2d font rendering logic in new 2.x versions missed
> something with unicode support. I have pdfs being output wonderfully with
> my sample unicode text (bengali in this case) but when we try to produce
> images any glyphs we have which aren't english appear to just be rendered
> as "blank" somehow. (meaning the spacing and line heights look like there
> is ghost text taking up space there but maybe that's just our logic for
> laying out pdf)
> 
> Has anyone else tried to produce unicode based pdf images yet ?

We’ve not had any notable text rendering bugs in a while, can you send a
sample problem PDF?

— John



Re: display an image in pdf

2016-03-19 Thread Tilman Hausherr

On 18.03.2016 at 18:13, Haddy, Diane E wrote:

Hello

I have two questions.  I would like to display a .gif image into a pdf.  Can I 
use PDJpeg class for .gif image?

And in this piece of code, which I got from the internet:

String imageName = "/images/arrow.gif";

   PDXObjectImage image = new PDJpeg(doc, new FileInputStream(imageName));
   PDPageContentStream content = new PDPageContentStream(doc, page);
   content.drawImage(image, 200, 500);
   content.close();

   doc.save(fileName);
   doc.close();

the file path is in the web directory of my java project.  However when I run 
this, I get FileNotFoundException.  Any suggestions?


Better to use PDPixelMap for GIF files. Just read the file into a BufferedImage 
with ImageIO.read().


The second problem ("file path is in the web directory of my java 
project") sounds like you're doing a Tomcat project. 
"/images/arrow.gif" is wrong for sure; obviously your image isn't there. 
Maybe "images/arrow.gif" would be correct, maybe not. Do this to find 
out where you are:


System.out.println (new File(".").getAbsolutePath());

then search where your image directory is relative to that.
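
Both suggestions can be combined in a small helper. The sketch below uses only the JDK (java.io.File and javax.imageio.ImageIO); the class name GifLoader is hypothetical. It reads the image into a BufferedImage and, if the file is not found, reports the absolute path that was tried together with the current working directory.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: loads an image and explains path problems. */
final class GifLoader {
    static BufferedImage load(String path) throws IOException {
        File file = new File(path);
        if (!file.isFile()) {
            // Show where the path really points and where the JVM was started.
            throw new IOException("No file at " + file.getAbsolutePath()
                    + " (working directory: " + new File(".").getAbsolutePath() + ")");
        }
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            // ImageIO.read returns null when no registered reader understands the data.
            throw new IOException("No ImageIO reader could decode " + file.getAbsolutePath());
        }
        return image;
    }
}
```

The returned BufferedImage is the object that would then be handed to PDPixelMap as suggested above.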

Tilman



Thank you
Diane





Diane Haddy

Health Care Information Systems
Application Developer
3281 Ridgeway Dr
Coralville, IA 52241
Phone  (319) 384-9725




Notice: This UI Health Care e-mail (including attachments) is covered by the 
Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is confidential and 
may be legally privileged.  If you are not the intended recipient, you are 
hereby notified that any retention, dissemination, distribution, or copying of 
this communication is strictly prohibited.  Please reply to the sender that you 
have received the message in error, then delete it.  Thank you.








display an image in pdf

2016-03-19 Thread Haddy, Diane E
Hello

I have two questions.  I would like to display a .gif image into a pdf.  Can I 
use PDJpeg class for .gif image?

And in this piece of code, which I got from the internet:

String imageName = "/images/arrow.gif";

  PDXObjectImage image = new PDJpeg(doc, new FileInputStream(imageName));
  PDPageContentStream content = new PDPageContentStream(doc, page);
  content.drawImage(image, 200, 500);
  content.close();

  doc.save(fileName);
  doc.close();

the file path is in the web directory of my java project.  However when I run 
this, I get FileNotFoundException.  Any suggestions?

Thank you
Diane





Diane Haddy

Health Care Information Systems
Application Developer
3281 Ridgeway Dr
Coralville, IA 52241
Phone  (319) 384-9725







Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.

Andreas,

That is very helpful.

I can get the x location of each character using TextPosition.getX(), e.g.:
W: 102.88399
i: 114.18165
t: 117.660614
h: 121.55801
d: 133.09477
u: 140.3994
e: 147.60838

So to detect the space between the 2 words "With" & "due" should I make 
subtraction calculations between X of the last letter (h) and the X of the 
first letter (d) and if the number is larger than normal then this is a 
space? I think this way might be risky in the detection, or what?



Best regards ,
Hesham


Included message :

Hi,

On 17 March 2016 at 08:34, Frank van der Hulst wrote:


Spaces don't exist as characters in PDFs. To identify spaces, you have to
compare the X coordinates of adjacent characters against their widths.

That's not correct. Spaces do exist, but in most cases PDF engines omit them and
replace them with split text that is positioned appropriately.

BTW, LaTeX uses the same strategy. Here is an excerpt from your PDF:

  [ (W) 55 (ith) -383 (due) -384 (r) 18 (egar) 18 (d) -383 (to) -383 (Article)
-384 (\(219\),) -416 (the) -384 (competent) -383 (authority) -383 (has) -384
(the) -383 (right) ] TJ

The text is in between the parentheses and the numbers are used for horizontal
positioning.

BR
Andreas



On Thu, Mar 17, 2016 at 7:12 PM, Hesham G.  wrote:

> Hello ,
>
> I have a PDF file created using Latex. I am trying to read and print all
> letters in that file using PDFBox, but when doing this all spaces in that
> file are ignored. Here is the code I am using:
> PDPage page = (PDPage)allPages.get( 0 );
> PDStream contents = page.getContents();
> if ( contents != null ) {
> PDFTextStripperProcessor pdfTextStripperProcessor = new
> PDFTextStripperProcessor();
> pdfTextStripperProcessor.processStream( page, page.findResources(),
> contents.getStream() );
> }
>
> public class PDFTextStripperProcessor extends PDFTextStripper {
> @Override
> public void processTextPosition( TextPosition text )  {
> System.out.println( text.getCharacter() );
> }
> }
>
> And you can check a one page file sample here to test it:
>
> https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf
>
> What is the cause of this issue please?
>
>
> Best regards ,
> Hesham





Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.

Andreas,

You're absolutely right. I am testing it now, but it seems very complicated. 
I hope there might be another easier solution.



Best regards ,
Hesham


Included message :


On 17 March 2016 at 11:20, "Hesham G." wrote:


Andreas,

That is very helpful.

I can get the x location of each character using TextPosition.getX(), e.g.:
W: 102.88399
i: 114.18165
t: 117.660614
h: 121.55801
d: 133.09477
u: 140.3994
e: 147.60838

So to detect the space between the 2 words "With" & "due" should I make
subtraction calculations between X of the last letter (h) and the X of the
first letter (d) and if the number is larger than normal then this is a
space? I think this way might be risky in the detection, or what?

That's the short story. Deciding what is normal could be quite tricky. You have
to take the following facts into account:

- different fonts have different widths (important if the font before the space
isn't the same as the font after the space)
- keep in mind that you have to take scaling and sometimes rotation into
account
- the "space" between characters may vary if the text is justified

There are certainly some other details which may be important as well, so that
you end up with a more or less heuristic approach.
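
A small sketch in plain Java shows why the raw X differences alone are not enough. The X values are the ones quoted above. The class name, the looksLikeSpace helper and its factor parameter are assumptions for illustration, and the glyph width would have to come from somewhere else, for example from the TextPosition.

```java
/** Hypothetical illustration of the gap heuristic discussed above. */
final class GapHeuristic {
    /** Distance from each character's X to the next character's X. */
    static double[] advances(double[] xs) {
        double[] result = new double[Math.max(0, xs.length - 1)];
        for (int i = 0; i < result.length; i++) {
            result[i] = xs[i + 1] - xs[i];
        }
        return result;
    }

    /** True if the empty room after a glyph exceeds factor times the glyph's own width. */
    static boolean looksLikeSpace(double x, double width, double nextX, double factor) {
        double gap = nextX - (x + width);
        return gap > width * factor;
    }

    public static void main(String[] args) {
        char[] letters = {'W', 'i', 't', 'h', 'd', 'u', 'e'};
        double[] xs = {102.88399, 114.18165, 117.660614, 121.55801,
                       133.09477, 140.3994, 147.60838};
        double[] adv = advances(xs);
        for (int i = 0; i < adv.length; i++) {
            System.out.printf("%c -> %c: %.2f%n", letters[i], letters[i + 1], adv[i]);
        }
    }
}
```

The advance from "W" to "i" (about 11.30) is almost as large as the one from "h" to "d" (about 11.54), although only the second contains a word gap. A fixed threshold on X differences therefore cannot separate a wide glyph from a space, which is why the glyph width has to enter the comparison.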

BR
Andreas


Best regards ,
Hesham


Included message :

Hi,

> On 17 March 2016 at 08:34, Frank van der Hulst wrote:
>
>
> Spaces don't exist as characters in PDFs. To identify spaces, you have to
> compare the X coordinates of adjacent characters against their widths.
That's not correct. Spaces do exist, but in most cases PDF engines omit them and
replace them with split text that is positioned appropriately.

BTW, LaTeX uses the same strategy. Here is an excerpt from your PDF:

   [ (W) 55 (ith) -383 (due) -384 (r) 18 (egar) 18 (d) -383 (to) -383 (Article)
-384 (\(219\),) -416 (the) -384 (competent) -383 (authority) -383 (has) -384
(the) -383 (right) ] TJ

The text is in between the parentheses and the numbers are used for horizontal
positioning.

BR
Andreas

>
> On Thu, Mar 17, 2016 at 7:12 PM, Hesham G. wrote:
>
> > Hello ,
> >
> > I have a PDF file created using Latex. I am trying to read and print all
> > letters in that file using PDFBox, but when doing this all spaces in that
> > file are ignored. Here is the code I am using:
> > PDPage page = (PDPage)allPages.get( 0 );
> > PDStream contents = page.getContents();
> > if ( contents != null ) {
> > PDFTextStripperProcessor pdfTextStripperProcessor = new
> > PDFTextStripperProcessor();
> > pdfTextStripperProcessor.processStream( page, page.findResources(),
> > contents.getStream() );
> > }
> >
> > public class PDFTextStripperProcessor extends PDFTextStripper {
> > @Override
> > public void processTextPosition( TextPosition text )  {
> > System.out.println( text.getCharacter() );
> > }
> > }
> >
> > And you can check a one page file sample here to test it:
> >
> > https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf
> >
> > What is the cause of this issue please?
> >
> >
> > Best regards ,
> > Hesham




Data from PDF - > MS Access

2016-03-19 Thread Al Grant
Hi All

This might be slightly OT - but the list was so helpful in the past...

I have a database in MS Access and a standalone Java applet that imports
data from a PDF form and scrapes data from the PDF form into corresponding
fields in the Access Database. (Thanks to the list for help on this!)

A report from this Access database then goes out via email to a handful of
people in other companies and they need a way to provide feedback into the
database.

The question is how to achieve this feedback?

It is difficult because I am working within a number of constraints:

1. We are all working behind large corporate firewalls;
2. I have Office 2007 installed;
3. Our shared mailboxes are accessed only via a web interface (not OWA - I
think Lotus)

Getting ports on firewall, vpns etc is not an option, nor are cloud
services like Dropbox or Amazon, so I think I am stuck with email as the
transport medium.

Solutions?

Cheers

-AL

-- 
"Beat it punk!"
- Clint Eastwood


Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.
John,

I think I have got the idea ... Thumbs up 


Best regards ,
Hesham 


Included message :

I’m rather confused by this thread: inferring spaces is one of the main 
features of PDFTextStripper. I’m not sure why anyone is suggesting to process 
the text manually. There’s no need to do that. We do that already!

Looking at the original code the problem is right here:

> public class PDFTextStripperProcessor extends PDFTextStripper {
>@Override
>public void processTextPosition( TextPosition text )  {
>System.out.println( text.getCharacter() );
>}
> }

The processTextPosition method is used to pass an unprocessed TextPosition *in* 
to PDFTextStripper, but this override prevents that from happening, and is just 
printing the unprocessed token before PDFTextStripper has had a chance to do 
its job, such as inferring the missing spaces.

You should follow our PrintTextLocations.java example which shows you how to 
get the processed TextPositions from PDFTextStripper. It’s really easy to do.

— John

> On 17 Mar 2016, at 04:44, Hesham G.  wrote:
> 
> Andreas,
> 
> You're absolutely right. I am testing it now, but it seems very complicated. 
> I hope there might be another easier solution.
> 
> 
> Best regards ,
> Hesham
> 
> 
> Included message :
> 
>> On 17 March 2016 at 11:20, "Hesham G." wrote:
>> 
>> 
>> Andreas,
>> 
>> That is very helpful.
>> 
>> I can get the x location of each character using TextPosition.getX(), ex:
>> W: 102.88399
>> i: 114.18165
>> t: 117.660614
>> h: 121.55801
>> d: 133.09477
>> u: 140.3994
>> e: 147.60838
>> 
>> So to detect the space between the 2 words "With" & "due" should I make
>> subtraction calculations between X of the last letter (h) and the X of the
>> first letter (d) and if the number is larger than normal then this is a
>> space? I think this way might be risky in the detection, or what?
> That's the short story. Deciding what is normal could be quite tricky. You have
> to take the following facts into account:
> 
> - different fonts have different widths (important if the font before the space
> isn't the same as the font after the space)
> - keep in mind that you have to take scaling and sometimes rotation into
> account
> - the "space" between characters may vary if the text is justified
> 
> There are certainly some other details which may be important as well, so that
> you end up with a more or less heuristic approach.
> 
> BR
> Andreas
> 
>> Best regards ,
>> Hesham
>> 
>> 
>> Included message :
>> 
>> Hi,
>> 
>> > On 17 March 2016 at 08:34, Frank van der Hulst wrote:
>> >
>> >
>> > Spaces don't exist as characters in PDFs. To identify spaces, you have to
>> > compare the X coordinates of adjacent characters against their widths.
>> That's not correct. Spaces do exist, but in most cases PDF engines omit them and
>> replace them with split text that is positioned appropriately.
>> 
>> BTW, LaTeX uses the same strategy. Here is an excerpt from your PDF:
>> 
>>   [ (W) 55 (ith) -383 (due) -384 (r) 18 (egar) 18 (d) -383 (to) -383 (Article)
>> -384 (\(219\),) -416 (the) -384 (competent) -383 (authority) -383 (has) -384
>> (the) -383 (right) ] TJ
>> 
>> The text is in between the parentheses and the numbers are used for horizontal
>> positioning.
>> 
>> BR
>> Andreas
>> 
>> >
>> > On Thu, Mar 17, 2016 at 7:12 PM, Hesham G. wrote:
>> >
>> > > Hello ,
>> > >
>> > > I have a PDF file created using Latex. I am trying to read and print all
>> > > letters in that file using PDFBox, but when doing this all spaces in that
>> > > file are ignored. Here is the code I am using:
>> > > PDPage page = (PDPage)allPages.get( 0 );
>> > > PDStream contents = page.getContents();
>> > > if ( contents != null ) {
>> > > PDFTextStripperProcessor pdfTextStripperProcessor = new
>> > > PDFTextStripperProcessor();
>> > > pdfTextStripperProcessor.processStream( page, page.findResources(),
>> > > contents.getStream() );
>> > > }
>> > >
>> > > public class PDFTextStripperProcessor extends PDFTextStripper {
>> > > @Override
>> > > public void processTextPosition( TextPosition text )  {
>> > > System.out.println( text.getCharacter() );
>> > > }
>> > > }
>> > >
>> > > And you can check a one page file sample here to test it:
>> > >
>> > > https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf
>> > >
>> > > What is the cause of this issue please?
>> > >
>> > >
>> > > Best regards ,
>> > > Hesham
>> 

Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Stahle, Patrick
Hi,

This is not a general problem; it only occurs with original PDFs generated with 
3D content using Anark. The file seems to have been encrypted and loads 
just fine in Adobe Reader, but when we try to do a "Save As" we get the 
following error:
"The document could not be saved. There was a problem reading this document 21."

If I do a control click on the "ok" button. I get the following message:
"This direct object already has a container."

Any ideas what might be causing this problem? We have tried the same thing with 
iText and it does not experience this problem.

Sample Code that reproduces the problem:
PDDocument doc = null;

try {
    doc = PDDocument.load(pdfIn);
    PDPage page = null;
    AccessPermission apermission = new AccessPermission();
    apermission.setCanAssembleDocument(false);
    apermission.setCanExtractContent(false);
    apermission.setCanExtractForAccessibility(true);
    apermission.setCanFillInForm(true);
    apermission.setCanModifyAnnotations(true);
    apermission.setCanPrint(true);
    apermission.setReadOnly();
    StandardProtectionPolicy spp =
        new StandardProtectionPolicy(UUID.randomUUID().toString(), "", apermission);
    doc.protect(spp);

    for (int i = 0; i < doc.getNumberOfPages(); i++) {
        page = doc.getPage(i);
        PDPageContentStream canvas = new PDPageContentStream(doc, page,
            PDPageContentStream.AppendMode.APPEND, true, true);
        canvas.saveGraphicsState();
        canvas.restoreGraphicsState();
        canvas.close();
    }
    doc.save(pdfOut);
    bRet = true;
}
finally {
    if (doc != null) {
        doc.close();
    }
}

Thanks,
Patrick



Re: pdfrenderer unicode characters in latest 2.1.0-SNAPSHOT ?

2016-03-19 Thread Tilman Hausherr

On 16.03.2016 at 22:39, Jesse Kuhnert wrote:

Yeah, that one worked fine for me too; it just doesn't in the slightly more
elaborate situation we are rendering stuff in, though the actual PDF is
rendered perfectly. Meh, OK, sorry I don't have a more definitive idea why it
isn't working in our real product code vs. smaller debug examples where it
works fine. I'll file a bug report if I ever find anything.


Maybe the font is missing on the target server?

Tilman




On Wednesday, March 16, 2016, Tilman Hausherr  wrote:


On 16.03.2016 at 22:23, Jesse Kuhnert wrote:


It appears as if the java2d font rendering logic in new 2.x versions
missed
something with unicode support. I have pdfs being output wonderfully with
my sample unicode text (bengali in this case) but when we try to produce
images any glyphs we have which aren't english appear to just be rendered
as "blank" somehow. (meaning the spacing and line heights look like there
is ghost text taking up space there but maybe that's just our logic for
laying out pdf)

Has anyone else tried to produce unicode based pdf images yet ?



I just tried to render the file created with the EmbeddedFonts example and
it works fine.

Tilman




Re: Data from PDF - > MS Access

2016-03-19 Thread Al Grant
Thanks Ken.

Yes that's the only way I can see so far.

There is a good outlook connector called moyosoft but it does cost about
$200

The field itself might be called CompanyXYZ Ave the data type longstring.

I will also need record number in the email to update the correct record.
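
The convention Ken describes in the quoted message below could be parsed along these lines. This is only a sketch in plain Java: the class name, the "RecordNumber" field used in the usage note and the exact boundary rule (a line consisting of at least ten '+' or ten '*') are assumptions, and fetching the mail text and importing the CSV into Access are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for "[Field Name] = [Value]" lines between boundary lines. */
final class FeedbackMailParser {
    /** A boundary is a line made only of '+' or only of '*', at least 10 long. */
    static boolean isBoundary(String line) {
        return line.trim().matches("\\+{10,}|\\*{10,}");
    }

    /** Collects the field/value pairs found between the first two boundary lines. */
    static Map<String, String> parse(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        boolean inside = false;
        for (String line : body.split("\\r?\\n")) {
            if (isBoundary(line)) {
                if (inside) {
                    break; // closing boundary reached
                }
                inside = true;
                continue;
            }
            int eq = line.indexOf('=');
            // Only accept lines inside the block that contain exactly one '='.
            if (!inside || eq < 0 || eq != line.lastIndexOf('=')) {
                continue;
            }
            fields.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return fields;
    }

    /** Renders the fields as a two-line CSV: header row, then value row. */
    static String toCsv(Map<String, String> fields) {
        StringBuilder header = new StringBuilder();
        StringBuilder row = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (header.length() > 0) {
                header.append(',');
                row.append(',');
            }
            header.append(quote(e.getKey()));
            row.append(quote(e.getValue()));
        }
        return header + "\n" + row + "\n";
    }

    private static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }
}
```

An email body would then carry the record number as one more field inside the block, for example a line "RecordNumber = 42" next to the feedback field itself, so that the import can update the right row.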

Cheers

Al
On 17/03/2016 5:28 pm, "Ken Bowen"  wrote:

> What is the nature of the feedback into the database?
>
> If it amounts to more or less make entries in fields in the db,
> and you are stuck with email as a medium, you might hack up a
> convention like this:
> 1) Select a recognizable boundary line (begin & end), say a line of
> at least 10 + or *, or whatever.
> 2) Between the boundary lines, have your compatriots make entries like:
> [Field Name] = [Value to be input]
> with the restriction that no ‘=‘ sign occurs on either side (or replace
> the use of ‘=‘ by something else that would satisfy that restriction).
>
> You can knock out a script to process each email to a csv file, and
> then import that to your Access db.
>
> Regards,
> Ken Bowen
>
> On Mar 16, 2016, at 9:23 PM, Al Grant  wrote:
>
> > Hi All
> >
> > This might be slightly OT - but the list was so helpful in the past...
> >
> > I have a database in MS Access and a standalone Java applet that imports
> > data from a PDF form and scrapes data from the PDF form into
> corresponding
> > fields in the Access Database. (Thanks to the list for help on this!)
> >
> > A report from this Access database then goes out via email to a handful
> of
> > people in other companies and they need a way to provide feedback into
> the
> > database.
> >
> > The question is how to achieve this feedback?
> >
> > It is difficult because the I am working within a number of constraints:
> >
> > 1. We are all working behind large corporate firewalls;
> > 2. I have Office 2007 installed;
> > 3. Our shared mailboxes are accessed only via a web interface (not OWA -
> I
> > think Lotus)
> >
> > Getting ports on firewall, vpns etc is not an option, nor are cloud
> > services like Dropbox or Amazon, so I think I am stuck with email as the
> > transport medium.
> >
> > Solutions?
> >
> > Cheers
> >
> > -AL
> >
> > --
> > "Beat it punk!"
> > - Clint Eastwood
>
>
>
>
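
The convention Ken describes (a pair of boundary lines with
"Field Name = Value" entries between them) could be processed roughly as
follows. This is only a sketch: the class name FeedbackBlockParser, its
method names and the choice to quote every CSV cell are assumptions made
for illustration, and nothing here comes from PDFBox or Access.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: extracts "Field = Value" pairs from an email body. */
final class FeedbackBlockParser {

    /** A boundary is a line made up of at least ten '+' or '*' characters. */
    static boolean isBoundary(String line) {
        return line.trim().matches("[+*]{10,}");
    }

    /** Returns the fields found between the first pair of boundary lines. */
    static Map<String, String> parse(String emailBody) {
        Map<String, String> fields = new LinkedHashMap<>();
        boolean inside = false;
        for (String line : emailBody.split("\\R")) {
            if (isBoundary(line)) {
                if (inside) {
                    break;          // closing boundary: stop reading
                }
                inside = true;      // opening boundary: start reading
                continue;
            }
            int eq = line.indexOf('=');
            if (inside && eq > 0) {
                // split on the first '=' only; the name is on the left
                fields.put(line.substring(0, eq).trim(),
                           line.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    /** Renders the values as one CSV row, quoting every value. */
    static String toCsvRow(Map<String, String> fields) {
        List<String> cells = new ArrayList<>();
        for (String value : fields.values()) {
            cells.add('"' + value.replace("\"", "\"\"") + '"');
        }
        return String.join(",", cells);
    }
}
```

Including a record number as one of the fields, as Al suggests, would let
the import step update the matching row instead of appending a new one.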


Log4j message PDFBox 2.0

2016-03-19 Thread Stahle, Patrick
Hi,

I see the following debug log message:
493 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - 
ScratchFileBuffer not closed!

Is this something to be concerned about?

Thanks,
Patrick


Re: PrintTextLocations 1.8 vs 2.0

2016-03-19 Thread Tilman Hausherr
I found a case where the strategy you mentioned didn't work for a TT 
font, the file 032431. Here's some updated code.



private Shape calculateGlyphBounds(Matrix textRenderingMatrix, PDFont font, int code)
        throws IOException
{
    GeneralPath path = null;
    AffineTransform at = textRenderingMatrix.createAffineTransform();
    at.concatenate(font.getFontMatrix().createAffineTransform());
    if (font instanceof PDType3Font)
    {
        PDType3Font t3Font = (PDType3Font) font;
        PDType3CharProc charProc = t3Font.getCharProc(code);
        if (charProc != null)
        {
            PDRectangle glyphBBox = charProc.getGlyphBBox();
            if (glyphBBox != null)
            {
                path = glyphBBox.toGeneralPath();
            }
        }
    }
    else if (font instanceof PDVectorFont)
    {
        PDVectorFont vectorFont = (PDVectorFont) font;
        path = vectorFont.getPath(code);

        if (font instanceof PDTrueTypeFont)
        {
            PDTrueTypeFont ttFont = (PDTrueTypeFont) font;
            int unitsPerEm = ttFont.getTrueTypeFont().getHeader().getUnitsPerEm();
            at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
        }
        if (font instanceof PDType0Font)
        {
            PDType0Font t0font = (PDType0Font) font;
            if (t0font.getDescendantFont() instanceof PDCIDFontType2)
            {
                int unitsPerEm = ((PDCIDFontType2) t0font.getDescendantFont())
                        .getTrueTypeFont().getHeader().getUnitsPerEm();
                at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
            }
        }
    }
    else if (font instanceof PDSimpleFont)
    {
        PDSimpleFont simpleFont = (PDSimpleFont) font;

        // these two lines do not always work, e.g. for the TT fonts in file
        // 032431.pdf, which is why PDVectorFont is tried first.
        String name = simpleFont.getEncoding().getName(code);
        path = simpleFont.getPath(name);
    }
    else
    {
        // shouldn't happen, please open issue in JIRA
        System.out.println("Unknown font class: " + font.getClass());
    }
    if (path == null)
    {
        return null;
    }
    return at.createTransformedShape(path.getBounds2D());
}
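
The scaling in the method above can be tried out without PDFBox. The
sketch below only illustrates the arithmetic: the class name
GlyphBoundsScalingSketch, the plain font-size scale standing in for
textRenderingMatrix, the 1/1000 font matrix and the glyph box values are
all assumptions, not values taken from a real font.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

final class GlyphBoundsScalingSketch {

    /**
     * Maps a glyph bounding box given in font units to text-space units.
     * All parameters are stand-ins for values PDFBox would supply.
     */
    static Rectangle2D toTextSpace(Rectangle2D glyphBounds, double fontSize, int unitsPerEm) {
        // stand-in for textRenderingMatrix: here it carries only the font size
        AffineTransform at = AffineTransform.getScaleInstance(fontSize, fontSize);
        // stand-in for a font matrix of 1/1000 (assumed, typical for many fonts)
        at.concatenate(AffineTransform.getScaleInstance(0.001, 0.001));
        // TrueType outlines are expressed in unitsPerEm units per em, not 1000
        at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
        Shape s = at.createTransformedShape(glyphBounds);
        return s.getBounds2D();
    }

    public static void main(String[] args) {
        // made-up glyph box in a font with 2048 units per em
        Rectangle2D glyph = new Rectangle2D.Double(0, 0, 1024, 1434);
        System.out.println(toTextSpace(glyph, 12, 2048));
        System.out.println(toTextSpace(glyph, 24, 2048));
    }
}
```

In this sketch the resulting height changes with the font size only
because the first transform carries it; a path taken straight from the
font without that transform has the same bounds at every size.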





Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Frank van der Hulst
Spaces don't exist as characters in PDFs. To identify spaces, you have to
compare the X coordinates of adjacent characters against their widths.

On Thu, Mar 17, 2016 at 7:12 PM, Hesham G.  wrote:

> Hello ,
>
> I have a PDF file created using Latex. I am trying to read and print all
> letters in that file using PDFBox, but when doing this all spaces in that
> file are ignored. Here is the code I am using:
> PDPage page = (PDPage)allPages.get( 0 );
> PDStream contents = page.getContents();
> if ( contents != null ) {
> PDFTextStripperProcessor pdfTextStripperProcessor = new
> PDFTextStripperProcessor();
> pdfTextStripperProcessor.processStream( page, page.findResources(),
> contents.getStream() );
> }
>
> public class PDFTextStripperProcessor extends PDFTextStripper {
> @Override
> public void processTextPosition( TextPosition text )  {
> System.out.println( text.getCharacter() );
> }
> }
>
> And you can check a one page file sample here to test it:
>
> https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf
>
> What is the cause of this issue please?
>
>
> Best regards ,
> Hesham


Re: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Tilman Hausherr

https://issues.apache.org/jira/browse/PDFBOX-3276

If you register, you can subscribe to that issue and be notified of any 
updates.


Tilman




RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Stahle, Patrick
Hi Tilman,

OK, I think I have the file uploaded to the following:
http://wikisend.com/download/381906/tmp_10435-Technical Data Package 
BOMAnarkStampedPDFBox102259703.pdf

The link is hard for me to test since I have to do this all from my phone. If 
it says not found or expired try putting in 381906 from the download page...


Thanks,
Patrick

-Original Message-
From: Stahle, Patrick [mailto:patrick.sta...@te.com] 
Sent: Thursday, March 17, 2016 8:10 AM
To: users@pdfbox.apache.org
Subject: RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Hi Tilman,
Do you have an Apache email address or another email address I can just send 
the PDF to? This MoveIt site would require me to have a sending email 
address regardless, so I might as well just send it directly.

 Thanks,
Patrick

-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Wednesday, March 16, 2016 3:38 PM
To: users@pdfbox.apache.org
Subject: Re: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Can you reopen the file you saved with PDFBox? If not, please open an issue in 
JIRA and attach your file.

If yes, just upload the file somewhere, I'd like to have a look at the 
encryption dictionaries.

Tilman

On 16.03.2016 at 20:19, Stahle, Patrick wrote:
> Hi,
>
> This is not a general problem and only occurs with original PDFs generated 
> with 3D content using Anark. The file seems to have been encrypted and 
> loads just fine in Adobe Reader, but when we try to do a "Save As" we get the 
> following error:
> "The document could not be saved. There was a problem reading this document 
> 21."
>
> If I do a control click on the "ok" button. I get the following message:
> "This direct object already has a container."
>
> Any ideas what might be causing this problem? We have tried the same thing 
> with iText and it does not experience this problem.
>
> Sample Code that reproduces the problem:
> PDDocument doc = null;
>
> try {
>     doc = PDDocument.load(pdfIn);
>     PDPage page = null;
>     AccessPermission apermission = new AccessPermission();
>     apermission.setCanAssembleDocument(false);
>     apermission.setCanExtractContent(false);
>     apermission.setCanExtractForAccessibility(true);
>     apermission.setCanFillInForm(true);
>     apermission.setCanModifyAnnotations(true);
>     apermission.setCanPrint(true);
>     apermission.setReadOnly();
>     StandardProtectionPolicy spp = new StandardProtectionPolicy(
>             UUID.randomUUID().toString(), "", apermission);
>     doc.protect(spp);
>
>     for (int i = 0; i < doc.getNumberOfPages(); i++) {
>         page = doc.getPage(i);
>         PDPageContentStream canvas = new PDPageContentStream(doc, page,
>                 PDPageContentStream.AppendMode.APPEND, true, true);
>         canvas.saveGraphicsState();
>         canvas.restoreGraphicsState();
>         canvas.close();
>     }
>     doc.save(pdfOut);
>     bRet = true;
> }
> finally {
>     if (doc != null) {
>         doc.close();
>     }
> }
>
> Thanks,
> Patrick
>
>



Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.
Hello ,

I have a PDF file created using Latex. I am trying to read and print all 
letters in that file using PDFBox, but when doing this all spaces in that file 
are ignored. Here is the code I am using:
PDPage page = (PDPage) allPages.get( 0 );
PDStream contents = page.getContents();
if ( contents != null ) {
    PDFTextStripperProcessor pdfTextStripperProcessor = new PDFTextStripperProcessor();
    pdfTextStripperProcessor.processStream( page, page.findResources(),
            contents.getStream() );
}

public class PDFTextStripperProcessor extends PDFTextStripper {
    @Override
    public void processTextPosition( TextPosition text ) {
        System.out.println( text.getCharacter() );
    }
}

And you can check a one page file sample here to test it:
https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf

What is the cause of this issue please?


Best regards ,
Hesham

pdfrenderer unicode characters in latest 2.1.0-SNAPSHOT ?

2016-03-19 Thread Jesse Kuhnert
It appears as if the Java2D font rendering logic in the new 2.x versions missed
something with Unicode support. I have PDFs being output wonderfully with
my sample Unicode text (Bengali in this case), but when we try to produce
images, any glyphs we have which aren't English appear to just be rendered
as "blank" somehow (meaning the spacing and line heights look like there
is ghost text taking up space there, but maybe that's just our logic for
laying out the PDF).

Has anyone else tried to produce Unicode-based PDF images yet?


Re: Orientation printout - used 2.0.0.RC3 release

2016-03-19 Thread John Hewson

> On 15 Mar 2016, at 13:12, Tilman Hausherr  wrote:
> 
> I recommend that you try with your own version of PDFPageable. I looked at 
> the code and IMHO the problem is that a rotation is set that you don't want. 
> See in getPageFormat. I haven't tested it, but I assume you should delete 
> these lines:
> 
> 
>if (mediaBox.getWidth() > mediaBox.getHeight())
>{
>// rotate
>paper = new Paper();
>paper.setSize(mediaBox.getHeight(), mediaBox.getWidth());
>paper.setImageableArea(cropBox.getLowerLeftY(), 
> cropBox.getLowerLeftX(),
>cropBox.getHeight(), cropBox.getWidth());
>isLandscape = true;
>}
>else

You don’t want to do that though, because Java doesn’t handle landscape sized 
paper correctly. So it’s necessary to always print in portrait, rotating any 
landscape pages first, and flagging them as being landscape.

— John
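
The workaround described above (keep the Paper in portrait and flag the
page as landscape) can be sketched with the JDK printing classes alone.
The class name PortraitNormalisedFormat is made up, and the imageable
area is simply set to the full page here, whereas the PDFBox code quoted
below uses the crop box.

```java
import java.awt.print.PageFormat;
import java.awt.print.Paper;

final class PortraitNormalisedFormat {

    /** Builds a PageFormat for a page of the given size in points (1/72 inch). */
    static PageFormat forPage(double width, double height) {
        boolean isLandscape = width > height;
        Paper paper = new Paper();
        if (isLandscape) {
            // swap the sides so that the Paper itself is always portrait
            paper.setSize(height, width);
            paper.setImageableArea(0, 0, height, width);
        } else {
            paper.setSize(width, height);
            paper.setImageableArea(0, 0, width, height);
        }
        PageFormat format = new PageFormat();
        format.setPaper(paper);
        // flag the page instead of handing Java a landscape-sized Paper
        format.setOrientation(isLandscape ? PageFormat.LANDSCAPE : PageFormat.PORTRAIT);
        return format;
    }
}
```

With a landscape A4 page (842 x 595 points) the Paper ends up 595 wide
and 842 high, while the PageFormat still reports a width of 842 because
its orientation is LANDSCAPE.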

> Here's the full code. If this doesn't get through, download the source code.
> 
> 
> public final class PDFPageable extends Book
> {
>private final PDDocument document;
>private final boolean showPageBorder;
>private final float dpi;
>private final Orientation orientation;
> 
>/**
> * Creates a new PDFPageable.
> *
> * @param document the document to print
> */
>public PDFPageable(PDDocument document)
>{
>this(document, Orientation.AUTO, false, 0);
>}
> 
>/**
> * Creates a new PDFPageable with the given page orientation.
> *
> * @param document the document to print
> * @param orientation page orientation policy
> */
>public PDFPageable(PDDocument document, Orientation orientation)
>{
>this(document, orientation, false, 0);
>}
> 
>/**
> * Creates a new PDFPageable with the given page orientation and with 
> optional page borders
> * shown. The image will be rasterized at the given DPI before being sent 
> to the printer.
> *
> * @param document the document to print
> * @param orientation page orientation policy
> * @param showPageBorder true if page borders are to be printed
> */
>public PDFPageable(PDDocument document, Orientation orientation, boolean 
> showPageBorder)
>{
>this(document, orientation, showPageBorder, 0);
>}
> 
>/**
> * Creates a new PDFPageable with the given page orientation and with 
> optional page borders
> * shown. The image will be rasterized at the given DPI before being sent 
> to the printer.
> *
> * @param document the document to print
> * @param orientation page orientation policy
> * @param showPageBorder true if page borders are to be printed
> * @param dpi if non-zero then the image will be rasterized at the given 
> DPI
> */
>public PDFPageable(PDDocument document, Orientation orientation, boolean 
> showPageBorder,
>   float dpi)
>{
>this.document = document;
>this.orientation = orientation;
>this.showPageBorder = showPageBorder;
>this.dpi = dpi;
>}
> 
>@Override
>public int getNumberOfPages()
>{
>return document.getNumberOfPages();
>}
> 
>/**
> * {@inheritDoc}
> *
> * Returns the actual physical size of the pages in the PDF file. May not 
> fit the local printer.
> */
>@Override
>public PageFormat getPageFormat(int pageIndex)
>{
>PDPage page = document.getPage(pageIndex);
>PDRectangle mediaBox = PDFPrintable.getRotatedMediaBox(page);
>PDRectangle cropBox = PDFPrintable.getRotatedCropBox(page);
> 
>// Java does not seem to understand landscape paper sizes, i.e. where 
> width > height, it
>// always crops the imageable area as if the page were in portrait. I 
> suspect that this is
>// a JDK bug but it might be by design, see PDFBOX-2922.
>//
>// As a workaround, we normalise all Page(s) to be portrait, then flag 
> them as landscape in
>// the PageFormat.
>Paper paper;
>boolean isLandscape;
>if (mediaBox.getWidth() > mediaBox.getHeight())
>{
>// rotate
>paper = new Paper();
>paper.setSize(mediaBox.getHeight(), mediaBox.getWidth());
>paper.setImageableArea(cropBox.getLowerLeftY(), 
> cropBox.getLowerLeftX(),
>cropBox.getHeight(), cropBox.getWidth());
>isLandscape = true;
>}
>else
>{
>paper = new Paper();
>paper.setSize(mediaBox.getWidth(), mediaBox.getHeight());
>paper.setImageableArea(cropBox.getLowerLeftX(), 
> cropBox.getLowerLeftY(),
>cropBox.getWidth(), cropBox.getHeight());
>isLandscape = false;
>}
> 
>PageFormat format = new PageFormat();
>format.setPaper(paper);
> 
>// auto portrait/landscape
>if 

Re: PrintTextLocations 1.8 vs 2.0

2016-03-19 Thread Tilman Hausherr

On 16.03.2016 at 09:52, Peter Prusinowski wrote:



thank you for the hints, now I am overwriting showGlyph() and trying 
to get the value with


PDSimpleFont sf = (PDSimpleFont) font;
String name = sf.getEncoding().getName(code);
sf.getPath(name).getBounds()

but I am getting the same height, no matter which font size is set. 
This happens with type1 and truetype fonts. What am I doing wrong ? 


Here's some code; use it with the DrawPrintTextLocations example. Please 
tell me if it works and, if possible, upload files where it doesn't.



@Override
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code,
        String unicode, Vector displacement) throws IOException
{
    super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);

    Rectangle2D bounds = null;
    AffineTransform at = textRenderingMatrix.createAffineTransform();

    if (font instanceof PDSimpleFont)
    {
        PDSimpleFont simpleFont = (PDSimpleFont) font;
        String name = simpleFont.getEncoding().getName(code);
        GeneralPath path = simpleFont.getPath(name);
        bounds = path.getBounds2D();

        at.scale(1/1000f, 1/1000f);

        if (font instanceof PDTrueTypeFont)
        {
            PDTrueTypeFont ttFont = (PDTrueTypeFont) font;
            int unitsPerEm = ttFont.getTrueTypeFont().getHeader().getUnitsPerEm();
            at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
        }
    }
    else if (font instanceof PDVectorFont)
    {
        PDVectorFont vectorFont = (PDVectorFont) font;
        GeneralPath path = vectorFont.getPath(code);
        bounds = path.getBounds2D();
        at.scale(1/1000f, 1/1000f);

        if (font instanceof PDType0Font)
        {
            PDType0Font t0font = (PDType0Font) font;
            if (t0font.getDescendantFont() instanceof PDCIDFontType2)
            {
                int unitsPerEm = ((PDCIDFontType2) t0font.getDescendantFont())
                        .getTrueTypeFont().getHeader().getUnitsPerEm();
                at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
            }
        }
    }
    else
    {
        System.out.println("TODO other: " + font.getClass());
    }
    if (bounds != null)
    {
        Shape s = at.createTransformedShape(bounds);

        // flip y-axis
        AffineTransform flip = new AffineTransform();
        flip.translate(0, getCurrentPage().getBBox().getHeight());
        flip.scale(1, -1);
        s = flip.createTransformedShape(s);

        AffineTransform transform = g2d.getTransform();
        int rotation = getCurrentPage().getRotation();
        if (rotation != 0)
        {
            PDRectangle mediaBox = getCurrentPage().getMediaBox();
            switch (rotation)
            {
                case 90:
                    g2d.translate(mediaBox.getHeight(), 0);
                    break;
                case 270:
                    g2d.translate(0, mediaBox.getWidth());
                    break;
                case 180:
                    g2d.translate(mediaBox.getWidth(), mediaBox.getHeight());
                    break;
                default:
                    break;
            }
            g2d.rotate(Math.toRadians(rotation));
        }

        g2d.setColor(Color.CYAN);
        g2d.draw(s);

        if (rotation != 0)
        {
            g2d.setTransform(transform);
        }
    }
}





Re: ExtractText command line OPTIONS

2016-03-19 Thread Tilman Hausherr

On 18.03.2016 at 17:38, Gordon Schneider wrote:

What is the proper way to add the -sort option to the command? I have looked 
at lots of different things to figure this out without any success.


just use

-sort






Re: Data from PDF - > MS Access

2016-03-19 Thread Al Grant
Definitely not a question for this list, but perhaps I would be better
doing the whole email to database in VBA?

On Thu, Mar 17, 2016 at 5:45 PM, Al Grant  wrote:

> Thanks Ken.
>
> Yes that's the only way I can see so far.
>
> There is a good Outlook connector called moyosoft, but it costs about
> $200.
>
> The field itself might be called CompanyXYZ and have the data type longstring.
>
> I will also need the record number in the email to update the correct record.
>
> Cheers
>
> Al
> On 17/03/2016 5:28 pm, "Ken Bowen"  wrote:
>
>> What is the nature of the feedback into the database?
>>
>> If it amounts to more or less make entries in fields in the db,
>> and you are stuck with email as a medium, you might hack up a
>> convention like this:
>> 1) Select a recognizable boundary line (begin & end), say a line of
>> at least 10 + or *, or whatever.
>> 2) Between the boundary lines, have your compatriots make entries like:
>> [Field Name] = [Value to be input]
>> with the restriction that no ‘=‘ sign occurs on either side (or replace
>> the use of ‘=‘ by something else that would satisfy that restriction).
>>
>> You can knock out a script to process each email to a csv file, and
>> then import that to your Access db.
>>
>> Regards,
>> Ken Bowen
>>
>> On Mar 16, 2016, at 9:23 PM, Al Grant  wrote:
>>
>> > Hi All
>> >
>> > This might be slightly OT - but the list was so helpful in the past...
>> >
>> > I have a database in MS Access and a standalone Java applet that imports
>> > data from a PDF form and scrapes data from the PDF form into
>> corresponding
>> > fields in the Access Database. (Thanks to the list for help on this!)
>> >
>> > A report from this Access database then goes out via email to a handful
>> of
>> > people in other companies and they need a way to provide feedback into
>> the
>> > database.
>> >
>> > The question is how to achieve this feedback?
>> >
>> > It is difficult because the I am working within a number of constraints:
>> >
>> > 1. We are all working behind large corporate firewalls;
>> > 2. I have Office 2007 installed;
>> > 3. Our shared mailboxes are accessed only via a web interface (not OWA
>> - I
>> > think Lotus)
>> >
>> > Getting ports on firewall, vpns etc is not an option, nor are cloud
>> > services like Dropbox or Amazon, so I think I am stuck with email as the
>> > transport medium.
>> >
>> > Solutions?
>> >
>> > Cheers
>> >
>> > -AL
>> >
>> > --
>> > "Beat it punk!"
>> > - Clint Eastwood
>>
>>
>>
>>


-- 
"Beat it punk!"
- Clint Eastwood


Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Hesham G.

Clovis,

Thanks a lot :)

I will have to follow this solution if there is no alternative. The problem 
is that if I am extracting text from a 500- or 600-page PDF, that will consume 
much additional memory and time. In addition, I guess it's a special 
case for LaTeX books only.


Best regards ,
Hesham


Included message :


Just an idea from someone who is fluent in neither PDFBox nor PDF:
if you just want to know that there is a space between the letters, and not
the number of spaces, you can use your code to get the character details and
then use extractText to get the words.

2016-03-17 7:20 GMT-03:00 Hesham G. :


Andreas,

That is very helpful.

I can get the x location of each character using TextPosition.getX(), ex:
W: 102.88399
i: 114.18165
t: 117.660614
h: 121.55801
d: 133.09477
u: 140.3994
e: 147.60838

So to detect the space between the 2 words "With" & "due", should I
subtract the X of the last letter (h) from the X of the
first letter (d), and if the number is larger than normal then this is a
space? I think this way might be risky in the detection, or what?
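
The subtraction idea can be sketched as below. The X values are the ones
listed in this message; the widths are invented, since they were not
posted (in PDFBox they would have to come from the TextPosition objects),
and the class name GapSpaceSketch and the tolerance of 0.3 are
assumptions as well. It is a heuristic, so the risk mentioned above is
real: a tolerance that suits one font or layout may fail on another.

```java
import java.util.ArrayList;
import java.util.List;

final class GapSpaceSketch {

    /** One extracted character: its text, left edge X and advance width. */
    static final class Glyph {
        final String text;
        final float x;
        final float width;

        Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    /** Inserts a space wherever the gap to the next glyph exceeds tolerance * average width. */
    static String joinWithSpaces(List<Glyph> glyphs, float tolerance) {
        float total = 0;
        for (Glyph g : glyphs) {
            total += g.width;
        }
        float threshold = glyphs.isEmpty() ? 0 : tolerance * total / glyphs.size();

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < glyphs.size(); i++) {
            Glyph current = glyphs.get(i);
            out.append(current.text);
            if (i + 1 < glyphs.size()) {
                // distance from the right edge of this glyph to the next left edge
                float gap = glyphs.get(i + 1).x - (current.x + current.width);
                if (gap > threshold) {
                    out.append(' ');
                }
            }
        }
        return out.toString();
    }

    /** X values are from the message above; the widths are invented for illustration. */
    static List<Glyph> sampleGlyphs() {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("W", 102.88399f, 11.9f));
        glyphs.add(new Glyph("i", 114.18165f, 3.3f));
        glyphs.add(new Glyph("t", 117.660614f, 3.9f));
        glyphs.add(new Glyph("h", 121.55801f, 6.6f));
        glyphs.add(new Glyph("d", 133.09477f, 7.3f));
        glyphs.add(new Glyph("u", 140.3994f, 7.2f));
        glyphs.add(new Glyph("e", 147.60838f, 5.3f));
        return glyphs;
    }
}
```

Comparing the gap with the glyph widths rather than with a fixed number
keeps the check independent of the font size.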


Best regards ,
Hesham


Included message :

Hi,

Frank van der Hulst wrote on 17 March 2016 at 08:34:


Spaces don't exist as characters in PDFs. To identify spaces, you have to
compare the X coordinates of adjacent characters against their widths.


That's not correct: spaces exist, but in most cases PDF engines omit them and
replace them with split text and appropriate positioning.

BTW, LaTeX uses the same strategy. Here is an excerpt from your PDF:

  [ (W) 55 (ith) -383 (due) -384 (r) 18 (egar) 18 (d) -383 (to) -383 (Article)
  -384 (\(219\),) -416 (the) -384 (competent) -383 (authority) -383 (has) -384
  (the) -383 (right) ] TJ

The text is in between the parentheses and the numbers are used for horizontal
positioning.

BR
Andreas



On Thu, Mar 17, 2016 at 7:12 PM, Hesham G. 
wrote:

> Hello ,
>
> I have a PDF file created using Latex. I am trying to read and print all
> letters in that file using PDFBox, but when doing this all spaces in that
> file are ignored. Here is the code I am using:
> PDPage page = (PDPage)allPages.get( 0 );
> PDStream contents = page.getContents();
> if ( contents != null ) {
> PDFTextStripperProcessor pdfTextStripperProcessor = new
> PDFTextStripperProcessor();
> pdfTextStripperProcessor.processStream( page, page.findResources(),
> contents.getStream() );
> }
>
> public class PDFTextStripperProcessor extends PDFTextStripper {
> @Override
> public void processTextPosition( TextPosition text )  {
> System.out.println( text.getCharacter() );
> }
> }
>
> And you can check a one page file sample here to test it:
>
>
https://dl.dropboxusercontent.com/u/10111483/downloads/pdfbox/pdf_latex_spaces_ignored.pdf
>
> What is the cause of this issue please?
>
>
> Best regards ,
> Hesham






Re: Overlaying PDFs > 1.4 fails & Hyperlinks are not clickable in overlays

2016-03-19 Thread Anna Taracha
Thank you for fixing this issue. I just tested the newest trunk and 
overlaying PDFs having XRef streams works great :)


-anna-

On 12.03.2016 16:43, Andreas Lehmkuehler wrote:

On 10.03.2016 at 21:09, Tilman Hausherr wrote:

On 10.03.2016 at 19:39, Anna Taracha wrote:

Hello!

You can find the original PDF file and the overlay PDF file here:

http://www.elisanet.fi/jakorasia/ANNA/doc.pdf (original PDF)
http://www.elisanet.fi/jakorasia/ANNA/wm.pdf (overlay file)


None of them is a PDF/A file. They have many other errors beside the XRef
streams, although these can be fixed.
I've fixed the issue with the xref stream in the current trunk, see 
PDFBOX-3263.


Apparently there is another problem with overlays, Andreas just reopened a
closed issue because of that.
The changes from the reopened issue weren't the reason for the problem. I've
opened another ticket and fixed the issue in the current trunk as well, see
PDFBOX-3266.


BR
Andreas


They both have XRef streams and do not have parameter errors (according to
3-Heights™ PDF Validator Online Tool). They are based on version 1.5. The
result file has a parameter error, so something strange happens during
overlaying. As all the files are based on version 1.5, they should be allowed
to have XRef streams, which were introduced in version 1.5. Preflight can
validate only PDF/A-1b, which is based on version 1.4, so I can't use it to
validate PDF 1.5 files fully.

However, your suggested code did the trick and now my PDF files are valid. I
guess there is something wrong with the XRef stream after overlaying. I will
use your code as a workaround for now, so thank you for it :)

Yes, I was asking whether clickable hyperlinks in overlays will be possible in
a future release (you would create a feature similar to the one I did). I am
just curious whether it is something that will/could be considered.


I can't answer that because I have no opinion, I've never used that overlay
feature. I assume you mean transferring annotations that are in the overlay
file. Isn't this something that can also be done by just adding the annotation
to the page annotation list? What would happen if both pages have
different sizes?

But feel free to share any code you have. We have committed many user
suggestions, but not every suggestion gets committed. If it is refused or
ignored, just keep trying and come back with more new ideas :-)


Tilman



-anna-

On 09.03.2016 22:22, Tilman Hausherr wrote:

On 09.03.2016 at 16:47, Anna Taracha wrote:

Result of PDFBox 2.0.0-RC3:

http://www.elisanet.fi/jakorasia/ANNA/result-PDFBox2.pdf

Result of PDFBox 1.8.11:

http://www.elisanet.fi/jakorasia/ANNA/result-PDFBox18.pdf


Here's what I get with PDF-Tools:

1.8:
Validating file "result-PDFBox18.pdf" for conformance level pdfa-1b
dc:language :: Wrong value type. Expected type 'bag'.
dc:date :: Wrong value type. Expected type 'seq'.
The required XMP property 'pdfaid:part' is missing.
The required XMP property 'pdfaid:conformance' is missing.
A device-specific color space (DeviceGray) without an appropriate output intent is used.
The document does not conform to the requested standard.
The document contains device-specific color spaces.
The document's meta data is either missing or inconsistent or corrupt.
Done.

2:
Validating file "result-PDFBox2.pdf" for conformance level pdfa-1b
The file contains cross reference streams. <=
The file trailer dictionary is missing or invalid. <=
dc:language :: Wrong value type. Expected type 'bag'.
dc:date :: Wrong value type. Expected type 'seq'.
The required XMP property 'pdfaid:part' is missing.
The required XMP property 'pdfaid:conformance' is missing.
A device-specific color space (DeviceGray) without an appropriate output intent is used.
The document does not conform to the requested standard.
The file format (header, trailer, objects, xref, streams) is corrupted.
The document contains device-specific color spaces.
The document's meta data is either missing or inconsistent or corrupt.
Done.



preflight has a similar error for 2.0 only:
1.4 : Trailer Syntax error, /XRef cross reference streams are not allowed



Please try this before saving:

doc.getDocument().setIsXRefStream(false);
COSDictionary trailer = doc.getDocument().getTrailer();
trailer.removeItem(COSName.W);
trailer.removeItem(COSName.DECODE_PARMS);
trailer.removeItem(COSName.FILTER);
trailer.removeItem(COSName.TYPE);
trailer.removeItem(COSName.INDEX);
trailer.removeItem(COSName.LENGTH);


About your other question - it is unclear. Are you asking whether Overlay
will be available in the future? Or do you want us to create a feature like
the one you did on your own?

Tilman


-anna-

On 07.03.2016 19:19, Tilman Hausherr wrote:
Could you upload two documents, one good and one bad? I'll have a look with
PDFDebugger to find out what's wrong.

Tilman

On 07.03.2016 at 16:30, Anna Taracha wrote:


Re: Strange performance problem with certain PDF files

2016-03-19 Thread John Hewson

> On 18 Mar 2016, at 12:01, Stahle, Patrick  wrote:
> 
> Hi all,
> 
> I am running into a lot of strange performance issues with certain PDF files.
> 
> Background info:
> The strange thing I can't reproduce this consistently. When I get a pdf being 
> generated on a particular environment it seems consistent. I do most of my 
> development inside VirtualBox virtual machine running fedora. These pdf files 
> I am having problems with never have performance issues when run on my 
> virtual machine local drive, but if I use a Virtual Box Shared drive as the 
> source / destination for the PDF, I see the problem. Another co-worker 
> working from pure windows environment experience the performance problem. We 
> are also seeing the same issue on our dev solaris servers. The performance 
> range can be quite drastic on one of our 3DPDF's (12meg) running on my local 
> environment it can be opened, stamped with some text, encrypted, and saved in 
> around 8 sec. Doing the same job pointing to a virtual box share drive or on 
> our solaris server that same work will take minutes. On my coworkers windows 
> environment it takes around 30 seconds. We really only reproduced this 
> consistently on the 12m 3D  PDF. I have a much smaller pdf (non 3d / convert 
> from msoffice) that does show similar performance issue but the times range 
> from 200ms local to 8 sec.

You need to isolate the problem, you’ve got too many variables to make any 
sense of it all. Get a reproducible problem on one, non-virtualised JVM first.

— John

> The one thing I see in common between the 2 files is I see a lot of the 
> following messages to the console:
> Using output from the 12m 3DPDF file:
> :
> :
> 1787 [main] DEBUG org.apache.pdfbox.pdfparser.PDFObjectStreamParser  - 
> parsed=COSObject{13166, 0}
> 
> These messages seem to happen on the PDDocument.open and from what I can 
> tell, I get 13,166 of these messages in this example PDF.
> The slowness does not happen until the following line:
> document.save(outputPDFStream);
> 
> Other PDF's including some quite large I do not see this performance issue 
> nor those log messages.
> 
> I know this is not much to go on, I am working on seeing if I can isolate 
> this down to something more concrete / reproducible point. But I thought I 
> would send this out to see if anyone has any ideas or have seen issues 
> similar to this? Suggestions?
> 
> Thanks,
> Patrick
> 





RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Stahle, Patrick
PDFBox opens the file fine

Code:
PDDocument doc = null;

try {
    doc = PDDocument.load(pdfIn);
    PDPage page = null;

    for (int i = 0; i < doc.getNumberOfPages(); i++) {
        page = doc.getPage(i);
        PDPageContentStream canvas = new PDPageContentStream(doc, page,
                PDPageContentStream.AppendMode.APPEND, true, true);
        canvas.saveGraphicsState();
        canvas.restoreGraphicsState();
        canvas.close();
    }
    bRet = true;
}
finally {
    if (doc != null) {
        doc.close();
    }
}

Output:
0 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
2 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
3 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
3 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
3 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
3 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
4 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
4 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
4 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
5 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
5 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
5 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
5 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!
7 [Finalizer] DEBUG org.apache.pdfbox.io.ScratchFileBuffer  - ScratchFileBuffer not closed!


-Original Message-
From: Stahle, Patrick [mailto:patrick.sta...@te.com] 
Sent: Wednesday, March 16, 2016 3:45 PM
To: users@pdfbox.apache.org
Subject: RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Also, other applications like PDF Exchange do not exhibit the problem. It most 
likely has something to do with the 3D capabilities of Adobe Reader.

-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Wednesday, March 16, 2016 3:38 PM
To: users@pdfbox.apache.org
Subject: Re: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Can you reopen the file you saved with PDFBox? If not, please open an issue in 
JIRA and attach your file.

If yes, just upload the file somewhere, I'd like to have a look at the 
encryption dictionaries.

Tilman

Am 16.03.2016 um 20:19 schrieb Stahle, Patrick:
> Hi,
>
> This is not a general problem and only occurs with original PDFs generated 
> with 3D content using Anark. The file seems to have been encrypted and loads 
> just fine in Adobe Reader, but when we try to do a "Save As" we get the 
> following error:
> "The document could not be saved. There was a problem reading this document 
> 21."
>
> If I do a control click on the "ok" button. I get the following message:
> "This direct object already has a container."
>
> Any ideas what might be causing this problem? We have tried the same thing 
> with iText and it does not experience this problem.
>
> Sample Code that reproduces the problem:
>     PDDocument doc = null;
>
>     try {
>         doc = PDDocument.load(pdfIn);
>         PDPage page = null;
>         AccessPermission apermission = new AccessPermission();
>         apermission.setCanAssembleDocument(false);
>         apermission.setCanExtractContent(false);
>         apermission.setCanExtractForAccessibility(true);
>         apermission.setCanFillInForm(true);
>         apermission.setCanModifyAnnotations(true);
>         apermission.setCanPrint(true);
>         apermission.setReadOnly();
>         StandardProtectionPolicy spp = new StandardProtectionPolicy(
>                 UUID.randomUUID().toString(), "", apermission);
>

RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Thalvayapati, Raghu - BLS CTR
Please unsubscribe me from this list.

-Original Message-
From: Stahle, Patrick [mailto:patrick.sta...@te.com] 
Sent: Thursday, March 17, 2016 3:06 PM
To: users@pdfbox.apache.org
Subject: RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Would you like the original pdf prior to my encrypting for comparison, and the 
simplest code sample I have? I can also upload that one too...

Also, while debugging my code I noticed that if I don't use the initializing 
constructor, my setter calls don't seem to change the value of the underlying 
bytes field.

Ex.
AccessPermission apermission = new AccessPermission();
apermission.setCanPrint(true);
apermission.setCanModifyAnnotations(true);
apermission.setCanAssembleDocument(true);
apermission.setCanFillInForm(true);
apermission.setCanExtractForAccessibility(true);
apermission.setReadOnly();

Eclipse debugger shows: 
bytes= -4 
readOnly= true

while:
AccessPermission apermission = new AccessPermission(0);
apermission.setCanPrint(true);
apermission.setCanModifyAnnotations(true);
apermission.setCanAssembleDocument(true);
apermission.setCanFillInForm(true);
apermission.setCanExtractForAccessibility(true);
apermission.setReadOnly();

Eclipse debugger shows: 
bytes= 1828 
readOnly= true

Which looks correct to me, or at least matches what I see from iText (minus 
the readOnly bit).
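
The two debugger values are consistent with the same flags being OR-ed into different starting values. A minimal sketch in plain Java, assuming the bit positions from the PDF specification's user access permissions table, and assuming that the no-argument AccessPermission constructor starts with every permission granted (all bits set except the two lowest). The class and method names are hypothetical; this is not PDFBox's own code:

```java
public class PermissionBitsDemo {
    // Bit positions are 1-based in the PDF specification, so "bit 3" is 1 << 2.
    public static final int PRINT = 1 << 2;                     // bit 3
    public static final int MODIFY_ANNOTATIONS = 1 << 5;        // bit 6
    public static final int FILL_IN_FORM = 1 << 8;              // bit 9
    public static final int EXTRACT_FOR_ACCESSIBILITY = 1 << 9; // bit 10
    public static final int ASSEMBLE_DOCUMENT = 1 << 10;        // bit 11

    /** ORs the flags into a starting value, like a series of setCanXxx(true) calls. */
    public static int grant(int start, int... flags) {
        int bits = start;
        for (int flag : flags) {
            bits |= flag;
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] flags = {PRINT, MODIFY_ANNOTATIONS, ASSEMBLE_DOCUMENT,
                       FILL_IN_FORM, EXTRACT_FOR_ACCESSIBILITY};
        System.out.println(grant(0, flags));  // prints 1828
        System.out.println(grant(~3, flags)); // prints -4 (assumed default start)
    }
}
```

Under these assumptions, setting a permission to true on a default AccessPermission is a no-op, so a value of -4 does not by itself show that the setters failed.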

-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Thursday, March 17, 2016 2:55 PM
To: users@pdfbox.apache.org
Subject: Re: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Am 17.03.2016 um 14:21 schrieb Stahle, Patrick:
> Ok,  I think I have the  file uploaded to the following:
> http://wikisend.com/download/381906/tmp_10435-Technical  Data Package 
> BOMAnarkStampedPDFBox102259703.pdf
>
> The link is hard for me to test since I have to do this all from my phone. If 
> it says not found or expired try putting in 381906 from the download page...

Thanks, it worked. I did find some weirdness: parts of the encryption object 
exist twice.

659 0 obj
<<
/Filter /Standard
/V 1
/R 3
/Length 40
/P -1044
/O <92B3A580FEDD525873E5DEA425E75E1B74858FD5C6F5FED7E4C6C39C2E23D2DB>
/U <50D7EE978EFC3D29DAF239DA746CCC2228BF4E5E4E758A4164004E56FFFA0108>
 >>
endobj
660 0 obj
<<
/ID [<3DADA7608D955343B3F967EB90F6801F> <1C65E39CFBD4D44099223D10A9D542B5>]
/Info 13 0 R
/Root 1 0 R
/Encrypt <<
/Filter /Standard
/V 1
/R 3
/Length 40
/P -1044
/O <92B3A580FEDD525873E5DEA425E75E1B74858FD5C6F5FED7E4C6C39C2E23D2DB>
/U <50D7EE978EFC3D29DAF239DA746CCC2228BF4E5E4E758A4164004E56FFFA0108>
 >>
/Type /XRef
/Size 661
/Index [1 659]
/W [1 3 0]
/Filter /FlateDecode
/Length 1881
 >>
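
As a side check on the dump above, the /P value can be decoded by hand. A minimal sketch in plain Java, assuming the bit positions from the PDF specification's user access permissions table; the class name and the permission labels are hypothetical and this is not part of PDFBox:

```java
import java.util.ArrayList;
import java.util.List;

public class PermissionDecoder {
    private static final String[] NAMES = {
        "print", "modify", "extract", "modifyAnnotations",
        "fillInForm", "extractForAccessibility", "assemble", "printHighQuality"
    };
    // 1-based bit positions, in the same order as NAMES.
    private static final int[] BIT_POSITIONS = {3, 4, 5, 6, 9, 10, 11, 12};

    /** Returns the labels of all permissions whose bit is set in the /P value. */
    public static List<String> granted(int p) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            int mask = 1 << (BIT_POSITIONS[i] - 1);
            if ((p & mask) != 0) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [print, modify, modifyAnnotations, fillInForm,
        //         extractForAccessibility, printHighQuality]
        System.out.println(granted(-1044));
    }
}
```

Only the extract and assemble bits are cleared in -1044, which would be consistent with the setCanExtractContent(false) and setCanAssembleDocument(false) calls in the sample code.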






Re: pdfrenderer unicode characters in latest 2.1.0-SNAPSHOT ?

2016-03-19 Thread Jesse Kuhnert
Yeah, that one worked fine for me too; it just doesn't in the slightly more
elaborate situation we are rendering in, though the actual PDF is rendered
perfectly. Sorry, I don't have a more definitive idea why it isn't working
in our real product code when smaller debug examples work fine. I'll file a
bug report if I ever find anything.

On Wednesday, March 16, 2016, Tilman Hausherr  wrote:

> Am 16.03.2016 um 22:23 schrieb Jesse Kuhnert:
>
>> It appears as if the java2d font rendering logic in new 2.x versions
>> missed something with unicode support. I have pdfs being output
>> wonderfully with my sample unicode text (bengali in this case) but when
>> we try to produce images any glyphs we have which aren't english appear
>> to just be rendered as "blank" somehow. (meaning the spacing and line
>> heights look like there is ghost text taking up space there but maybe
>> that's just our logic for laying out pdf)
>>
>> Has anyone else tried to produce unicode based pdf images yet ?
>>
>>
>
> I just tried to render the file created with the EmbeddedFonts example and
> it works fine.
>
> Tilman
>


Re: Spaces are ignored when reading a PDF file

2016-03-19 Thread Tilman Hausherr

Am 17.03.2016 um 07:12 schrieb Hesham G.:

> Hello,
>
> I have a PDF file created using LaTeX. I am trying to read and print all 
> letters in that file using PDFBox, but when doing this all spaces in that 
> file are ignored.


Here's what I get with ExtractText (your code is unusual); this 
looks excellent to me:


article titles c©by Michael O’Kane are not part of the law mu7ami.com
Article [220] Right to Regulate
With due regard to Article (219), the competent authority has the right
of monitoring the companies with regard to application of the provisions
set forth in the law and the company’s articles of association and bylaw
including the authority to inspect the company and check its account and
ask for data from the board of directors or the company managers through
a representative or more of its personnel or experts it chooses for this 
pur-

pose.
Article [221] Access to Records
All the company officials shall acquaint the Ministry representatives and
the Authority, fi the company is listed in the financial market or 
seeking to
be listed, with regard to the works stated in Article (220), all that 
they ask

of company books and records and documents and provide them with all
related information or clarification.
94 version 0.2 provided by mu7ami.com





Re: JBIG2 Images

2016-03-19 Thread Tilman Hausherr

Am 14.03.2016 um 10:24 schrieb Felix Hermann:

My interpretation: The compiler finds org.jpedal.jbig2.jai.JBIG2ImageReaderSpi. 
However, it does not realize, that there is an ImageReader ...


That's the jpedal plugin. We tried to use that one a few years ago and 
failed. It's the levigo plugin that works (at least for us). I'm 
wondering why you haven't opened an issue on their site, re the deadlock 
you got.

https://github.com/levigo/jbig2-imageio/issues
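
To see which plugin is actually being picked up, it can help to list the providers that ImageIO has registered for the format. A minimal sketch using only the JDK's javax.imageio API; the class name is hypothetical, and the format name "JBIG2" is an assumption about what the plugin registers:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.spi.ImageReaderSpi;

public class Jbig2ReaderCheck {
    /** Returns the provider class name of every ImageReader registered for the format. */
    public static List<String> providersFor(String formatName) {
        ImageIO.scanForPlugins(); // pick up plugins added to the classpath
        List<String> providers = new ArrayList<>();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        while (readers.hasNext()) {
            ImageReader reader = readers.next();
            ImageReaderSpi spi = reader.getOriginatingProvider();
            // The provider may be null, so fall back to the reader class itself.
            providers.add(spi != null ? spi.getClass().getName()
                                      : reader.getClass().getName());
        }
        return providers;
    }

    public static void main(String[] args) {
        // An empty list means no JBIG2 reader is registered on this classpath.
        System.out.println(providersFor("JBIG2"));
    }
}
```

If both plugins are on the classpath, both providers should show up in the list, which would make it clear which one needs to be removed.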





RE: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being encrypted by PDFBox 2 snapshot

2016-03-19 Thread Stahle, Patrick
Where would you suggest I upload a sample file to? I will try reopening it in 
PDFBox in a moment.

-Original Message-
From: Tilman Hausherr [mailto:thaush...@t-online.de] 
Sent: Wednesday, March 16, 2016 3:38 PM
To: users@pdfbox.apache.org
Subject: Re: Strange "Save As" issue with Adobe Reader 11 / DC with PDF being 
encrypted by PDFBox 2 snapshot

Can you reopen the file you saved with PDFBox? If not, please open an issue in 
JIRA and attach your file.

If yes, just upload the file somewhere, I'd like to have a look at the 
encryption dictionaries.

Tilman

Am 16.03.2016 um 20:19 schrieb Stahle, Patrick:
> Hi,
>
> This is not a general problem and only occurs with original PDFs generated 
> with 3D content using Anark. The file seems to have been encrypted and loads 
> just fine in Adobe Reader, but when we try to do a "Save As" we get the 
> following error:
> "The document could not be saved. There was a problem reading this document 
> 21."
>
> If I do a control click on the "ok" button. I get the following message:
> "This direct object already has a container."
>
> Any ideas what might be causing this problem? We have tried the same thing 
> with iText and it does not experience this problem.
>
> Sample Code that reproduces the problem:
>     PDDocument doc = null;
>
>     try {
>         doc = PDDocument.load(pdfIn);
>         PDPage page = null;
>         AccessPermission apermission = new AccessPermission();
>         apermission.setCanAssembleDocument(false);
>         apermission.setCanExtractContent(false);
>         apermission.setCanExtractForAccessibility(true);
>         apermission.setCanFillInForm(true);
>         apermission.setCanModifyAnnotations(true);
>         apermission.setCanPrint(true);
>         apermission.setReadOnly();
>         StandardProtectionPolicy spp = new StandardProtectionPolicy(
>                 UUID.randomUUID().toString(), "", apermission);
>         doc.protect(spp);
>
>         for (int i = 0; i < doc.getNumberOfPages(); i++) {
>             page = doc.getPage(i);
>             PDPageContentStream canvas = new PDPageContentStream(doc, page,
>                     PDPageContentStream.AppendMode.APPEND, true, true);
>             canvas.saveGraphicsState();
>             canvas.restoreGraphicsState();
>             canvas.close();
>         }
>         doc.save(pdfOut);
>         bRet = true;
>     }
>     finally {
>         if (doc != null) {
>             doc.close();
>         }
>     }
>
> Thanks,
> Patrick
>
>





NullPointerException in multithreading

2016-03-19 Thread Tilman Hausherr

Hello 风云天空,

This is obviously not related to "Spaces are ignored when reading a PDF 
file" so you should have created a new subject line instead of hijacking 
an existing thread by pressing "reply".


I did have the same problem while working on
https://issues.apache.org/jira/browse/PDFBOX-3267

What I did was to change the source code of PDICCBased.java, i.e. change 
this line


awtColorSpace = (ICC_ColorSpace)ColorSpace.getInstance(ColorSpace.CS_sRGB);


to

synchronized(LOG)
{
    awtColorSpace = (ICC_ColorSpace)ColorSpace.getInstance(ColorSpace.CS_sRGB);
}


This is a Java bug. I'm undecided whether the change above should be 
committed. But try the change :-)
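
The same idea can be tried outside PDFBox by serializing the colour space lookup across threads. A minimal sketch using only the JDK; the class name, the dedicated lock object (used here instead of LOG) and the thread count are assumptions for illustration, not the PDFBox source:

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SyncColorSpaceDemo {
    // Dedicated lock so that only one thread at a time triggers profile loading.
    private static final Object COLOR_SPACE_LOCK = new Object();

    public static ICC_ColorSpace sRgb() {
        synchronized (COLOR_SPACE_LOCK) {
            return (ICC_ColorSpace) ColorSpace.getInstance(ColorSpace.CS_sRGB);
        }
    }

    /** Requests the sRGB colour space from several threads; returns how many got one. */
    public static int countLoaded(int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<ICC_ColorSpace> task = SyncColorSpaceDemo::sRgb;
            List<Future<ICC_ColorSpace>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(task));
            }
            int loaded = 0;
            for (Future<ICC_ColorSpace> result : results) {
                if (result.get() != null) {
                    loaded++;
                }
            }
            return loaded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countLoaded(8) + " threads obtained the colour space");
    }
}
```

The lock only helps if every call site that can trigger the profile loading goes through it, so this is a workaround rather than a fix for the underlying JDK behaviour.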


Tilman



Am 18.03.2016 um 12:02 schrieb 风云天空:

Who can help me?
I get this error in multithreading:
java.lang.NullPointerException
    at java.awt.color.ICC_Profile.activateDeferredProfile(ICC_Profile.java:1086)
    at java.awt.color.ICC_Profile$1.activate(ICC_Profile.java:742)
    at sun.java2d.cmm.ProfileDeferralMgr.activateProfiles(ProfileDeferralMgr.java:95)
    at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:775)
    at java.awt.color.ICC_Profile.getInstance(ICC_Profile.java:1013)
    at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.loadICCProfile(PDICCBased.java:119)
    at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.<init>(PDICCBased.java:89)
    at org.apache.pdfbox.pdmodel.graphics.color.PDColorSpace.create(PDColorSpace.java:182)
    at org.apache.pdfbox.pdmodel.PDResources.getColorSpace(PDResources.java:172)
    at org.apache.pdfbox.pdmodel.PDResources.getColorSpace(PDResources.java:142)
    at org.apache.pdfbox.contentstream.operator.color.SetNonStrokingColorSpace.process(SetNonStrokingColorSpace.java:41)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:814)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:471)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:445)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:187)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:80)
    at com.liaoyoujin.pdfbox.doc.PdfExtractor.getFirstImage(PdfExtractor.java:109)
    at com.liaoyoujin.pdfbox.doc.PdfExtractor$Job.run(PdfExtractor.java:178)
    at com.liaoyoujin.thread.pool.BlockThreadPool$Worker.run(BlockThreadPool.java:53)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
java.util.ConcurrentModificationException
    at java.util.Vector$Itr.checkForComodification(Vector.java:1156)
    at java.util.Vector$Itr.next(Vector.java:1133)



