from:"sahy...@fileaffairs.de"

Re: Reading page using PDFTextStripper

2020-11-23 Thread sahy...@fileaffairs.de



Hi,
On Sunday, 22.11.2020 at 07:10 +0200, Hesham Gneady wrote:
> I've tried it now, but it made no difference. I've actually explained
> the problem wrong, here's what actually happens:
> 
> The 1st line in the PDF file is:
> 
> 131 Comments are made from 1905, / See: Certain Neurotic Mechanisms in
> 
> Where "131" is normal text, while the rest of the line has "Subscript"
> formatting. If I copy/paste the line from the PDF manually it copies it
> in the right order, but when extracting the text using PDFBox it
> extracts it like this:
> 
> Comments are made from 1905, / See: Certain Neurotic Mechanisms in
> 131
> 
> The text is being read before the "131" number.


That's what I'm getting with the -sort option using PDFBox 2.0.21:

131 Comments are made from 1905, / See: Certain Neurotic Mechanisms in 
Jealousy, Paranoia, and Homosexuality. (Internat. Journ. Psycho-
Analysis, vol. iv, 
April, 1923.) Freud, S. / A response to a mother’s concern about her
son’s 
homosexuality 1935 -Letters of Sigmund Freud. E. L. Freud (Ed.). New
York, NY: 
Basic Books. P 423. In this letter Freud links homosexuality to
‘arrested 
development.’
132 Allan Schore, Affect Regulation and the Origin of the self,
Lawrence Erlbaum 
1994. p 24

BR
Maruan


> 
> Best regards,
> Hesham
> 
> -----
> Included Message:
> 
> On 17.11.20 at 07:54, Hesham Gneady wrote:
> > Hi,
> > 
> > I am trying to read this PDF file using
> > PDFTextStripper.processTextPosition():
> > 
> > https://dl.dropboxusercontent.com/s/o660xrp4sgp9tbv/PDFTextStripper%20reading%20sample.pdf?dl=0
> > 
> > But when I do that it reads it in the wrong order. It reads the 2nd
> > line before the 1st line because the 1st line has Subscript effect.
> > Is there a way to read it in the right order?
> 
> In a PDF the text doesn't necessarily appear in the rendering order.
> You should give the sort option a try:
> 
> org.apache.pdfbox.text.PDFTextStripper.setSortByPosition(boolean)
> 
> Andreas
> 
> -
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: users-h...@pdfbox.apache.org

Re: Missing Field Values

2021-05-07 Thread sahy...@fileaffairs.de



Could you upload a sample PDF to a file-sharing host so we can take a look?

BR
Maruan

On Friday, 07.05.2021 at 11:02 +0200, Ranjeet Kuruvilla wrote:
> I am converting a byte[] to PDDocument and had a shocking experience:
> There were field values (not the fields themselves) missing. I
> compared
> PDFBox 2.0.23 to IText.
> 
> Acroform acroform = PDDocument.load(source, password);
> 
> HashMap fields2 = (new PDFReader(source, password)).getAcroFields().getFields(); // Fields from IText
> 
> for(PDField field: acroform.getFields()) // Fields from PDFBox
> {
>     System.out.println("Field " + field.getFullyQualifiedName()
>         + " IText [" + acroform.getField(field.getFullyQualifiedName())
>         + "] PDFBox [" + field.getValueAsString() + "]");
> }
> 
> The result was occasionally akin to
> 
>     Field KEY IText [Value] PDFBox []
> 
> I expected it to be
> 
>     Field KEY IText [Value] PDFBox  [Value]
> 
> . It might be, that that particular PDF has Fields with the same key,
> because I did not experience that problem with other PDFs.
> 
> May I ask whether there is a known bug with PDFBox 2.0.23, that
> allows
> for such a behaviour? How come, that PDFs created in C++ are no
> longer
> readable in PDFBox? How can I fix the bug? I do not wish to use IText
> to
> solve it.
> 
> 
> 

-- 
-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Geschäftsführer: Maruan Sahyoun
Handelsregister: AG Düsseldorf, HRB 53837
UST.-ID: DE248275827


-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org

Re: flattening

2021-05-25 Thread sahy...@fileaffairs.de



On Tuesday, 25.05.2021 at 18:33 +0200, Ranjeet Kuruvilla wrote:
> Hello.
> It is clear, that flattening does have a bug. Compare flattening of
> PDFBox with IText and you realize, that PDFBox destroys all fields,
> once
> I call
> 
> acroform.flatten(fields, true) or acroform.flatten(fields, false)
> 
> 

The purpose of flatten is that the flattened fields are removed and
become part of the regular page content. 

If you'd like to keep the field but would like it to be protected you
have to set the field to read only. Keep in mind that one could use a
lib and remove the flag and change the content afterwards.

Now, if there is really a bug I need to have a clear description how to
reproduce it together with sample content. If you're able to provide
that I'm happy to take a look.

BR
Maruan

> .
> 
> How can I request someone to fix flattening and make it work like in
> IText. That is that the right fields are flattened while all the other
> fields remain untouched!
> 
> 
> 

Re: flattening

2021-05-26 Thread sahy...@fileaffairs.de

On Wednesday, 26.05.2021 at 07:53 +0200, Ranjeet Kuruvilla wrote:
> I know how Itext works and PDFBox does not work the same. In Itext the
> result is that only the right fields are flattened. It seems there is a
> bug.

I've created a small test which is running fine:

PDDocument document = Loader.loadPDF(new File(IN_DIR, NAME_OF_PDF));
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
int numFieldsBefore = acroForm.getFields().size();

// Flatten only the one field collected in this list
List<PDField> toBeFlattened = new ArrayList<>();
PDTextField field = (PDTextField) acroForm.getField("AlignLeft-Filled");
toBeFlattened.add(field);
acroForm.flatten(toBeFlattened, false);

assertEquals(numFieldsBefore, acroForm.getFields().size() + 1,
        "The number of form fields shall be reduced by one");

A manual inspection of the result shows that all fields but one are
still accessible.

The file being used was
src/test/resources/org/apache/pdfbox/pdmodel/interactive/form/MultilineFields.pdf

Version used was the current trunk.

So please do a similar test with your files. I'm not saying there is no
bug but we have to be able to reproduce it to be able to solve it. With
the files I've tested with the result is fine.

BR
Maruan


> 
> In PdfBox every field is flattened, even though I delivered a list of
> fields, that were meant to be flattened.
> 
> It seems the PDF is destroyed by flattening.
> 
> I have attached the code.
> 
> On 25.05.21 19:27, Tilman Hausherr wrote:
> > Here's a PDF flattened by itext
> > https://github.com/itext/i7js-examples/blob/develop/cmpfiles/sandbox/acroforms/cmp_checkbox_flatten.pdf
> > 
> > and they do something similar to what we do, i.e. remove the fields
> > and converting it to form XObjects.
> > Tilman

Re: Problem while parsing XMPMetadata with embedded Schemas

2021-06-24 Thread sahy...@fileaffairs.de




On Thursday, 24.06.2021 at 16:40 +0200, Jörn Mörsch wrote:
> Hello PDFBox Community,
> 
> we have a problem parsing the *XMP* metadata of a PDF using
> *xmpParser.parse(InputStream)*. The *XMP* metadata contains external
> schemas, when trying to validate the data we get the error "*Cannot
> find a definition for the namespace http://ns.ftx.com/forms/1.0/*".

Assuming you are using XMPBox: it does not support arbitrary XMP schemas,
only a predefined set which is defined to be usable in PDF/A.

There is an initial discussion about its future and depending on the
outcome this restriction might be removed.

BR
Maruan
> 
> According to the following *pages*, the *XMP metadata* should be
> correct:
> 
> https://www.w3.org/RDF/Validator/ 
> 
> https://www.pdflib.com/pdf-knowledge-base/xmp/free-xmp-validator/
> 
> 
> Does anyone know what exactly the problem is? XMP metadata not
> correct?
> Can it be a Bug in PDFBox?
> 
> Kind Regards,
> 
> Joern Moersch
> 




Re: XDP support

2021-08-01 Thread sahy...@fileaffairs.de

Hi,

may I add to that.

If you have an XDP which is embedded in a PDF there is limited support
within PDFBox to support handling that i.e. you can get the XDP from
the PDF update it and put it back into the PDF.

PDFBox does not support creating a PDF from scratch from a pure XDP
file. One could create a stub PDF using PDFBox and embed the XDP in it
but only as dynamic XDP (i.e. all PDF elements are created at runtime
by supporting viewers such as Adobe Reader).

If what you are looking for is a PDF representation of the (rendered)
content of the XDP again PDFBox does not support that.

To get full XDP support you need to look for a commercial solution.
There are a number of vendors on the market.

Bottom line is that if you need to handle an XDP based workflow
(creation, filling, rendering, printing ...) PDFBox is not suitable for
your needs.

BR
Maruan



On Saturday, 31.07.2021 at 15:51 +0200, Tilman Hausherr wrote:
> On 30.07.2021 at 11:18, Manjuka Gunaseekara wrote:
> > Hi Team,
> > We currently doing a POC to read some XDP forms and prefill it and
> > convert
> > to PDF. We tried with your tool for PDF forms and saw it can
> > prefill and
> > generate as PDF. So would like to see the same use case can be
> > handled for
> > XDP. Let me know a way forward. Or if your current framework not
> > supporting
> > this could you please let me know any third party to integrate with
> > PDFBox
> > to get my use case completed.
> > Looking forward to hearing from you.
> > Thank you
> > Manjuka
> > 
> 
> Hi,
> 
> It's not supported, I don't know what it is (apparently some Adobe 
> available elsewhere see if you can understand this format and
> construct 
> something from our existing code.
> 
> Tilman
> 
> 

Re: Setting the value of a single-select PDChoice field

2021-08-16 Thread sahy...@fileaffairs.de

If I understand the question correctly it's about being able to use the
List based setter with a single select option field if the List has a
single entry. If the List has more than one entry still throw the
exception.

Is that correct?
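
To make sure we are talking about the same behaviour, here is a minimal
sketch of it in plain Java. ChoiceFieldSketch and its methods are
hypothetical stand-ins written for this mail, not the PDChoice code; only
the error message is taken from the report below. The list-based setter
accepts zero or one entry for a single-select field and rejects anything
that is not one of the field's options:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a choice field; this is NOT PDFBox's PDChoice. */
final class ChoiceFieldSketch {
    private final boolean multiSelect;
    private final List<String> options;
    private List<String> selected = new ArrayList<>();

    ChoiceFieldSketch(boolean multiSelect, String... options) {
        this.multiSelect = multiSelect;
        this.options = Arrays.asList(options);
    }

    /** List-based setter: a single-select field accepts at most one entry. */
    void setValue(List<String> values) {
        if (!multiSelect && values.size() > 1) {
            throw new IllegalArgumentException(
                    "The list box does not allow multiple selections.");
        }
        // Reject anything that is not one of the selectable options.
        for (String value : values) {
            if (!options.contains(value)) {
                throw new IllegalArgumentException("Not a selectable option: " + value);
            }
        }
        selected = new ArrayList<>(values);
    }

    List<String> getValue() {
        return Collections.unmodifiableList(selected);
    }

    public static void main(String[] args) {
        ChoiceFieldSketch field = new ChoiceFieldSketch(false, "Red", "Green", "Blue");
        field.setValue(Collections.singletonList("Green")); // accepted
        System.out.println(field.getValue());
        try {
            field.setValue(Arrays.asList("Red", "Blue"));   // rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether the option check belongs in the same setter is a separate design
question; it is included here only because the report mentions it.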

BR
Maruan

On Tuesday, 17.08.2021 at 04:47 +0200, Tilman Hausherr wrote:
> IMHO the only bug is that a setter can set an illegal option.
> 
> The rest looks fine to me, a user should know what field they are
> filling.
> 
> Tilman
> 
> 
> On 16.08.2021 at 18:10, Oliver Degener wrote:
> > Hi,
> > 
> > I'm using PDFBox to automatically fill in all kinds of PDF
> > documents. Recently, I was seeing the following error message with
> > a new PDF:
> > 'The list box does not allow multiple selections.'
> > 
> > I saw in the code that the PDChoice field can be single-select or
> > multi-select. With single-select fields, the setter that takes a
> > list of choices [0] throws an exception whenever the given list is
> > not empty. Therefore I now have to use the general setter from
> > PDField [1] which does not check whether the provided option is
> > valid (selectable).
> > 
> > It would be great if the setter from [0] would also work for
> > single-select choices and would only throw an exception if the
> > list's size is greater than 1.
> > 
> > Any thoughts on this?
> > 
> > Thanks & Regards
> > Oliver
> > 
> > [0]
> > https://pdfbox.apache.org/docs/2.0.3/javadocs/org/apache/pdfbox/pdmodel/interactive/form/PDChoice.html#setValue(java.util.List)
> > [1]
> > https://pdfbox.apache.org/docs/2.0.3/javadocs/org/apache/pdfbox/pdmodel/interactive/form/PDChoice.html#setValue(java.lang.String)
> > 

Re: Rending text in thumbnail images

2021-09-09 Thread sahy...@fileaffairs.de





Would it be possible to provide a sample PDF which also reveals the
issue but doesn't contain confidential data?
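
In the meantime, since the quoted discussion turns on supplying a complete
set of hints rather than a single one, here is a small sketch using only
java.awt. The class and method names are made up for this mail, and how
the resulting object is handed to the renderer is left out on purpose:

```java
import java.awt.RenderingHints;

/** Illustration only: builds a hint set in which every needed key is present. */
final class RenderingHintsSketch {

    /** Returns interpolation, rendering and antialiasing hints in one object. */
    static RenderingHints completeHints(boolean antialias) {
        RenderingHints hints = new RenderingHints(null);
        hints.put(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        hints.put(RenderingHints.KEY_RENDERING,
                RenderingHints.VALUE_RENDER_QUALITY);
        hints.put(RenderingHints.KEY_ANTIALIASING, antialias
                ? RenderingHints.VALUE_ANTIALIAS_ON
                : RenderingHints.VALUE_ANTIALIAS_OFF);
        return hints;
    }

    public static void main(String[] args) {
        RenderingHints hints = completeHints(false);
        System.out.println(hints.size() + " hints set");
    }
}
```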

BR
Maruan

On Thursday, 09.09.2021 at 08:44 -0700, John Lussmyer wrote:
> On Wed Sep 08 20:31:47 PDT 2021 thaush...@t-online.de said:
> > Ooops, you didn't mention that you turned antialiasing off. The
> > image
> > looks as if interpolation was also turned off. If you set rendering
> > hints you always have to set all the hints you need. Here's the
> > default:
> > 
> >     private RenderingHints createDefaultRenderingHints(Graphics2D graphics)
> >     {
> >         RenderingHints r = new RenderingHints(null);
> >         r.put(RenderingHints.KEY_INTERPOLATION, isBitonal(graphics) ?
> >                 RenderingHints.VALUE_INTERPOLATION_NEAREST_NEIGHBOR :
> >                 RenderingHints.VALUE_INTERPOLATION_BICUBIC);
> >         r.put(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
> >         r.put(RenderingHints.KEY_ANTIALIASING, isBitonal(graphics) ?
> >                 RenderingHints.VALUE_ANTIALIAS_OFF :
> >                 RenderingHints.VALUE_ANTIALIAS_ON);
> >         return r;
> >     }
> 
> So, setting one Rendering Hint discards all default values?  What
> does it use for those others then?
> 
> Just tried with this set:
> hintlist.put(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_OFF);
> hintlist.put(RenderingHints.KEY_TEXT_ANTIALIASING, RenderingHints.VALUE_TEXT_ANTIALIAS_OFF);
> hintlist.put(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
> hintlist.put(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY);
> hintlist.put(RenderingHints.KEY_FRACTIONALMETRICS, RenderingHints.VALUE_FRACTIONALMETRICS_ON);
> 
> I also tried a variation with VALUE_INTERPOLATION_NEAREST_NEIGHBOR.
> 
> No change.  Still looks like random pixels scattered on the page.
> 
> 
> --
> 
> Tigers prowl and Dragons soar in my dreams...
> 

Re: Save the fillable PDF filled by a user

2021-10-15 Thread sahy...@fileaffairs.de

Dear Jack,

When submitting a PDF form there are several options. The one you are
looking for is submitting the data as HTML form data.

You'll find details in
https://opensource.adobe.com/dc-acrobat-sdk-docs/acrobatsdk/pdfs/acrobatsdk_jsapiref.pdf

Look for the submitForm method. 

That gives you all the details for the action you have to set for the
submit button. Setting the action accordingly will get you the data at
the web server similar to an HTML form submission. For complex forms
you might be better off with XML data or other options.
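
On the server side the request can then be read like any other form post.
As a rough sketch, and assuming the viewer sends the fields as an
application/x-www-form-urlencoded body (please verify that against what
actually arrives), the field values could be pulled out with plain Java
like this. FormBodySketch is just an illustrative name, and in a servlet
container request.getParameter() would normally do this for you:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: decodes a URL-encoded form body into field name/value pairs. */
final class FormBodySketch {

    /** Splits "a=1&b=2" style bodies; a repeated field name keeps its last value. */
    static Map<String, String> parse(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return fields;
        }
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(decode(name), decode(value));
        }
        return fields;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical field names, standing in for the fields of the PDF form.
        System.out.println(parse("name=Jack+Doe&city=K%C3%B6ln"));
    }
}
```

The decoded values can then be written into a fresh copy of the template
on the server, so the original template stays untouched.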

BR
Maruan



On Friday, 15.10.2021 at 00:17 -0400, Light Speed wrote:
> Thank you Tilman. I actually read that post and others but so far none
> of
> them answered my question. My question was not on how to create a
> button.
> My question was how to capture the values that the user filled in and
> submitted. This is AFTER the user clicks the button. In regular java
> web
> applications, I know how to submit a HTML form and process the incoming
> request object with users values, but in the case of submitting PDF
> forms,
> I just couldn't put my hands on it when submitting a pdf form, Any
> suggestions? Thank you again.
> 
> On Tue, Oct 12, 2021 at 2:08 AM Tilman Hausherr 
> wrote:
> 
> > There is a somewhat obscure way to do this with a push button:
> > 
> > https://stackoverflow.com/questions/58611014/adding-a-button-with-the-submitform-function-with-pdfbox-in-java
> > 
> > Try to work with that one, then share the code and the PDF if it
> > doesn't
> > work; that person got it to work, except that the button wasn't
> > displayed properly.
> > 
> > Tilman
> > 
> > On 12.10.2021 at 02:08, Light Speed wrote:
> > > Hi
> > > 
> > > I need some guidance from all your PDFBox users. I have been trying
> > > different features like creating PDF and adding forms, etc. All
> > > seemed to be fine. My use case is as following, the final goal is to
> > > save the PDF that customer has filled:
> > > 
> > > 1. Create a Fillable PDF, with some text boxes, radio buttons, etc.
> > >    Save it as MyFormTemplate (This works)
> > > 2. Make it available via a web page (This works too.)
> > > 3. The plan is to create a submit button on the PDF form to submit
> > >    the filled form (the form will be filled by users). Once the
> > >    server gets the filled form, it will save the form as a new PDF
> > >    file with a new name (say JackForm01.pdf) (so that the original
> > >    MyFormTemplate can be used over and over again by others).
> > > 
> > > My question is at #3, how do I capture the data that user fills? Or
> > > even if I want to ask users to save it and then upload, I still need
> > > some mechanism to capture the change.
> > > 
> > > Many thanks folks!
> > > 
> > > -Jack
> > > 

Re: Question on PDFTextStripper.getText()

2021-10-16 Thread sahy...@fileaffairs.de

Hi,

On Saturday, 16.10.2021 at 16:57 +0530, Naganand Kanagal wrote:
> Hi,
> 
> Noticed that PDFTextStripper.getText() returns the last line of a PDF
> document as the first line in a text file. 

Text in the PDF doesn't necessarily appear in visual order. You can use
PDFTextStripper.setSortByPosition prior to getText() to extract text
closer to visual order.
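
To illustrate why the order can differ, here is a toy model in plain Java.
It is not how PDFBox implements sorting; the Fragment class, the
coordinates and the simple top-to-bottom, left-to-right rule are
assumptions made for the example. It only shows that the order in which
text was written to the file and the order in which it appears on the
page are two separate things:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of position-based sorting; NOT PDFBox's actual implementation. */
final class PositionSortSketch {

    /** A piece of text with its position on the page (y grows downwards here). */
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Joins the fragments in the order given, i.e. the order found in the file. */
    static String inStreamOrder(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(f.text);
        }
        return sb.toString();
    }

    /** Joins the fragments top-to-bottom, then left-to-right. */
    static String inVisualOrder(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.<Fragment>comparingDouble(f -> f.y)
                .thenComparingDouble(f -> f.x));
        return inStreamOrder(sorted);
    }

    public static void main(String[] args) {
        List<Fragment> page = new ArrayList<>();
        // The footer was written to the file first but sits at the bottom of the page.
        page.add(new Fragment(50, 700, "Kindly refer to my LinkedIn profile"));
        page.add(new Fragment(50, 40, "Yogesh Dixit"));
        page.add(new Fragment(50, 60, "Gurgaon, Haryana"));
        System.out.println(inStreamOrder(page));
        System.out.println("---");
        System.out.println(inVisualOrder(page));
    }
}
```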

BR
Maruan


> Since I am pattern searching in
> a document for "Name" and name happens to be the first line in these
> documents it really gets me the wrong information. Why does the last
> line
> become the first line? Is there a way to set this right?
> 
> 
> Logfile:
> nio-443-exec-2] ProcessDoc.ProcessDocument
> (ProcessDocument.java:156)  []
>  - ProcessDocument:readFromPDFFile, txt extracted*:**Kindly refer to
> my
> LinkedIn profile for More details related to certifications,
> education etc.
> *
> 
> Yogesh Dixit
>     Gurgaon, Haryana
> 
> 
> The first line in PDF is Yogesh Dixit. Any help will be appreciated.
> 
> Regards,
> Naganand Kanagal





Re: Guidance on heap space error when printing document

2021-11-05 Thread sahy...@fileaffairs.de

> > > > > HashPrintRequestAttributeSet();
> > > > > 
> > > > >   try(PDDocument doc = PDDocument.load(pdf,
> > > > > MemoryUsageSetting.setupMixed(500L))){
> > > > >   job.setPageable(new PDFPageable(doc));
> > > > >   job.print(attributes);
> > > > >   }
> > > > > 
> > > > > This works really well for thousands of PDFs, but we've run
> > > > > into one
> > > > > particular PDF that causes an OutOfMemoryException.
> > > > > 
> > > > > 
> > > > > The problem PDF is rendered (does not contain bitmaps).
> > > > > 
> > > > > > I've provided a huge amount of heap (over 1GB now), and it is
> > > > > > still failing.
> > > > > > As near as I can tell from the stack trace (which I'll
> > > > > > include below), it seems like the problem is with creation
> > > > > > of a huge buffered image.
> > > > > The physical page size in the PDF is 8.5 x 11".
> > > > > 
> > > > > The PDFPrintable is configured as follows:
> > > > > 
> > > > > Scaling is ACTUAL_SIZE
> > > > > dpi is 0.0
> > > > > subsamplingAllowed is false
> > > > > renderingHints is null
> > > > > 
> > > > > 
> > > > > Here is the stack trace:
> > > > > 
> > > > > > java.lang.OutOfMemoryError: Java heap space
> > > > > > at java.desktop/java.awt.image.DataBufferInt.<init>(DataBufferInt.java:75)
> > > > > > at java.desktop/java.awt.image.Raster.createPackedRaster(Raster.java:467)
> > > > > > at java.desktop/java.awt.image.DirectColorModel.createCompatibleWritableRaster(DirectColorModel.java:1032)
> > > > > > at java.desktop/java.awt.image.BufferedImage.<init>(BufferedImage.java:333)
> > > > > > at org.apache.pdfbox.rendering.TilingPaint.getImage(TilingPaint.java:143)
> > > > > > at org.apache.pdfbox.rendering.TilingPaint.<init>(TilingPaint.java:103)
> > > > > > at org.apache.pdfbox.rendering.TilingPaintFactory.create(TilingPaintFactory.java:60)
> > > > > > at org.apache.pdfbox.rendering.PageDrawer.getPaint(PageDrawer.java:351)
> > > > > > at org.apache.pdfbox.rendering.PageDrawer.getNonStrokingPaint(PageDrawer.java:719)
> > > > > > at org.apache.pdfbox.rendering.PageDrawer.fillPath(PageDrawer.java:819)
> > > > > > at org.apache.pdfbox.contentstream.operator.graphics.FillEvenOddRule.process(FillEvenOddRule.java:37)
> > > > > > at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:932)
> > > > > > at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:510)
> > > > > > at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:484)
> > > > > > at org.apache.pdfbox.contentstream.PDFStreamEngine.showForm(PDFStreamEngine.java:187)
> > > > > > at org.apache.pdfbox.rendering.PageDrawer.showForm(PageDrawer.java:1462)
> > > > > > at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:86)
> > > > > > at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:932)
> > > > > > at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:510)
> > > > > > at org.apache.pdfbox.c

Re: Flattening does not affect widgets not linked to a field

2021-12-23 Thread sahy...@fileaffairs.de

Hello Constantine,

which version of PDFBox are you using?

Flattening an AcroForm works from the Fields entry, so if there are no
entries and only widget annotations, they remain as is. We introduced a
number of fixups last year (PDFBox 2.0.22) which, when executed, build
the Fields entry from existing widget annotations.

PDDocumentCatalog.getAcroForm(PDDocumentFixup)

BR
Maruan

On Thursday, 23.12.2021 at 15:52 +0200, Constantine Dokolas wrote:
> I need to make sure I understand how flattening is supposed to
> operate.
> 
> I have a PDF that, as far as I can tell, was an AcroForm that
> underwent
> some sort of partial flattening. What I see when opening it with the
> debugger, is a normal AcroForm object but without any fields.
> Instead, the
> widget annotations contain the T and V entries themselves.
> 
> AcroForm.flatten() does not embed these widgets. However, PDFium does
> embed
> these and so does the Aspose PDF library.
> 
> Is this the intended behavior?
> 
> Thanks in advance and keep up the good work!
> Constantine
> 
> --
> There is a computer disease that anybody who works with computers
> knows
> about. It's a very serious disease and it interferes completely with
> the
> work. The trouble with computers is that you 'play' with them!
> - Richard P. Feynman


Re: How to set PDF/A to an existing PDF

2022-01-31 Thread sahy...@fileaffairs.de

There is a CreatePDFA.java example in the examples subproject for
PDFBox 2.0 as well as the current trunk version.

With kind regards
Maruan


On Monday, 31.01.2022 at 16:03 -0500, Tommy Wu wrote:
> The following cookbook is no longer working for the new version. Do
> you have a way to do it now?
> 
> 
> 
> Apache PDFBox | PDF/A Creation
> 
> 
> 
> 
> Thanks




Re: appearance stream from MacOS Adobe Reader

2022-05-24 Thread sahy...@fileaffairs.de

Hi Kai,

could you either upload the XFDF to a public location or copy the XML
node of the annotation? I would be interested in the element which has
that value to check if the range is permitted or not.

BR
Maruan

On Tuesday, 24.05.2022 at 09:12 +, Kai Keggenhoff wrote:
> Hi,
> 
> we're seeing a problem with XFDF generated by MacOS Adobe Reader for
> stamp annotations.
> For some stamps, Adobe Reader generates appearance stream data that
> contain the number 4294967036
> (The Windows version apparently does not !)
> 
> When we try to process such XFDF, we get
> 
> java.lang.NumberFormatException: For input string: "4294967036"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_332]
>     at java.lang.Integer.parseInt(Integer.java:583) ~[?:1.8.0_332]
>     at java.lang.Integer.parseInt(Integer.java:615) ~[?:1.8.0_332]
>     at org.apache.pdfbox.pdmodel.fdf.FDFAnnotationStamp.parseDictElement(FDFAnnotationStamp.java:396) ~[pdfbox-app-2.0.25.jar:2.0.25]
>     at org.apache.pdfbox.pdmodel.fdf.FDFAnnotationStamp.parseDictElement(FDFAnnotationStamp.java:380) ~[pdfbox-app-2.0.25.jar:2.0.25]
>     at org.apache.pdfbox.pdmodel.fdf.FDFAnnotationStamp.parseDictElement(FDFAnnotationStamp.java:380) ~[pdfbox-app-2.0.25.jar:2.0.25]
>     at org.apache.pdfbox.pdmodel.fdf.FDFAnnotationStamp.parseDictElement(FDFAnnotationStamp.java:380) ~[pdfbox-app-2.0.25.jar:2.0.25]
>     at org.apache.pdfbox.pdmodel.fdf.FDFAnnotationStamp.parseStreamElement(FDFAnnotationStamp.java:234) ~[pdfbox-app-2.0.25.jar:2.0.25]
>     at org.apache.pdfbox.pdmodel.fdf.FDFAnnotationStamp.parseStampAnnotationAppearanceXML(FDFAnnotationStamp.java:174) ~[pdfbox-app-2.0.25.jar:2.0.25]
>     at org.apache.pdfbox.pdmodel.fdf.FDFAnnotationStamp.<init>(FDFAnnotationStamp.java:133) ~[pdfbox-app-2.0.25.jar:2.0.25]
>     at org.apache.pdfbox.pdmodel.fdf.FDFDictionary.<init>(FDFDictionary.java:211) ~[pdfbox-app-2.0.25.jar:2.0.25]
>     at org.apache.pdfbox.pdmodel.fdf.FDFCatalog.<init>(FDFCatalog.java:63) ~[pdfbox-app-2.0.25.jar:2.0.25]
>     at org.apache.pdfbox.pdmodel.fdf.FDFDocument.<init>(FDFDocument.java:90) ~[pdfbox-app-2.0.25.jar:2.0.25]
> 
> Do I blame MacOS Adobe Reader for generating that number when its
> Windows brother does not ?
> Do I blame Java 8's Integer.parseInt for not being lenient enough ?
> Do I blame PDFBox for not using Long ?
> Do I blame myself for fudging the last release and not deploying
> 2.0.26 properly ?
> Is there any sensible way to correct such data before trying to
> process it ?
> 
> All the best,
> 
> Kai


Re: appearance stream from MacOS Adobe Reader

2022-05-24 Thread sahy...@fileaffairs.de

Hi Kai,

there are several ways to look at it:

- is a value of 4294966971 permitted?
- does something like 





make sense?


For the first one it's a little unclear, as one could argue about the
integer data type: the spec only states that this is a mathematical
integer.

There are some hints about implementations:

The (PDF 2.0) spec has this about it (Annex C - Architectural limits)

"Integer values (such as object numbers) can often be expressed within
32 bits."

The 1.7 spec has this (Annex C - Architectural limits)

integer 2,147,483,647 Largest integer value; equal to 2^31 − 1.

So from that perspective the values are too large.
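
To make the limit concrete, here is a minimal Java sketch that checks
whether a decimal string fits into a signed 32-bit integer. The class and
method names are made up for illustration and this is not PDFBox code; the
lower bound is simply assumed to be Java's int minimum.

```java
import java.math.BigInteger;

final class PdfIntLimit {
    // Largest integer value per the PDF 1.7 architectural limits: 2^31 - 1.
    static final BigInteger MAX = BigInteger.valueOf(Integer.MAX_VALUE);
    // Assumption: the lower bound mirrors Java's int range (-2^31).
    static final BigInteger MIN = BigInteger.valueOf(Integer.MIN_VALUE);

    /** Returns true if the decimal string fits into a signed 32-bit integer. */
    static boolean fitsInt(String digits) {
        BigInteger v = new BigInteger(digits);
        return v.compareTo(MIN) >= 0 && v.compareTo(MAX) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(fitsInt("1006"));       // prints "true"
        System.out.println(fitsInt("4294966971")); // prints "false"
    }
}
```

This is the same range that Integer.parseInt enforces, which is why the
import fails with a NumberFormatException.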


Do the values make sense? Looking at an Ascent of 1006 and the Descent
of 4294966971 this would be a funny looking font. Same applies to the
Lab color settings etc. So I think although one could make it work
(e.g. by using long instead of int) in PDFBox I'm not sure that this is
really something wanted.

What are the values when you export the same data from Windows? As you
are dealing with XML, you could strip or correct the data before doing
the import as a solution for your case.
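
If you go the route of cleaning the XML first, one possible sketch is
below. It assumes (this is a guess, not something confirmed for Adobe
Reader) that the large values are negative 32-bit numbers written out as
unsigned, so 4294966971 would really be -325. It also assumes that the
integers appear as INT elements with a VAL attribute in the decoded
appearance XML. The class and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XfdfIntNormalizer {
    // Assumed element shape: <INT KEY="Descent" VAL="4294966971"/>
    private static final Pattern INT_VAL =
            Pattern.compile("(<INT\\b[^>]*\\bVAL=\")(\\d{1,10})(\")");

    /** Rewrites values in (2^31 - 1, 2^32 - 1] as signed 32-bit integers. */
    static String normalize(String appearanceXml) {
        Matcher m = INT_VAL.matcher(appearanceXml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            long value = Long.parseLong(m.group(2));
            String fixed = m.group(2);
            if (value > Integer.MAX_VALUE && value <= 0xFFFFFFFFL) {
                // Reinterpret the unsigned 32-bit value as a signed int.
                fixed = Integer.toString((int) value);
            }
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + fixed + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <INT KEY="Descent" VAL="-325"/>
        System.out.println(normalize("<INT KEY=\"Descent\" VAL=\"4294966971\"/>"));
    }
}
```

With that, a Descent of 4294966971 becomes -325, which next to an Ascent
of 1006 looks like a plausible font again. Whether that is what the Mac
version meant is something to verify against the Windows export.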

BR
Maruan



On Tuesday, 24.05.2022 at 11:48 +, Kai Keggenhoff wrote:
> Hi Maruan,
> 
> unfortunately I can share neither the PDF nor the whole XFDF, but
> I located the annotation that caused it and looked at its appearance
> stream.
> These large, unsigned INT values seem to appear in font (Descent,
> FontBBox) and color space (Range) related entries.
> Here are a few samples (incomplete XML) :
> 
> 
> Thank you for looking into this !
> 
> Kai
> 
> -Original Message-
> From: sahy...@fileaffairs.de  
> Sent: Tuesday, 24 May 2022 12:17
> To: users@pdfbox.apache.org
> Subject: Re: appearance stream from MacOS Adobe Reader
> 
> 
> Hi Kai,
> 
> could you either upload the XFDF to a public location or copy the XML
> node of the annotation? I would be interested in the element which
> has
> that value to check if the range is permitted or not.
> 
> BR
> Maruan

Re: ExtractImages command test - images appear different

2022-08-02 Thread sahy...@fileaffairs.de

Hi Daniel,

the command you are using extracts images contained in the PDF but
doesn't render the PDF into an image.

Use https://pdfbox.apache.org/2.0/commandline.html#pdftoimage

BR
Maruan

On Tuesday, 02.08.2022 at 15:31 +, Daniel Earwicker wrote:
> Hi, this project looks perfect for my needs - converting PDF pages
> into images for easy rendering elsewhere. This is very much my first
> try so apologies in advance if this is a stupid question, but in the
> docs at https://pdfbox.apache.org/2.0/commandline.html I can't see
> any options that might improve the output.
> 
> Here's a side-by-side comparison, ExtractImages output on the left,
> and the PDF opened in chrome on the right:
> 
> https://imgur.com/a/KgNAZQ2
> 
> The PDF is an example I got from:
> https://www.ets.org/Media/Tests/GRE/pdf/gre_research_validity_data.pdf
> 
> Just in case this is relevant, I ran it in a clean Debian container:
> 
>     docker run -it -v c:/Users/me:/external debian:bullseye-slim
> 
>     apt update
>     apt install openjdk-17-jre -y
>     apt install wget -y
>     wget https://dlcdn.apache.org/pdfbox/2.0.26/pdfbox-app-2.0.26.jar
> 
> and then tested with:
> 
>     java -jar pdfbox-app-2.0.26.jar ExtractImages -prefix
> /external/extract-test /external/gre_research_validity_data.pdf
> 
> The screenshot is of the resulting extract-test-2.jpg file.
> 
> There's obviously some problem with the colours, and also there's a
> lot of extra stuff in the page margins that Chrome somehow knows it
> ought to hide. Is there any way to configure this extraction process
> so the image looks like how Chrome displays it? And for this kind
> of accurate rendering to work for the majority of PDFs? (this being
> the first one I tried). Thanks!

Re: Fill value in the PDF Checkbox Field using PDAcroForm.

2022-08-16 Thread sahy...@fileaffairs.de

In addition,

are you sure that Indeterminate or Intersex are the potential values
defined for the checkbox(es)? The visible text is not what determines
the potential values for the checked/unchecked state. Also it's unusual
that a checkbox has 3 potential values as it's either checked or
unchecked.

If you upload the PDF to a shared location I can take a look.

BR
Maruan



On Wednesday, 17.08.2022 at 08:41 +0200, Tilman Hausherr wrote:
> Hi,
> 
> If this is really a PDCheckBox then call check() or uncheck(). I
> can't 
> see the image, maybe this was an attachment.
> 
> Tilman
> 
> On 17.08.2022 at 08:26, Damaji Kalunge wrote:
> > Hi Team,
> > 
> >   In the editable PDF we have a checkbox as shown below.
> > image.png
> > 
> > Indeterminate / Intersex / Unspecified []
> > 
> >    We are able to check the above checkbox using the code below, but
> > only with the value "unspecified". Only then is the checkbox ticked
> > in the filled PDF.
> > 
> >     COSDictionary cosDictionary3 = pDAcroForm.getField("ap.sex").getCOSObject();
> >     cosDictionary3.setString(COSName.V, "unspecified");
> > 
> >   We have a requirement to set the checkbox value to Indeterminate or
> > Intersex, but setting the value "indeterminate" or "intersex" does
> > not tick the checkbox in the filled PDF.
> > 
> >  I am afraid that if we set the last value on the PDF checkbox,
> > downstream processing may be impacted in ways that are not in our
> > control to fix or analyze.
> > 
> > Could you please help out in this scenario?
> > 
> > Thanks
> > Damaji.
> > 
> 

Re: Fill value in the PDF Checkbox Field using PDAcroForm.

2022-08-17 Thread sahy...@fileaffairs.de



reading a little further (and not being able to see the image) I'm
guessing that there is a single checkbox which you can tick if the
person has one of the three potential states. But that's not possible,
as a checkbox can only hold a value for unchecked (default OFF) and
one for checked. So checking by providing one of the three values and
also capturing that value is not possible.

You should use a List field in that case, or a group of radio buttons,
or have a hidden field where, in addition to checking the checkbox, you
hold the value of what checked means.

BR
Maruan

  
Re: Fill value in the PDF Checkbox Field using PDAcroForm.

2022-08-17 Thread sahy...@fileaffairs.de

Hi Damaji,

the attachment didn't make it through - can you upload that somewhere
or even better the PDF in question?

BR
Maruan

On Wednesday, 17.08.2022 at 13:03 +0530, Damaji Kalunge wrote:
> HI Team,
> 
>   There is confusion around how the checkbox field is set up in the
> PDF. I have attached a screenshot; please refer to it.
>   This is a group checkbox field with one PDF field name, "ap.sex".
>   The requirement is to set each checkbox based on its value.
> 
>  I chose the approach below to fill the PDF because it is encrypted.
>   The code below works fine for male, female and unspecified:
> 
>     COSDictionary cosDictionary3 = pDAcroForm.getField("ap.sex").getCOSObject();
>     cosDictionary3.setString(COSName.V, "unspecified");
> 
>   but it does not work for "indeterminate" and "intersex", meaning
> the checkboxes are not ticked.
> 
>  Could you please help out?
> 
> 
> Thanks
> Damaji.
> 

Re: Fill value in the PDF Checkbox Field using PDAcroForm.

2022-08-17 Thread sahy...@fileaffairs.de

Hello Damaji,

the PDF supports my assumption. There is a single checkbox representing
any one of three values for the checked state, and that's not possible
with a checkbox. It can only hold one value for the unchecked and one
value for the checked state.

So if you need to be able to differentiate between the additional
genders you need to
- have additional checkboxes or
- have a dropdown or
- store the value somewhere else in the PDF in addition to checking
the box (e.g. by adding a hidden field or storing the information in
the metadata of the PDF).

The last workaround obviously will not work if one uses the PDF
interactively, which limits the use of the PDF.
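
To illustrate the third option without touching the PDF API, here is a
plain Java sketch of the mapping you would do before filling the form.
The hidden field name "ap.sex.detail" is invented for the example;
"ap.sex" and the on value "unspecified" are the ones from your PDF. The
sketch only computes which values to write; writing them into the form
is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class SexFieldMapping {
    // Values that all share the single checkbox whose on value is "unspecified".
    private static final Set<String> SHARED =
            new HashSet<>(Arrays.asList("indeterminate", "intersex", "unspecified"));

    /** Returns field name -> value pairs to write into the form. */
    static Map<String, String> toFieldValues(String requested) {
        Map<String, String> values = new LinkedHashMap<>();
        if (SHARED.contains(requested)) {
            values.put("ap.sex", "unspecified");    // the only on value that ticks the box
            values.put("ap.sex.detail", requested); // hypothetical hidden field
        } else {
            values.put("ap.sex", requested);        // e.g. "male" or "female"
            values.put("ap.sex.detail", "");
        }
        return values;
    }

    public static void main(String[] args) {
        // prints {ap.sex=unspecified, ap.sex.detail=intersex}
        System.out.println(toFieldValues("intersex"));
    }
}
```

Downstream processing would then read the hidden field to learn which of
the three values was meant, while the checkbox itself stays valid.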

BR
Maruan

On Thursday, 18.08.2022 at 09:29 +0530, Damaji Kalunge wrote:
> Hi Team,
>  Please refer to the link PDF [
> https://drive.google.com/file/d/17XoTxgGSrn9-XMpZsSqhNanzTW6UBCmg/view?usp=sharing
> ].
> Thanks
> Damaji.
> 
> On Wed, Aug 17, 2022 at 9:46 PM Gilad Denneboom
> 
> wrote:
> 
> > You can't attach files directly here. Upload it to a file-sharing
> > website
> > (like Google Drive) and post a link to it.
> > 
> > On Wed, Aug 17, 2022 at 10:11 AM Damaji Kalunge
> > 
> > wrote:
> > 
> > > Hi Team,
> > > 
> > > I have attached the PDF itself.
> > > 
> > >  The requirement is to tick the sex checkbox based on its possible
> > > values: male, female, unspecified, "indeterminate" and "intersex".
> > > We are facing an issue for "indeterminate" and "intersex" with the
> > > code below.
> > > 
> > >     COSDictionary cosDictionary3 = pDAcroForm.getField("ap.sex").getCOSObject();
> > >     cosDictionary3.setString(COSName.V, "unspecified");
> > > 
> > > Thanks
> > > Damaji.

Re: Fill value in the PDF Checkbox Field using PDAcroForm.

2022-08-18 Thread sahy...@fileaffairs.de

if I understood correctly, the request was to be able to set the last
checkbox, labeled Indeterminate / Intersex / Unspecified, with any one
of the three values (unspecified works) and also capture the value,
e.g. when setting 'indeterminate' as the value the checkbox should be
checked and also have the value 'indeterminate'.

That's not possible. It can only be set using 'unspecified' and then
have the value 'unspecified', as a single checkbox cannot have multiple
ON values.

BR
Maruan



On Thursday, 18.08.2022 at 09:29 +0200, Tilman Hausherr wrote:
> I tried it in Firefox... these are three checkboxes that behave like
> a radio button set.
> 
> Then I looked at it with PDFDebugger; there are no radio button
> flags and there is no JavaScript. Could this be some default behavior
> when checkboxes have the same name? You possibly mentioned something
> a few days / weeks ago.
> 
> Tilman

Re: Fill value in the PDF Checkbox Field using PDAcroForm.

2022-08-18 Thread sahy...@fileaffairs.de

Hi Damaji,

possible approaches have been outlined below in a previous answer.

BR
Maruan

On Thursday, 18.08.2022 at 14:41 +0530, Damaji Kalunge wrote:
> Yes, the above understanding is correct.
>  Is there any possible solution to this problem?
> 
> Thanks
> Damaji

Re: Descenders and other Font related questions for PdfStripper

2022-08-24 Thread sahy...@fileaffairs.de

Hi David,

On Wednesday, 24.08.2022 at 11:59 +0100, David Goodenough wrote:
> I am using PDFStripper (from PDFBox 2.0.26) to annotate (by drawing a
> coloured 
> box around each relevant piece of text) an image of a PDF document.
> 
> The internal details in the TextPosition would be sufficient for the
> purpose, but 
> the exposed details are not.  In particular there is no way (that I
> have found) to 
> get the size of a box around the text that includes the descenders. 
> I can 
> aggregate all the x,y,height and width data to create an overall box,
> but it only 
> goes down to the baseline.  

did you take a look at DrawPrintTextLocations in the examples package?

It may give you some hints.
> 
> The PDFont value in the TextPosition has the maximum descender depth
> (if I 
> recall correctly) but is private and there is no getter either for
> the descender 
> depth or the PDFont value from which it could be calculated.
> 
> Have I missed something, or could the TextPosition class be extended
> to add 
> access to either the descender depth, the font or preferably both? 

BR
Maruan

Re: PDF Fill using Apache PDF Box 3.0.0-alpha3 using COS Model due to PDF is encrypted.

2022-08-25 Thread sahy...@fileaffairs.de

Hello Damaji,

this works for me 

try (PDDocument doc = Loader.loadPDF(new File("80.pdf"))) {
    doc.setAllSecurityToBeRemoved(true);
    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
    PDCheckBox field = (PDCheckBox) acroForm.getField("ap.sex");
    field.setValue("female");
    doc.save("80-filled-pdfbox.pdf");
}

Now, when using the COS model as you did, you need to ensure that you
- find the widget matching the "on" value
- set the AS entry to the value ("female" in your case)
- set the N entry correctly, matching (copying over) the corresponding
entry from the D entry

...

It would be best to look at the source of the PDCheckBox field to see
what's happening on the COS level when you call PDCheckBox.setValue.
BR
Maruan


On Thursday, 25.08.2022 at 14:34 +0530, Damaji Kalunge wrote:
> HI Team,
> 
>     Refer below information
>   Input PDF File : [
> https://drive.google.com/file/d/17XoTxgGSrn9-XMpZsSqhNanzTW6UBCmg/view?usp=sharing
> ]
> 
>   Source Code used to Fill the PDF: [
> https://drive.google.com/file/d/1JP5mcuWxmKwtP_TfNFJsXxTy4S0zWAHF/view?usp=sharing
> ]
> 
>    Filled PDF :[
> https://drive.google.com/file/d/1qNKvgeQrnxKsGDeUyErSgIzT_BGqG-z8/view?usp=sharing
> ]
> 
> After filling the PDF we are facing two issues:
> 
>     1.  Not able to check the checkbox field "sex" using the attached
> source code.
>     2.  Not able to open the filled PDF in Adobe Reader.
> 
> Could you please help out with this?
> 
> Thanks and Regard,
>   Damaji.


Re: Problem signing (with LTV) an already signed document with extended features enabled

2022-09-26 Thread sahy...@fileaffairs.de

lable. Please contact the author for the original version
> > > > of this document.”
> > > > 
> > > > Now, when we try to sign it, also using an LTV-enabled
> > > > signature (Advanced or Qualified), we receive following error
> > > > message:
> > > > 
> > > > java.io.IOException: Can't write new byteRange '0 542575 554377 7562]' not enough space: byteRange.length(): 21, byteRangeLength: 20
> > > > at org.apache.pdfbox.pdfwriter.COSWriter.doWriteSignature(COSWriter.java:763)
> > > > at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1199)
> > > > at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:452)
> > > > at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1435)
> > > > at org.apache.pdfbox.pdmodel.PDDocument.saveIncremental(PDDocument.java:1412)
> > > > at com.swisscom.ais.client.impl.PdfDocument.finishSignature(PdfDocument.java:180)
> > > > ... 4 more
> > > > 
> > > > Instead adding a simple timestamp (without LTV) it works
> > > > without issues.
> > > > 
> > > > The byteRange length referred in the stack trace is the one of
> > > > the OLD signature and not of the one we are adding to the
> > > > document (please note that for the new one we reserve a size of
> > > > 30’000 bytes, and also increasing this size has no impact).
> > > > 
> > > > Unfortunately I’m not allowed to share with you this document
> > > > (I’m trying to arrange the creation of a new sample document
> > > > with the same properties, but the author of the file is
> > > > currently in holiday).
> > > > 
> > > > In the PDFBox Jira I’ve found some already solved issues
> > > > regarding saveIncremental and these “extended features”:
> > > > 
> > > > https://issues.apache.org/jira/browse/PDFBOX-45
> > > > https://issues.apache.org/jira/browse/PDFBOX-2857
> > > > https://issues.apache.org/jira/browse/PDFBOX-2858
> > > > https://issues.apache.org/jira/browse/PDFBOX-2859
> > > > In one of these issues (PDFBOX-2858), I found an example file
> > > > (santander_freistellungsauftrag_modified.pdf,
> > > > https://issues.apache.org/jira/secure/attachment/12744154/santander_freistellungsauftrag_modified.pdf).
> > > > I tried to sign it (and also countersign it), but “unfortunately”
> > > > this one also worked without any issue.
> > > > 
> > > > Can you kindly help me solving this problem?
> > > > 
> > > > Thanks and best regards,
> > > > 
> > > > Patrick
> > > 
> > > 
> 

-- 
-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Geschäftsführer: Maruan Sahyoun
Handelsregister: AG Düsseldorf, HRB 53837
UST.-ID: DE248275827


Re: Merging font subsets

2022-12-16 Thread sahy...@fileaffairs.de



On Friday, 16.12.2022 at 10:31 + Mark Gibson wrote:
> Hi,
> 
> I'm wondering if anyone cleverer than me has been able to figure out
> how to take a PDF, analyse all the embedded font subsets, and merge
> disparate subsets of the same font together (or even escalate to full
> font set).  Google has a small number of similar requests, but little
> to no solutions.

If I'm not mistaken Apache FOP has such a capability as part of
importing a PDF as a graphic.

BR
Maruan

> 
> Yours hopefully
> Mark




Re: How to fix PDF not rendering all fields

2023-02-02 Thread sahy...@fileaffairs.de

Hi,

What happens when you remove the line pdForm.setNeedAppearances( true
); or set it to false?

The attachments didn't make it to the mailing list - can you upload
these to a shared location?

BR
Maruan

On Thursday, 02.02.2023 at 19:30 +0200 Jurgen Doll wrote:
> Hi
> 
> I have PDF forms which when opened and rendered as an image with
> PDFBox don't display the contents/values of all the populated
> fields.  However the same PDF files render all fields correctly in
> Acrobat and Edge. Specifically it seems that text fields that appear
> to render each letter in a box (combing?) are displayed with PDFBox,
> but 'unspaced' textfield values are not rendered.
> 
> I would like to know if there is some way that I can detect that a
> field won't be rendered and to correct it via PDFBox code before
> rendering.
> 
> Additional information:
> 
> The PDF forms originate from other independent organizations, which
> are consumed and populated by my application.
> 
> I've attached stripped down versions (just three fields) of one of
> these forms, namely EmptyForm.pdf (unpopulated) as consumed by my
> application and then PopulatedForm.pdf after field processing and
> then saved (all via PDFBox API).
> 
> Below basic PDFBox code that I use to achieve this:
> 
> try ( var pdfDoc = PDDocument.load( new File("EmptyForm.pdf") ) )
> {
>     var pdForm = pdfDoc.getDocumentCatalog().getAcroForm();
>     pdForm.setNeedAppearances( true );  //*1
> 
>     for ( var field : pdForm.getFields() )
>     {
>         var fldValue = getValueFor( field.getPartialName() );
>         field.setValue( fldValue );
>     }
> 
>     pdForm.refreshAppearances();  //*1
>     var pages = new PDFRenderer( pdfDoc );
>     var pgImage = pages.renderImage(0);
> 
>     // display pgImage: only two fields have been rendered NOT three
> ?
> 
>     pdfDoc.save( "PopulatedForm.pdf" );
> }
> catch ( IOException IO )
> {
>     IO.printStackTrace();
> }
> 
> //*1 Without these two lines Acrobat and Edge also exhibit the same
> behavior as PDFBox
> 
> 
> This happens using JRE 11 and PDFBox  2.0.24,  2.0.27, as well as
> 3.0.0 alpha3
> 
> Thanks in advance
> Jurgen
> 
> 

Re: How to fix PDF not rendering all fields

2023-02-03 Thread sahy...@fileaffairs.de

Hi,

tried the download but that doesn't work for me (free download).

BR
Maruan

On Thursday, 02.02.2023 at 20:40 +0200 Jurgen Doll wrote:
> Hi
> 
> Files can be found here  https://ufile.io/f/svrzb
> 
> > what happens when you remove the line pdForm.setNeedAppearances(
> > true );  
> > or set to false?
> 
> This doesn't change the PDFBox rendering, but does change Acrobat and
> Edge  
> where the third field value isn't displayed until clicked.
> 
> Thanks
> 
> 

Re: How to fix PDF not rendering all fields

2023-02-03 Thread sahy...@fileaffairs.de

worked - will look at it tomorrow

On Friday, 03.02.2023 at 17:14 +0200 Jurgen Doll wrote:
> Okay lets try drop box
> 
> https://www.dropbox.com/sh/9n9a3pvxyzemy89/AABe75G3siYM4Ljh0Y77DCCIa?dl=0
> 
> 

Re: How to fix PDF not rendering all fields

2023-02-03 Thread sahy...@fileaffairs.de

Hello Jurgen,

all fields in the sample form have the comb flag set. The first two
fields are correct, but the third field is missing the field length,
i.e. the number of characters for the comb field.

That's why no appearance is being generated.

Either set the maximum number of characters for the comb or remove the
comb property like so:

PDTextField field = (PDTextField)
form.getField("fullwidth_1_patiedetaiclone_surna-1");
field.setComb(false);

We could issue a warning or ignore the comb flag altogether in such
cases - feel free to raise an enhancement request for that.
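
To detect such fields before rendering, something along these lines
might work. This is only a sketch, assuming PDFBox 2.0.x, where as far
as I know getMaxLen() returns a negative value when MaxLen is missing
(hence the defensive <= 0 check). CombFieldRepair and its method are
hypothetical names, not PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

// Hypothetical helper, not part of PDFBox.
final class CombFieldRepair {

    /**
     * Clears the comb flag on every text field that has it set without a
     * usable MaxLen, and returns the fully qualified names of those fields.
     */
    static List<String> disableCombWithoutMaxLen(PDAcroForm form) {
        List<String> repaired = new ArrayList<>();
        for (PDField field : form.getFieldTree()) {
            if (!(field instanceof PDTextField)) {
                continue;
            }
            PDTextField text = (PDTextField) field;
            if (text.isComb() && text.getMaxLen() <= 0) {
                text.setComb(false);
                repaired.add(text.getFullyQualifiedName());
            }
        }
        return repaired;
    }
}
```

Call it on the form before setting the field values, so that the
appearance is generated without the comb layout.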

BR
Maruan



On Friday, 03.02.2023 at 17:01 +0200 Jurgen Doll wrote:
> Hi
> 
> So I think that this behavior has something to do with a field having
> a problematic appearance /AP entry.
> 
> I thought maybe that if the field's /AP entry is bad that maybe its
> default appearance /DA would be used.
> So I tried setting the field's /DA to the form's /DA value with:
> 
> var da = pdForm.getDefaultAppearance();
> textfld.setDefaultAppearance( da );
> 
> but this unfortunately didn't cause the field value to be rendered
> either.
> 
> Can someone give me insight as to how PDFBox handles this and where
> in the PDFBox code base I can look for this ?
> Or is there another way to handle this ?
> 
> Thanks, regards
> Jurgen
> 



Re: [Bug] (Radio) Buttons can not be printed in a merged PDF

2023-06-06 Thread sahy...@fileaffairs.de




On Tuesday, 06.06.2023 at 11:47 + sven.neufe...@ruv.de wrote:
> 
> 
> 
> Heys guys,
>  
> we're having an issue with some of our PDF documents that contains
> interactive fields like Radio Buttons. In some cases we’ve to merge
> PDF documents together (using the PDFMergerUtility) and when we try
> to print that merged document these Button elements are not rendered
> correctly, when using Apache PDFBox >= 2.0.22. Using the version
> 2.0.21 the buttons are rendered correctly.

How do you render the PDF? From which application?

BR
Maruan

>  
> I’m pretty new in the world of how PDF files work under the hood and
> how the PDF structure and fields is defined in the standard. So I did
> some research and could find out the reason for this misbehavior.
>  
> It looks like this issue was introduced with this commit:
> https://github.com/apache/pdfbox/commit/fe00cd3870f6d9ec27fcb55c89409b420ade0826
> 
> In the original document (before entering the merge step) the
> NeedAppearances entry is set to true, but after the merge step the
> entry has been changed to false.
> I figured out that this line of code is the reason for that:
> https://github.com/apache/pdfbox/blob/fe00cd3870f6d9ec27fcb55c89409b420ade0826/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/fixup/processor/AcroFormGenerateAppearancesProcessor.java#L55
> 
> Is that intended behaviour or an unintentional side effect?
>  
>  
> Kind regards
>  
> Sven Neufeind



Re: [Bug] (Radio) Buttons can not be printed in a merged PDF

2023-06-06 Thread sahy...@fileaffairs.de

On Tuesday, 06.06.2023 at 18:27 + sven.neufe...@ruv.de wrote:
> I used different applications, like Microsoft Edge, Google Chrome and
> Adobe Reader DC to render the PDF file, but the behaviour is the
> same.

Is it possible to share the original and merged PDF? Please put them in
a public location, as the mailing list doesn't support attachments. You
can also send them to me directly. Note that I will share my findings.

Setting NeedAppearances forces the viewing application to generate the
visible representation (the appearance) of the form widgets on the fly.
But this is discouraged.

The commit you mentioned should generate the appearance, but I'd need to
look at the original and merged versions to verify.

BR
Maruan

> 
> Kind regards
> Sven Neufeind
> 
> 

Re: [Bug] (Radio) Buttons can not be printed in a merged PDF

2023-06-13 Thread sahy...@fileaffairs.de

It took a little longer than anticipated to look at the files.

First, some explanation about NeedAppearances.

Normally a PDF should have the visual state of the form fields
generated, so that a reading application can draw the visual
appearance when it views the PDF.

Setting NeedAppearances to true signals to the reading application
that the visual state is incomplete and that the application should
generate it on the fly. This is discouraged and deprecated in PDF 2.0,
because the dependency then moves from the generating side to the
reading side (e.g. a reader might not display the form fields at all
when there is no appearance information).

The file in question has incomplete appearance information. The best
approach would be to fix that prior to merging.

PDFBox 2.0.22 tries to fix that but does so incompletely because of
partially existing information in the source.

2.0.21 and earlier don't try to fix that.

Unfortunately there is also no way to tell PDFMergerUtility from 2.0.22
onwards to skip the fix, as the AcroForm getter in PDFMergerUtility
uses the default call.

As a workaround, set NeedAppearances to true after the merge has been
completed.
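
A minimal sketch of that workaround, assuming PDFBox 2.0.22 or later
(where getAcroForm accepts a fixup argument and null skips the default
fixup). NeedAppearancesWorkaround is a hypothetical name; the merged
output has to be loaded into a PDDocument first and saved again
afterwards.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

// Hypothetical helper, not part of PDFBox.
final class NeedAppearancesWorkaround {

    /**
     * Sets NeedAppearances to true on an already merged document.
     * Returns false if the document has no AcroForm.
     */
    static boolean restoreNeedAppearances(PDDocument mergedDoc) {
        // Passing null retrieves the AcroForm without applying the default
        // fixup, so nothing regenerates appearances or resets the flag here.
        PDAcroForm form = mergedDoc.getDocumentCatalog().getAcroForm(null);
        if (form == null) {
            return false;
        }
        form.setNeedAppearances(true);
        return true;
    }
}
```

I would avoid calling the no-argument getAcroForm() on the same
document afterwards, as my understanding is that it applies the default
fixup and could reset the flag again.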

In addition feel free to file a bug report for 2.0.22 onwards.

BR
Maruan 



On Tuesday, 06.06.2023 at 18:55 + sven.neufe...@ruv.de wrote:
> Good point, but the issue is the same whether I "merge" a single
> document or several.
> I wrote a little example application called "PDFMerge.java". Please
> find it at the end of this message.
> The behaviour is fine in that case when I remove the line
> "acroForm.setNeedAppearances(false);" within the
> AcroFormGenerateAppearancesProcessor class.
> 
> The process how we use PDFBox is:
> 
> 1. PDF files to be merged are created on the server side
> 2. PDF files are read and merged together
> 3. The merged PDF document is delivered to the client
> 4. The client saves then the document locally
> 5. Then the document will be opened by a standard application, e.g.
> Adobe Reader, and printed out if needed
> 
> I could find out, that the difference in the documents is produced at
> step 2.
> I also tried to set the value manually within the PDFMergerUtility
> class
> 
> sourceDoc.getDocumentCatalog().getAcroForm().setNeedAppearances(true);
> 
> But it turned out that this entry will be overridden by the
> AcroFormGenerateAppearancesProcessor.
> 
> Hope that'll help you!
> 
> Kind regards
> Sven Neufeind
> 
> ### PDFMerge.java
> 
> package org.apache.pdfbox.examples.interactive.form;
> 
> import org.apache.pdfbox.io.MemoryUsageSetting;
> import org.apache.pdfbox.multipdf.PDFMergerUtility;
> 
> import java.io.*;
> import java.util.Arrays;
> 
> public class PDFMerge {
> 
>     public static void main(final String[] args) throws IOException {
>         final var inputFiles = Arrays.asList(
>                 "origin.pdf"
>         );
>         final var outputFile = "target/merged.pdf";
> 
>         final var memoryUsageSetting = MemoryUsageSetting.setupMainMemoryOnly();
>         final var pdfMergerUtility = new PDFMergerUtility();
>         final var dokumentOutputStream = new ByteArrayOutputStream();
>         pdfMergerUtility.setDestinationStream(dokumentOutputStream);
> 
>         inputFiles.forEach(file -> {
>             try (final var fis = new FileInputStream(file)) {
>                 pdfMergerUtility.addSource(new ByteArrayInputStream(fis.readAllBytes()));
>             } catch (Exception e) {
>                 e.printStackTrace();
>             }
>         });
> 
>         pdfMergerUtility.mergeDocuments(memoryUsageSetting);
> 
>         try (final var outputStream = new FileOutputStream(outputFile)) {
>             System.out.println("creating output file : " + outputFile);
>             dokumentOutputStream.writeTo(outputStream);
>             dokumentOutputStream.close();
>         }
> 
>     }
> }
> 
> -----Original Message-----
> From: Tilman Hausherr
> Sent: Tuesday, 6 June 2023 19:47
> To: users@pdfbox.apache.org
> Subject: Re: [Bug] (Radio) Buttons can not be printed in a merged PDF
> 
> I'm wondering whether maybe the documents have different settings of
> the /NeedAppearances/ entry, and after merge it is set in a way that
> is bad for you.
> 
> Does the rendering work properly if you change the entry manually? 
> doc.getDocumentCatalog().getAcroForm().setNeedAppearances()
> 
> Also, could it be you're rendering and THEN merging, i.e. from the
> same 
> PDDocument object?
> 
> Tilman
> 

Re: Find field via the Indirect Reference number to it

2023-07-15 Thread sahy...@fileaffairs.de




On Saturday, 15.07.2023 at 11:04 +0200 Gilad Denneboom wrote:
> The CO-array (no spelling-mistake) is a part of the AcroForm object,
> which
> defines the order in which fields are calculated (see Table 218 in
> the PDF
> ISO specs). But it only contains (indirect) references to the fields.
> 
> However, your tip put me on the right path and I was able to get the
> actual
> PDField objects by comparing the values in this array to the
> values returned by the getCOSObject method of the PDFields under
> PDAcroForm.
> 
> It would actually be nice to have a direct getter and setter for it
> under
> PDAcroForm... Maybe in future versions?

Dear Gilad,

feel free to file an enhancement request with the description from this
mail thread.
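
For the archive, the lookup described in the quoted mail could be
sketched roughly as follows. This assumes PDFBox 2.0.x and is not an
existing PDAcroForm getter; CalculationOrderLookup is a hypothetical
name. It dereferences each CO entry and matches it by identity against
the COS dictionaries of the fields in the form.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

// Hypothetical helper, not part of PDFBox.
final class CalculationOrderLookup {

    /** Returns the fields referenced by the AcroForm's CO array, in array order. */
    static List<PDField> fieldsInCalculationOrder(PDAcroForm form) {
        List<PDField> result = new ArrayList<>();
        COSBase co = form.getCOSObject().getDictionaryObject(COSName.getPDFName("CO"));
        if (!(co instanceof COSArray)) {
            return result; // no calculation order defined
        }
        COSArray order = (COSArray) co;
        for (int i = 0; i < order.size(); i++) {
            // getObject() resolves an indirect reference to the actual object
            COSBase target = order.getObject(i);
            for (PDField field : form.getFieldTree()) {
                if (field.getCOSObject() == target) {
                    result.add(field);
                    break;
                }
            }
        }
        return result;
    }
}
```

The field names are then available via getFullyQualifiedName() on the
returned fields.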

BR
Maruan


> 
> On Sat, Jul 15, 2023 at 10:50 AM Tilman Hausherr wrote:
> 
> > I don't know what you mean with "CO array", I thought this was a
> > typo because your image does not have a "CO" array.
> > 
> > Re "but they seem to be quite oblique" - please try to run
> > .getObject() on them.
> > 
> > PDField.createField() does not create a new field, it just creates
> > the PD-Object from a COSDictionary.
> > 
> > Tilman
> > 
> > 
> > On 15.07.2023 10:43, Gilad Denneboom wrote:
> > > > The CO array contains COSObjects, yes, but they seem to be quite
> > > > oblique, with nothing more than a reference number.
> > > > Are you saying I can use the COSObject itself to find the field?
> > > > If so, how?
> > > > Note I'm not trying to create new fields, just locate the
> > > > existing ones referenced in this array.
> > > > 
> > > > On Sat, Jul 15, 2023 at 4:15 AM Tilman Hausherr wrote:
> > > > 
> > > > > How did you get the indirect number in the first place?
> > > > > 
> > > > > Normally this would be a COSObject and you can dereference that
> > > > > one by calling getObject() and here it would be a COSDictionary.
> > > > > You can pass this to PDField.createField().
> > > > 
> > > > Tilman
> > > > 
> > > > On 14.07.2023 20:54, Gilad Denneboom wrote:
> > > > > > Hi all,
> > > > > > 
> > > > > > I'm trying to see if there's a way to get a field's name
> > > > > > using PDFBox based on the Indirect Reference number to it.
> > > > > > Namely, the numbers that are used in the CO array of the
> > > > > > AcroForm object. I can see those numbers in the PDF Debugger
> > > > > > app next to the field name (see screenshot:
> > > > > > https://i.imgur.com/gCHvVRx.png), but I looked everywhere in
> > > > > > the properties of the PDField and PDAnnotationWidget and
> > > > > > can't find them there. Any pointers will be much appreciated!
> > > > > > 
> > > > > > Regards, Gilad.
> > > > > 
> > > > 
> > > > ---
> > > > --
> > > > To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> > > > For additional commands, e-mail: users-h...@pdfbox.apache.org
> > > > 
> > > > 
> > 
> > 
> > ---
> > --
> > To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> > For additional commands, e-mail: users-h...@pdfbox.apache.org
> > 
> > 


-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org

Re: Adding annotations to PDFs - "expected a name object"

2023-08-02 Thread sahy...@fileaffairs.de

Hello Kai,

On Wednesday, 02.08.2023 at 07:56 + Kai Keggenhoff wrote:
> 
> 
> 
> Hi Tilman,
>  
> thank you for the suggestion. Turns out, it's “Line” annotations.
> When I import
>  
> 
> http://ns.adobe.com/xfdf/"; xml:space="preserve">
>     
>     caption-style="Inline"
>   
> creationdate="D:20230526112815+02'00'" flags="print"
>   
> start="1770.007568,1614.721436" end="1791.373535,1658.575195"
>   date="D:20230526112815+
> 02'00'" page="0"
>   
> rect="1764.007568,1608.721436,1797.373535,1664.575195"
>   subject="Linie"
> title="John Doe">
>     flags="print,nozoom,norotate" open="no"
>   
> page="0" rect="1780.690552,-92.00,1964.690552,0.00"/>
>    
>     
> 
> 

How was the XFDF generated? Can you compare it to an XFDF generated by
exporting from Adobe Acrobat?

BR
Maruan


>  
> into either its original source or just a fresh, DIN A4 PDF with a
> single line of text created with PDFBox, Adobe Reader will complain.
> At this point, I switched from 2.0.29 to 3.0.0beta1 and the result
> was the same.
>  
> In the resulting PDF, PDFDebugger shows what is in the attachment –
> an empty “NM” entry.
> I then tried to set either “Name” or “NM” (and ultimately both) in
> dictionary of the PDAnnotation to “Line” with
>  
> PDAnnotation a =
> PDAnnotation.createAnnotation(xfdfAnnotation.getCOSObject());
> a.getCOSObject().setItem(COSName.NM, new COSString("Line"));
> a.getCOSObject().setItem(COSName.NAME, new COSString("Line"));
>  
> but while these values show up fine in PDFDebugger,
> Adobe Reader will still complain.
>  
> Would you happen to have more suggestions how I could work around
> this please ?
>  
> Thanks in advance,
>  
> Kai
>  
> -----Original Message-----
> From: Tilman Hausherr
> Sent: Tuesday, 1 August 2023 19:41
> To: users@pdfbox.apache.org
> Subject: Re: Adding annotations to PDFs - "expected a name object"
>  
> CAUTION - External Sender
>  
>  
> Try to edit the xfdf file so that it has less annotations until you
> know
> which one is the culprit. Then tell what names were in the xfdf file.
> Or
> make a screenshot in PDFDebugger of that annotation, without the text
> content.
>  
> Tilman
>  
> On 01.08.2023 10:15, Kai Keggenhoff wrote:
> >  
> > Hello everyone,
> >  
> > we’re using PDFBox to import annotations from XFDF files and add
> > them
> > to existing PDF files.
> >  
> > First we create a FDFDocument from the XFDF, then we fetch the list
> > of
> > FDFAnnotations from it via
> >  
> > < FDFDocument>.getCatalog().getFDF().getAnnotations()
> >  
> > Then we iterate over this list, create PDAnnotations from the COS
> > objects of the FDFAnnotation with PDAnnotation.createAnnotation()
> >  
> > and add the PDAnnotation to the list of page annotations we got
> > from
> > PDPage.getAnnotations()
> >  
> > For most PDFs and most XFDF files, this works without any issues.
> >  
> > However, some PDF files with merged annotations, when opened in
> > Adobe
> > Reader, produce a huge number of popups with the message
> >  
> > “Expected a name object” / “Namensobjekt wurde erwartet”
> >  
> > before the PDF is fully rendered, while the original file does not
> > show such a behaviour.
> >  
> > When such PDF files are opened in Foxit PDF Reader or PDFDebugger,
> > they are rendered just fine without errors.
> >  
> > We saw this happen every now and then over the years, with all
> > PDFBox
> > V2 versions, including 2.0.29,
> >  
> > but it appears to have become more frequent recently, so user
> > complaints are becoming more frequent too.
> >  
> > As I’m unable to share sample PDFs/XFDFs due to compliance reasons,
> > I
> > just have the question if anyone happens to have some hints
> >  
> > what I could look for in an affected PDF (preferably using
> > PDFDebugger) to find out what Adobe Reader doesn’t like ?
> >  
> > Ideally, I then can try to compare it with the original and find
> > the
> > part of the process which “breaks” the result.
> >  
> > Thanks in advance,
> >  
> > Kai
> >  
> > *Kai Keggenhoff* / Senior Software Developer
> >  
> > *thinkproject.com***
> >

Re: Adding annotations to PDFs - "expected a name object"

2023-08-02 Thread sahy...@fileaffairs.de

Hello Kai,

On Wednesday, 02.08.2023 at 09:24 + Kai Keggenhoff wrote:
> Hi Maruan,
> 
> we let Adobe Reader submit a form with flags set for XFDF, include
> annotations and "ExclNonUserAnnots" (table 237 in the PDF 32000-
> 1:2008 spec)
> The XFDF is sent to a server address specific for the document and
> then processed.
> 

How does the annotation created by Adobe Reader look in PDFDebugger?

You also mentioned that it doesn't work for merged PDFs - does it work
for the original PDF without merging? How is the merging done?

BR
Maruan


> Usually, when sent by Adobe Reader, the full XFDF contains additional
> "f", "fields" and "id" nodes which I omitted for brevity.
> Here, I also reformatted it to make it easier to read.
> As we only read the children of "annots" when we process the XFDF to
> create annotation objects to add to the PDPage, I don't think these
> elements are relevant.
> 
> For some time now, we also offer our users the option to use Foxit
> Web SDK to annotate PDFs and upload them to our system.
> A line annotation exported from Foxit looks like this
> 
> 
> http://ns.adobe.com/xfdf/"; xml:space="preserve">
>     
>      date="D:20230802111256+02'00'" flags="print"
>     name="af1a7071-8b53-4219-953b-9ed2fe3e32d4"
> rect="125.326691,718.888428,206.180573,803.442627"
>     title="PKM ServicePoint (conclude GmbH)"
> creationdate="D:20230802111256+02'00'"
>     opacity="1" subject="Linie" style="solid"
> width="2.00"
>     head="None" tail="None"
> start="129.076691,799.692627" end="202.430573,722.638428"
> caption="no">
>      date="D:20230802111256+02'00'" flags="print"
>     name="7133e39c-0ade-4a9e-aabf-
> b4a0e1bcce06" rect="129.076691,722.638428,202.430573,799.692627"
> open="no"/>
>     
>     
>      modified="A3D15A58D538804D652927671920C3A1"/>
> 
> 
> Strangely, when this is imported into the same PDF as the XFDF from
> Adobe Reader (it's a different line of course), the resulting PDF is
> fine.
> I think I need to compare these two in detail.
> 
> Meanwhile, here's how the imported annotation from Adobe Reader looks
> in PDFDebugger
> https://cde-dev.conclude.com/img/line-annot.png
> 
> Thank you very much for the input,
> 
> Kai
> 
> -Ursprüngliche Nachricht-
> Von: sahy...@fileaffairs.de 
> Gesendet: Mittwoch, 2. August 2023 10:22
> An: users@pdfbox.apache.org
> Betreff: Re: Adding annotations to PDFs - "expected a name object"
> 
> CAUTION - External Sender
> 
> 
> Hello Kai,
> 
> Am Mittwoch, dem 02.08.2023 um 07:56 + schrieb Kai Keggenhoff:
> > 
> > 
> > 
> > Hi Tilman,
> > 
> > thank you for the suggestion. Turns out, it's "Line" annotations.
> > When I import
> > 
> > 
> > http://ns.adobe.com/xfdf/"; xml:space="preserve">
> >     
> >     > caption-style="Inline"
> > 
> > creationdate="D:20230526112815+02'00'" flags="print"
> > 
> > start="1770.007568,1614.721436" end="1791.373535,1658.575195"
> >  
> > date="D:20230526112815+
> > 02'00'" page="0"
> > 
> > rect="1764.007568,1608.721436,1797.373535,1664.575195"
> >   subject="Linie"
> > title="John Doe">
> >     > flags="print,nozoom,norotate" open="no"
> > 
> > page="0" rect="1780.690552,-92.00,1964.690552,0.00"/>
> >    
> >     
> > 
> > 
> 
> How was the XFDF generated? Can you compare it to an XFDF exported
> from Adobe Acrobat?
> 
> BR
> Maruan
> 
> 
> > 
> > into either its original source or just a fresh, DIN A4 PDF with a
> > single line of text created with PDFBox, Adobe Reader will
> > complain.
> > At this point, I switched from 2.0.29 to 3.0.0beta1 and the result
> > was the same.
> > 
> > In the resulting PDF, PDF

Re: Adding annotations to PDFs - "expected a name object"

2023-08-02 Thread sahy...@fileaffairs.de

Hi Kai,

so you have an approach you can follow. If there are further issues let
us know.

caption-style is likely using rich text settings which PDFBox doesn't
support.

Maybe you can submit a snippet of the XFDF containing that attribute so
that we can take a look if time permits.
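
For what it's worth, the workaround Kai describes in the quoted mail below (drop caption-style, set caption="no") could be applied to the DOM before it is turned into FDF annotations. This is a minimal sketch using only the JDK's XML classes. The class and method names are made up for illustration, and the element name "line" is an assumption, since the tag names were stripped from the XFDF quoted in this thread:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XfdfCaptionStyleWorkaround {

    /** Parses XFDF text into a namespace-aware DOM (DOCTYPEs are rejected). */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // The XFDF comes from outside, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /**
     * Removes caption-style from every "line" element that has it and
     * sets caption="no" instead. Returns the number of elements changed.
     */
    public static int patchLineAnnotations(Document xfdf) {
        // "*" matches the XFDF default namespace as well as no namespace.
        NodeList lines = xfdf.getElementsByTagNameNS("*", "line");
        int changed = 0;
        for (int i = 0; i < lines.getLength(); i++) {
            Element line = (Element) lines.item(i);
            if (line.hasAttribute("caption-style")) {
                line.removeAttribute("caption-style");
                line.setAttribute("caption", "no");
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws Exception {
        // Shortened, made-up sample; real XFDF carries many more attributes.
        String sample = "<xfdf xmlns=\"http://ns.adobe.com/xfdf/\"><annots>"
                + "<line caption-style=\"Inline\" page=\"0\" subject=\"Linie\"/>"
                + "</annots></xfdf>";
        Document doc = parse(sample);
        System.out.println("patched elements: " + patchLineAnnotations(doc));
    }
}
```

The patched Document can then go through the same steps as before. Whether dropping the attribute changes anything visible for captioned lines has not been checked here.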

BR
Maruan

On Wednesday, 2 August 2023 at 09:54 +, Kai Keggenhoff wrote:
> Hi Maruan,
> 
> the process was mentioned in the original mail and is as follows:
> 
> XFDF is parsed to DOM
> DOM is used to create a FDFDocument
> FDFAnnotation(s) are read from
> FDFDocument.getCatalog().getFDF().getAnnotations()
> For each FDFAnnotation, a PDAnnotation is created from the
> FDFAnnotation's getCOSObject()
> Then the PDAnnotation is added to the list of annotations on the page
> - PDPage.getAnnotations().add(...)
> 
> But I just found that if I remove the "caption-style" attribute from
> the original XFDF and replace it with caption="no" like in the XFDF
> created by Foxit, there are suddenly no more popups about "expected a
> name object" in Adobe Reader.
> 
> I think this might be due to the Adobe Reader versions our customers
> use.
> In the original, complete XFDF from Adobe Reader, which was produced
> by multiple people with different Adobe Readers, there is a "freetext"
> annotation stating
> 
> xfa:APIVersion="Acrobat:10.1.5"
> 
> I think that version is like ten years old.
> The freetext annotation isn't a problem at all, I just used that as
> an indicator to the Reader version.
> 
> I guess I will start to look for this "caption-style" attribute and
> replace it as a workaround.
> 
> Thank you for all the pointers !
> 
> Kai
> 
> -----Original Message-----
> From: sahy...@fileaffairs.de 
> Sent: Wednesday, 2 August 2023 11:34
> To: users@pdfbox.apache.org
> Subject: Re: Adding annotations to PDFs - "expected a name object"
> 
> Hello Kai,
> 
> On Wednesday, 2 August 2023 at 09:24 +, Kai Keggenhoff wrote:
> > Hi Maruan,
> > 
> > we let Adobe Reader submit a form with flags set for XFDF, include
> > annotations and "ExclNonUserAnnots" (table 237 in the PDF 32000-
> > 1:2008 spec)
> > The XFDF is sent to a server address specific for the document and
> > then processed.
> > 
> 
> How does the annotation created by Adobe Reader look in PDFDebugger?
> 
> You also mentioned that it doesn't work for merged PDFs - does it work
> for the original PDF without merging? How is the merging done?
> 
> BR
> Maruan
> 
> 
> > Usually, when sent by Adobe Reader, the full XFDF contains
> > additional
> > "f", "fields" and "id" nodes which I omitted for brevity.
> > Here, I also reformatted it to make it easier to read.
> > As we only read the children of "annots" when we process the XFDF
> > to
> > create annotation objects to add to the PDPage, I don't think these
> > elements are relevant.
> > 
> > For some time now, we also offer our users the option to use Foxit
> > Web SDK to annotate PDFs and upload them to our system.
> > A line annotation exported from Foxit looks like this
> > 
> > 
> > http://ns.adobe.com/xfdf/"; xml:space="preserve">
> >     
> >      > date="D:20230802111256+02'00'" flags="print"
> >     name="af1a7071-8b53-4219-953b-9ed2fe3e32d4"
> > rect="125.326691,718.888428,206.180573,803.442627"
> >     title="PKM ServicePoint (conclude GmbH)"
> > creationdate="D:20230802111256+02'00'"
> >     opacity="1" subject="Linie" style="solid"
> > width="2.00"
> >     head="None" tail="None"
> > start="129.076691,799.692627" end="202.430573,722.638428"
> > caption="no">
> >      > date="D:20230802111256+02'00'" flags="print"
> >     name="7133e39c-0ade-4a9e-aabf-
> > b4a0e1bcce06" rect="129.076691,722.638428,202.430573,799.692627"
> > open="no"/>
> >     
> >     
> >      > modified="A3D15A58D538804D652927671920C3A1"/>
> > 
> > 
> > Strangely, when this is imported into the same PDF as the XFDF from
> > Adobe Reader (it's a different li

Re: empty/missing pdf content

2023-09-20 Thread sahy...@fileaffairs.de

Dear Attila,

both links point to the same file. The link to the PDFBox generated one
is missing.

BR
Maruan

On Tuesday, 19 September 2023 at 20:43 +0200, Pados Attila wrote:
> Template pdf
> 
> https://drive.google.com/file/d/1mbvN9RDKoesy0tJbj3GCO4VkMPjxYw5c/view?usp=sharing
> 
> Pdf generated with pdfbox 3.0.0 without restricting flatten's input
> fields
> 
> https://drive.google.com/file/d/1mbvN9RDKoesy0tJbj3GCO4VkMPjxYw5c/view?usp=sharing
> 
> there should be a text AB Manuel Test
> 


-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org

Re: Bug report

2023-09-22 Thread sahy...@fileaffairs.de

Hello Morten,

the file uses a cross reference stream and as such this is not PDF/A-1
compliant.
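
A quick way to check a file for this before running Preflight is to look for a cross reference stream dictionary in the raw bytes. The sketch below is a heuristic only: it is plain JDK code, not a PDFBox API, the class name is made up, and it simply searches for "/Type /XRef":

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class XrefStreamCheck {

    // Cross reference streams are stream objects whose dictionary has /Type /XRef.
    private static final Pattern XREF_STREAM = Pattern.compile("/Type\\s*/XRef\\b");

    /** Heuristic: true if the raw PDF bytes appear to contain a cross reference stream. */
    public static boolean usesXrefStream(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data survives decoding.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        return XREF_STREAM.matcher(raw).find();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java XrefStreamCheck.java some.pdf
        byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        System.out.println("cross reference stream found: " + usesXrefStream(bytes));
    }
}
```

Note that the isValid() function quoted below re-saves the document with PDFBox before validating it, so it may be the saved copy rather than the original file that contains the cross reference stream. If I remember the 3.0 migration guide correctly, PDFBox 3.0 writes object streams (and with them a cross reference stream) by default, and save() accepts CompressParameters.NO_COMPRESSION to turn that off; please verify this against the documentation.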

BR
Maruan

On Friday, 22 September 2023 at 14:29 +0200, Morten Stulen wrote:
> This email went to the spam folder so I didn't see it before now.
> 
> Here's a link to an example PDF we generate:
> https://drive.google.com/file/d/1vNbfQyWNyJq_mGNPYF9a3CI0X9K9CDSQ/view?usp=sharing
> 
> Morten
> 
> 
> On Wed, Sep 20, 2023 at 6:48 PM Tilman Hausherr
> 
> wrote:
> 
> > Please upload your PDF somewhere and link to it.
> > 
> > If there is really a cross reference stream, then the error message
> > is
> > correct (and 2.0.29 is at fault), because that is a PDF 1.5 feature
> > and
> > PDF/A-1b is based on 1.4.
> > 
> > Tilman
> > 
> > 
> > On 20.09.2023 17:00, Morten Stulen wrote:
> > > After upgrading to PDFBox 3, the PreflightParser.validate()
> > > function
> > > returns an error.
> > > "Trailer Syntax error, /XRef cross reference streams are not
> > > allowed"
> > > 
> > > Code:
> > > 
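> > > > import java.io.File
> > > > import java.util.UUID
> > > > import org.apache.pdfbox.Loader
> > > > import org.apache.pdfbox.preflight.parser.PreflightParser
> > > > 
> > > > fun isValid(bytes: ByteArray): Boolean {
> > > >    val fileName = "tmp_${UUID.randomUUID()}.pdf"
> > > >    val file = File(fileName)
> > > > 
> > > >    // Re-save the incoming bytes to a temp file and close the document
> > > >    Loader.loadPDF(bytes).use { document -> document.save(fileName) }
> > > > 
> > > >    // This returns a ValidationError
> > > >    val result = PreflightParser.validate(file)
> > > > 
> > > >    return result.isValid
> > > > }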
> > > 
> > > The tested PDFs are the same as before the upgrade.
> > > 
> > > 
> > > 
> > 
> > 
> > 



Re: PDFBOX write to form question

2023-09-28 Thread sahy...@fileaffairs.de

Hello Luis,

I've had a quick look at the sources of PdfPig and, if I'm not
mistaken, you can only read fields and get their values, not set
them.

So it's not a full C# port (yet).

BR
Maruan

On Wednesday, 27 September 2023 at 21:12 -0600, Luis Angel Benitez
Muñoz wrote:
> Hi! I'm working with a different PDF library based on PDFBOX called
> PdfPig,
> how can one access an Acroform field in PDFBOX and write information
> (string, etc) to it?. I'm currently trying to write information to an
> Acroform field but the field appears to be read only, maybe the
> process to
> do this in PDFBOX can give me some guidance to do this in PdfPig.
> 
> Thanks to anyone that can help!



Re: Little CMS

2023-11-07 Thread sahy...@fileaffairs.de

LittleCMS is bundled with Java, so the version in use depends on your
Java version and is not something PDFBox provides directly. So if you
are really using LittleCMS 2.3, you are running a very old JDK that
has not been updated.

With kind regards
Maruan

On Tuesday, 7 November 2023 at 15:40 +0100, Florian Schlittgen wrote:
> Hi,
> 
> we are using PDFBox in a web application which was recently subjected
> to a penetration test. The tester found out that PDFBox is using
> 'Little CMS' version 2.3.0, at least that's what the metadata of the
> generated PDF says:
> 
> ===
> $ exiftool test.pdf
> […]
> Profile CCM Type  : Little CMS
> Profile Version   : 2.3.0
> […]
> Device Manufacturer : Little CMS
> […]
> Profile Creator   : Little CMS
> […]
> ===
> 
> According to the CVEdetails
> (https://www.cvedetails.com/vulnerability-list/vendor_id-8840/product
> _id-15596/Littlecms-Little-Cms-Color-Engine.html), at least five
> vulnerabilities have been published since the release date of the
> software in 2011. These include CVE-2013-7455, a vulnerability that
> has been given a CVSS rating of 10.0.
> 
> How can this be classified from PDFBox's point of view? How should we
> deal with this security risk or is it possibly not a risk at all?
> 
> Thank you very much for your assessment!
> Best regards, 
> Florian



Re: Re: Little CMS

2023-11-07 Thread sahy...@fileaffairs.de

On Tuesday, 7 November 2023 at 16:59 +0100, Florian Schlittgen wrote:
> Thanks for your feedback.
> The Java version I am currently using is corretto-11.0.21, so this is
> the up-to-date version of Java 11.
> Is the assumption correct that the metadata field 'Profile Version'
> reflects the Little CMS version?

I don't know - maybe someone else can shed light on this.

BR
Maruan

> 
> Kind regards, Florian
> 
> > On 7 November 2023 at 16:34, sahy...@fileaffairs.de wrote:
> > 
> > LittleCMS is bundled with Java, so the version in use depends on
> > your Java version and is not something PDFBox provides directly. So
> > if you are really using LittleCMS 2.3, you are running a very old
> > JDK that has not been updated.
> > 
> > With kind regards
> > Maruan
> > 
> > On Tuesday, 7 November 2023 at 15:40 +0100, Florian
> > Schlittgen wrote:
> > > Hi,
> > > 
> > > we are using PDFBox in a web application which was recently
> > > subjected
> > > to a penetration test. The tester found out that PDFBox is using
> > > 'Little CMS' version 2.3.0, at least that's what the metadata of
> > > the
> > > generated PDF says:
> > > 
> > > ===
> > > $ exiftool test.pdf
> > > […]
> > > Profile CCM Type  : Little CMS
> > > Profile Version   : 2.3.0
> > > […]
> > > Device Manufacturer : Little CMS
> > > […]
> > > Profile Creator   : Little CMS
> > > […]
> > > ===
> > > 
> > > According to the CVEdetails
> > > (
> > > https://www.cvedetails.com/vulnerability-list/vendor_id-8840/produ
> > > ct
> > > _id-15596/Littlecms-Little-Cms-Color-Engine.html), at least five
> > > vulnerabilities have been published since the release date of the
> > > software in 2011. These include CVE-2013-7455, a vulnerability
> > > that
> > > has been given a CVSS rating of 10.0.
> > > 
> > > How can this be classified from PDFBox's point of view? How
> > > should we
> > > deal with this security risk or is it possibly not a risk at all?
> > > 
> > > Thank you very much for your assessment!
> > > Best regards, 
> > > Florian
> > 
> > 

Re: Issue with PDFs merged in PDFMerger and printing in Acrobat Reader

2023-11-16 Thread sahy...@fileaffairs.de

Hi,

the PDFs didn't make it to the mailing list. Can you upload these to a
public location?

BR
Maruan

On Thursday, 16 November 2023 at 11:49 +0100, Christian
Puritscher wrote:
>  
> Hello,
>  
> we have an issue with the merged PDF created with the PDFBox
> PDFMerger (problem exists in 2.x and 3.x) in Acrobat Reader (current
> version 2023.006.20380, but also older versions, tested on Windows
> and Mac):
>  
> When we merge two (or more) PDFs then the merged PDF seems fine. For
> example when we merge a 2 page PDF and a 4 page PDF the resulting PDF
> is 6 pages long and displays perfectly fine (the used example PDFs
> have been attached to the mail).
>  
>  
> Also when printing the entire document it works fine. 
>  
> But when we want to print specific pages, Acrobat Reader only allows
> to set a range spanning the pages of the last document added to the
> merged pdf (so in our example when we added a 2 page PDF and a 4 page
> pdf to a 6 page pdf we now can only choose to print a specific page
> from page 1-4, it does not allow any other input, nor print the
> correct pages):
>  
>  
> Using a different PDF viewer (for example the built-in PDF viewer
> in Firefox) instead of Acrobat Reader it seems to work fine, but we
> can hardly convince our customers to change their default PDF viewer.
>  
>  
> Any idea how we can handle this issue?
>  
> Thanks!
>  
>  
> 

Re: Modifying the order of AcroForm Fields and/or associated Widget Annotations...

2024-01-30 Thread sahy...@fileaffairs.de




What is the expected order? Is it by location, top left to bottom
right? Calculation order ...

I've never heard that order matters for flattening. Is the proprietary
software throwing any errors that would be a hint?


BR
Maruan

On Tuesday, 30 January 2024 at 15:27 -0600, Dwayne Parks wrote:
> Hello list!
> 
> I'm dealing with a proprietary software product that accepts PDFs
> with 
> fields in them to "flatten" into a final output PDF.  The difficulty
> is 
> that it expects the ordering of the fields (or their associated
> widgets) 
> to be in a certain order.  I don't know the exact details of this,
> but 
> it takes much trial and error for our folks here manually deleting
> and 
> recreating fields, trying them and seeing if they are accepted.
> 
> So, to greatly streamline the process of getting the field/widget 
> content in the PDF files in a correct order, I would like to write a 
> utility that takes a configuration file containing a list of Field
> Names 
> and reorders the content in the PDF to match the order they are in
> the 
> configuration file.
> 
> My naive initial idea is to:
> 
>    - Write a utility that outputs the current list of fields (in the
>  PDF in the order that they are there) into a config file
>    - Allow a user to reorder the lines of field names as desired
>    - Write a utility that takes the config file and the PDF and
>  rebuilds the field list/tree in the order that the config file
>  specifies... then writes out the updated PDF contents to a new
>  PDF file
> 
> Alternately, I believe that there is an order for forms/widgets that
> is 
> specified in Adobe Acrobat (tab order?) that I might be able to try
> to recreate.  I'm not sure if that will work, but it would allow 
> non-technical users to define the needed order without intervention
> from 
> technical staff.
> 
> I realize that there might be issues with combined field/widget
> fields 
> if it comes to needing to order the widgets instead, but I am wanting
> to 
> start with the above and go from there.
> 
> So, I have a few questions to start with that someone might be able
> to 
> help me out with!
> 
> - Are there any examples of doing this sort of order modification?
> - Is it possible to reorder field contents at the PDDocument /
>    PDAcroForm / PDField level?
> - Is it possible to reorder widget annotations at the PDAnnotation /
>    PDAnnotationWidget level?
> - Do I need to drop down to the COS* object level to do this?
> 
> Thanks in advance for any pointers, info or suggestions!
> 
> - Dwayne
> 

Re: Modifying the order of AcroForm Fields and/or associated Widget Annotations...

2024-01-31 Thread sahy...@fileaffairs.de

> 
> No errors are thrown as the proprietary software will happily prompt
> the 
> user for the user-defined fields, but... it is adding hours to the
> form 
> updating time and starting to drive our semi-technical people crazy.
> 
> One other approach is to figure out how to force the order of the
> fields 
> in Acrobat (which can be changed by dragging the fields up/down to 
> position them in the list of field names) to be "honored" when it
> writes 
> out the PDF contents to a file.  It doesn't appear to do so.  And it 
> also sometimes creates Fields with Widgets as "Kids" and fields with
> the 
> Widget data combined with the Field data when new fields are created
> via 
> copy/paste...  all of this I had hoped to handle with a "cleanup" 
> utility that would take the user-edited PDFs as a source and create 
> cleaned up PDFs as separate output files.
> 
> I hope that that makes more sense on the why.  Thanks for
> listening!!!
> 
> - Dwayne
> 
> On 1/30/2024 3:33 PM, sahy...@fileaffairs.de wrote:
> > 
> > 
> > What is the expected order? Is it by location, top left to bottom
> > right? Calculation order ...
> > 
> > I've never heard that order matters for flattening. Is the
> > proprietary software throwing any errors that would be a hint?
> > 
> > 
> > BR
> > Maruan
> > 
> > On Tuesday, 30 January 2024 at 15:27 -0600, Dwayne Parks wrote:
> > > Hello list!
> > > 
> > > I'm dealing with a proprietary software product that accepts PDFs
> > > with
> > > fields in them to "flatten" into a final output PDF.  The
> > > difficulty
> > > is
> > > that it expects the ordering of the fields (or their associated
> > > widgets)
> > > to be in a certain order.  I don't know the exact details of
> > > this,
> > > but
> > > it takes much trial and error for our folks here manually
> > > deleting
> > > and
> > > recreating fields, trying them and seeing if they are accepted.
> > > 
> > > So, to greatly streamline the process of getting the field/widget
> > > content in the PDF files in a correct order, I would like to
> > > write a
> > > utility that takes a configuration file containing a list of
> > > Field
> > > Names
> > > and reorders the content in the PDF to match the order they are
> > > in
> > > the
> > > configuration file.
> > > 
> > > My naive initial idea is to:
> > > 
> > >    - Write a utility that outputs the current list of fields (in
> > > the
> > >  PDF in the order that they are there) into a config file
> > >    - Allow a user to reorder the lines of field names as desired
> > >    - Write a utility that takes the config file and the PDF and
> > >  rebuilds the field list/tree in the order that the config
> > > file
> > >  specifies... then writes out the updated PDF contents to a
> > > new
> > >  PDF file
> > > 
> > > Alternately, I believe that there is an order for forms/widgets
> > > that
> > > is
> > > specified in Adobe Acrobat (tab order?) that I might be able to
> > > try to recreate.  I'm not sure if that will work, but it would
> > > allow
> > > non-technical users to define the needed order without
> > > intervention
> > > from
> > > technical staff.
> > > 
> > > I realize that there might be issues with combined field/widget
> > > fields
> > > if it comes to needing to order the widgets instead, but I am
> > > wanting
> > > to
> > > start with the above and go from there.
> > > 
> > > So, I have a few questions to start with that someone might be
> > > able
> > > to
> > > help me out with!
> > > 
> > > - Are there any examples of doing this sort of order
> > > modification?
> > > - Is it possible to reorder field contents at the PDDocument /
> > >    PDAcroForm / PDField level?
> > > - Is it possible to reorder widget annotations at the
> > >    PDAnnotation / PDAnnotationWidget level?
> > > - Do I need to drop down to the COS* object level to do this?
> > > 
> > > Thanks in advance for any pointers, info or suggestions!
> > > 
> > > - Dwayne
> > > 

Re: Modifying the order of AcroForm Fields and/or associated Widget Annotations...

2024-01-31 Thread sahy...@fileaffairs.de



Added note: as Tabs is defined at the page level, it clearly addresses
the annotation and not the form field, since a form field can have
multiple representations on the same page and/or on multiple pages.
The visual part of the form field is defined by the annotation. The
form field itself, i.e. without its annotation(s), doesn't have a
physical representation.
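
To illustrate the idea from the quoted mail below of deriving the prompt list from where the widgets sit on the page, here is a rough sketch in plain Java. Everything in it is hypothetical: the Widget holder, the promptOrder() name and the rowHeight parameter are made up, and the coordinates are invented. In PDFBox the page and position would presumably come from each field's widget annotations and their rectangles (PDAnnotationWidget.getRectangle()).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class WidgetPromptOrder {

    /** Hypothetical holder for one widget: owning field, page index, upper-left corner. */
    static final class Widget {
        final String fieldName;
        final int page;
        final double x;
        final double upperY;

        Widget(String fieldName, int page, double x, double upperY) {
            this.fieldName = fieldName;
            this.page = page;
            this.x = x;
            this.upperY = upperY;
        }
    }

    /**
     * Returns fully qualified field names ordered by page, then top to bottom,
     * then left to right. Widgets whose y values round to the same multiple of
     * rowHeight count as one row; this is a crude bucketing, so two widgets
     * just either side of a bucket border end up in different rows.
     */
    public static List<String> promptOrder(List<Widget> widgets, double rowHeight) {
        List<Widget> sorted = new ArrayList<>(widgets);
        sorted.sort(Comparator
                .comparingInt((Widget w) -> w.page)
                // PDF user space starts at the bottom left, so larger y means higher up.
                .thenComparingLong(w -> -Math.round(w.upperY / rowHeight))
                .thenComparingDouble(w -> w.x));
        // A field can have several widgets; keep only its first appearance.
        Set<String> names = new LinkedHashSet<>();
        for (Widget w : sorted) {
            names.add(w.fieldName);
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        // Widgets listed in AcroForm order, as in the Policy/Date example.
        List<Widget> widgets = Arrays.asList(
                new Widget("Policy.PolicyNumber", 0, 50, 700),
                new Widget("Policy.PolicyName", 0, 50, 650),
                new Widget("Date", 0, 300, 700));
        // prints [Policy.PolicyNumber, Date, Policy.PolicyName]
        System.out.println(promptOrder(widgets, 20));
    }
}
```

This leaves the AcroForm field tree untouched and only produces the list to feed into the form's configuration.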

On Wednesday, 31 January 2024 at 18:22 +0100,
sahy...@fileaffairs.de wrote:
> Dear Dwayne,
> 
> for a generic solution reordering the fields won't help.
> 
> A field can be nested inside a field tree but let's say one of the
> nested fields is on top of the page and the other is on the bottom of
> the page. Now another nested structure might have fields in between.
> You can not move fields out of a nested structure to match the
> physical
> order as this will have consequences such as naming etc.
> 
> E.g.
> 
> Visual Order on the page
> 
>   [Policy.PolicyNumber]    [Date]
>   [Policy.PolicyName]
> 
> 
> from that you'd need to prompt for Policy.PolicyNumber, Date,
> Policy.PolicyName
> 
> 
> 
> AcroForm Order
> 
>   Policy
>     PolicyNumber
>     PolicyName
>   Date
> 
> You can not move PolicyName below Date as it's nested inside Policy.
> If
> you move PolicyName out the structure would be
> 
>   Policy
>     PolicyNumber
>   Date
>   PolicyName
> 
> But that changes the fully qualified name of the field, and as
> children can inherit from their parents, the moved field might lose
> properties only defined in Policy.
> 
> 
> The UI application needs to follow the definition of the visual order
> as specified in the PDF. This should depend on the (Widget)
> annotations
> location on the page as this defines the physical location and is
> what
> you are looking for.
> 
> There is also an (optional) Tabs key inside the Page dictionary which
> can define the order the application should follow when tabbing
> through
> the (visual appearance) of the fields. 
> 
> from the spec:
> 
> "R (row order), C (column order), and S (structure
> order). Beginning with PDF 2.0, additional values also include A
> (annotations array order) and W (widget order). Annotations array
> order refers to the order of the annotation enumerated in the Annots
> entry of the Page dictionary (see "Table 31 — Entries in a page
> object"). Widget order means using the same array ordering but
> making two passes, the first only picking the widget annotations and
> the second picking all other annotations."
> 
> 
> Now if the proprietary software doesn't follow these rules, what about
> parsing the PDF and generating the "prompt" list instead of doing it
> manually? The list can be generated by looking at the physical
> location of the Widget annotations associated with a particular form
> field, so you'd be able to generate the field list in the order the
> fields appear in the PDF and feed that into your configuration for
> the form.
> 
> BR
> Maruan
>   
> 
> On Tuesday, 30 January 2024 at 18:40 -0600, Dwayne Parks wrote:
> > I am almost certain that the expected order is basically top-left
> > to 
> > bottom-right, yes.  Currently there is no calculation being used
> > that
> > I 
> > know of.
> > 
> > Flattening:  The issue isn't in the actual flattening itself.  I
> > need
> > to 
> > explain more about the way the PDFs are used.
> > 
> > The proprietary software is running as a web service where we
> > upload 
> > multiple "forms" in PDF form as a library.  At the simplest level,
> > the 
> > fields on the form are one of two types.
> > 
> > Field Type 1 is an internal field name that the software matches to
> > internal data that it uses to set the field's value.  Say, if the
> > field 
> > name is "Policy.PolicyNumber" then it sets the field's contents to
> > its 
> > internal data for the Policy # data that it has... and that is what
> > it 
> > uses when it flattens the PDF.
> > 
> > Field Type 2 has a user-defined field name and the software (during
> > the 
> > process of generating the output PDF, before flattening the fields)
> > prompts the user for each user-defined field's contents that will
> > be 
> > used during the flattening.
> > 
> > There is a configuration page for each form that allows some
> > control 
> > over the prompting of data from the user (validation constraints, 
> > descriptive names for prompts, etc.) and a basic way to reorder the
> &g

Re: How to search for / extract text of form field

2024-03-27 Thread sahy...@fileaffairs.de

On Wednesday, 27 March 2024 at 08:01 +, Paul Grütter wrote:
> Hello Gilad,
> 
> Thank you.
> 
> Maruan Sahyoun already contacted me with the same tip. It works fine
> but only because we use PDFBox only for rendering and text extraction
> at the moment. If we would use it for other use cases, especially for
> filling in form fields, we would have to create a copy of the
> document for text extraction, which is obviously not optimal in a
> web application that may have multiple documents open at the same
> time.

You could fill the form using PDFBox, save it if you have to keep a
copy with the form fields, then flatten it and do the extraction.

Or you do the text extraction and, in addition, iterate over the form
fields and get the content and location of each field's widget(s).

BR
Maruan

> 
> Kind regards,
> 
> Dipl.-Ing. (FH) Paul Grütter
> Head of Development
> 
> 
>  
> signotec GmbH
> Am Gierath 20b
> 40885 Ratingen (Germany)
> 
> Tel.: +49 2102 53575-10
> Fax: +49 2102 53575-39
> 
> E-Mail: paul.gruet...@signotec.de
> URL: www.signotec.com
> 
> Amtsgericht Düsseldorf: HRB 44307
> Geschäftsführung/CEO: Arne Brandes
> 
> 
> From: Gilad Denneboom  
> Sent: Sunday, 24 March 2024 22:50
> To: paul.gruet...@signotec.de.invalid
> Cc: users@pdfbox.apache.org
> Subject: Re: How to search for / extract text of form field
> 
> 
> Flatten the form fields before searching the file if you want
> PDFTextStripper to find the text in them.
> 
> On Thu, Mar 21, 2024 at 12:10 PM Paul Grütter
>  wrote:
> Hello list,
>  
> I want to search for words in a PDF document and get their positions.
> It seems that PDFBox ignores text which has been entered into a form
> field although it’s rendered correctly. It can be reproduced easily
> with the standalone app:
>  
> java -jar pdfbox-app-3.0.2.jar export:text -i=Test.pdf
> java -jar pdfbox-app-3.0.2.jar render -i=Test.pdf
>  
> Acrobat both finds and extracts text that has been entered into
> a form field.
>  
> In my code I use PDFTextStripper. I haven’t found any way to
> configure the behaviour. Is it a bug or have I overlooked something?
> For clarification: I don’t want to search for the value (‘V’) but its
> visual representation (‘AP’).
>  
> Kind regards,
>  
> Dipl.-Ing. (FH) Paul Grütter
> Head of Development
>  
> 
>  
> signotec GmbH
> Am Gierath 20b
> 40885 Ratingen (Germany)
>  
> Tel.: +49 2102 53575-10
> Fax: +49 2102 53575-39
>  
> E-Mail: mailto:paul.gruet...@signotec.de
> URL: http://www.signotec.com/
> 
> Amtsgericht Düsseldorf: HRB 44307
> Geschäftsführung/CEO: Arne Brandes
> 
>  
> 
>  
> 

Re: Problem finding an AcroForm field

2024-05-02 Thread sahy...@fileaffairs.de

Hi,

can you upload the PDF in question to a public location so we can take
a look? Attachments don't work on the mailing list.

BR
Maruan

On Thursday, 2 May 2024 at 12:01 +0200, Ulf Dittmer wrote:
> Hi-
> 
> I'm running the PrintFields example code
> (https://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java
> /org/apache/pdfbox/examples/interactive/form/PrintFields.java) to
> find all the form field names for a PDF, but it's missing a checkbox
> that I'd need to set.
> 
> The checkbox in question is on page 5, no. 46 "pro Monat". The "pro
> Stunde" checkbox is there, as are the two text fields.
> 
> The relevant output of PrintFields is
> 
> txtf_46_Entgelt_pro_Stunde
> |--txtf_46_Entgelt_pro_Stunde.txtf_46_Entgelt_pro_Stunde = ,
>  type=org.apache.pdfbox.pdmodel.interactive.form.PDTextField
>  alternate name=46 - Höhe und Berechnungsart des Arbeitsentgelts -
> Entgelt pro Stunde (brutto in Euro), mapping name=null
> flags=8388608, isNoExport=false, isReadOnly=false, isRequired=false
> chbx_46_Arbeitsentgelt
> |--chbx_46_Arbeitsentgelt.chbx_46_Arbeitsentgelt = Off,
>  type=org.apache.pdfbox.pdmodel.interactive.form.PDCheckBox
>  alternate name=46 - Höhe und Berechnungsart des Arbeitsentgelts,
> mapping name=null
> flags=0, isNoExport=false, isReadOnly=false, isRequired=false
> txtf_46_Entgelt_pro_Monat
> |--txtf_46_Entgelt_pro_Monat.txtf_46_Entgelt_pro_Monat = ,
>  type=org.apache.pdfbox.pdmodel.interactive.form.PDTextField
>  alternate name=46 - Höhe und Berechnungsart des Arbeitsentgelts -
> Entgelt pro Monat (brutto in Euro), mapping name=null
> flags=8388608, isNoExport=false, isReadOnly=false, isRequired=false
> 
> Is the PDF broken in some way, or am I missing something?
> 
> Let me know if I can supply any further information. I'd be thankful
> for any additional information.
> 
> Ulf
> 

Re: Problem finding an AcroForm field

2024-05-02 Thread sahy...@fileaffairs.de



That's chbx_46_Arbeitsentgelt

BR
Maruan

On Thursday, 02.05.2024 at 12:15 +0200, Ulf Dittmer wrote:
> Sorry, I didn't realize that. It's a form from a German government
> agency, and can be found at
> 
> https://www.arbeitsagentur.de/datei/erklaerung-zum-beschaeftigungsverhaeltnis_ba047549.pdf
> 
> Ulf
> 



Re: Problem finding an AcroForm field

2024-05-02 Thread sahy...@fileaffairs.de




On Thursday, 02.05.2024 at 12:56 +0200, Ulf Dittmer wrote:
> Oh, so the PDF is sort of broken? The doc info says it was created
> with Adobe's PDF library using Adobe InDesign :-)
> 

Hi,

You can use it like this:

// one field, two widgets: the value passed selects which box gets checked
PDCheckBox field = (PDCheckBox) acroForm.getField("chbx_46_Arbeitsentgelt");
field.setValue("pro Stunde");
testPdf.save("proStunde.pdf");
field.setValue("pro Monat");
testPdf.save("proMonat.pdf");

BR
Maruan

> Ulf
> 
> On Thu, May 2, 2024 at 12:51 PM Tilman Hausherr wrote:
> 
> > It's a radio button but without the radio flag?!
> > 
> > Tilman
> > 
> > On 02.05.2024 12:42, Ulf Dittmer wrote:
> > > Yes, that's the one for the "pro Stunde" option. But the one for
> > > the "pro Monat" option is missing.
> > > 
> > > They're both connected, in that checking one manually will
> > > uncheck the other. But setting *any* value programmatically only
> > > causes the first one to be set.
> > > 
> > > Ulf
> > > 

Re: Problem finding an AcroForm field

2024-05-02 Thread sahy...@fileaffairs.de



That will not work in this case, as you already found. To get the list of
values that switch the checkbox to the "on" state, you can use

checkbox.getOnValues()

BR
Maruan

On Thursday, 02.05.2024 at 16:20 +0200, Ulf Dittmer wrote:
> Thanks, I will give that a try. So far, I had used the code from the
> SetField example, which does this (and failed in this case):
> 
>     PDCheckBox checkbox = (PDCheckBox) field;
>     if (value.isEmpty())
>     {
>         checkbox.unCheck();
>     }
>     else
>     {
>         checkbox.check();
>     }
> 

Re: Radio Button not set correctly

2024-05-06 Thread sahy...@fileaffairs.de

Dear Martin,

On Monday, 06.05.2024 at 15:53 +0200, Martin Resch wrote:
> sorry, PDF attached
> 

Could you upload the PDF to a public location? The mailing list
doesn't support attachments.

Which version of PDFBox are you using?

BR
Maruan


> 
> > Martin Resch wrote on 06.05.2024 at 15:49 CEST:
> > 
> > 
> > Hi,
> > 
> > I am reading a PDF, want to set one of the radio buttons included
> > in the PDF and save the PDF in a new file.
> > 
> > My PDF has a group of two radio buttons included. The field name is
> > Formular1[0].Seite1[0].TF_P[0].Optionsfeldliste[0]
> > Valid values are:[1, 2] and Off
> > 
> > 
> > This is my code:
> > String filename = "AU_Erklaerung_final.pdf";
> > PDDocument pdfDocument = Loader.loadPDF(new File(filename));
> > PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
> > acroForm.getField("Formular1[0].Seite1[0].TF_P[0].Optionsfeldliste[0]").setValue("1");
> > // acroForm.getField("Formular1[0].Seite1[0].TF_P[0].Optionsfeldliste[0]").setValue("2");
> > // ((PDRadioButton) acroForm.getField("Formular1[0].Seite1[0].TF_P[0].Optionsfeldliste[0]")).setValue(0);
> > // ((PDRadioButton) acroForm.getField("Formular1[0].Seite1[0].TF_P[0].Optionsfeldliste[0]")).setValue(1);
> > pdfDocument.save(filename + System.currentTimeMillis() + ".pdf");
> > 
> > 
> > I have tried:
> > pdfDocument.getDocumentCatalog().getAcroForm().getField("Formular1[0].Seite1[0].TF_P[0].Optionsfeldliste[0]").setValue("1");
> > pdfDocument.getDocumentCatalog().getAcroForm().getField("Formular1[0].Seite1[0].TF_P[0].Optionsfeldliste[0]").setValue("2");
> > // and
> > ((PDRadioButton) pdfDocument.getDocumentCatalog().getAcroForm().getField("Formular1[0].Seite1[0].TF_P[0].Optionsfeldliste[0]")).setValue(0);
> > ((PDRadioButton) pdfDocument.getDocumentCatalog().getAcroForm().getField("Formular1[0].Seite1[0].TF_P[0].Optionsfeldliste[0]")).setValue(1);
> > 
> > Regardless of which value I set (either via string or int), it is
> > always the same radio button that gets selected in my PDF (the lower
> > one).
> > 
> > 
> > Can anybody help? Thanks a lot in advance!
> > 
> > Best regards
> > Martin
> > 

Re: Radio Button not set correctly

2024-05-28 Thread sahy...@fileaffairs.de

On Tuesday, 28.05.2024 at 18:13 +0200, Tilman Hausherr wrote:
>  
> On 28.05.2024 17:27, Martin Resch wrote:
>  
>  
> >  
> > Hi Tilman,
> > 
> > thanks a lot for the analysis!
> > 
> > So am I correct in assuming that you will raise a bug ticket?
> >  
>  
> https://issues.apache.org/jira/browse/PDFBOX-5831
>  
> I'll wait a day or two because I'm not the acroform guy, and then
> I'll fix it myself.
>  
> >  
> > I am not the creator of this PDF. We have to deal with the official
> > PDFs provided by the government institution that you mentioned.
> > More PDFs can be found here:
> > https://www.bundesfreiwilligendienst.de/service/downloads
> > I hope they haven't published more PDFs that are this fishy.
> > 
> >  
> > >  
> > > I changed the PDF so that the /Opt entry has "A" and "B" instead
> > > of "1" and "2" and then it works.
> > >  
> >  
> > For my own knowledge, and as a potential workaround in the
> > meantime: how did you do that?
> >  
>  
> I edited the PDF with Notepad++. But it should also be possible this way:
>  
> field.setExportValues(List.of("A", "B"));
>  
> the field must be cast to a PDRadioButton. However I noticed that
> after doing this change, I was no longer able to edit it with Adobe
> Reader.
>  
> Another problem is that I get a sort of error message:
>  
>  
> This might be because of UR3 usage rights signature.

To avoid that, an incremental save should be used.

> 
> Tilman
> 
> >  
> > 
> > Best regards
> > Martin
> > 
> > 
> >  
> > >  
> > > On 28.05.2024 at 05:24, Tilman Hausherr wrote:
> > > 
> > > On 27.05.2024 21:01, Tilman Hausherr wrote:
> > >  
> > > >  
> > > > I'll have another look tomorrow when I'm more awake. I just
> > > > looked and it happens like you wrote. I traced through the code
> > > > and it seemed to work properly, i.e. going through different
> > > > paths for "1" and "2" (looking for dictionary elements 0 and 1)
> > > > but the result was always the same which contradicts the
> > > > observation, but that is the fascination in debugging.
> > > >  
> > >  
> > > I had another look. The values from the /Opt entry are 1 and 2,
> > > the values at the dictionary level are 0 and 1. Our software
> > > somehow gets confused when 1 is used because it appears in both:
> > > when the value "1" is set, then PDButton.updateByOption() is
> > > called twice (!!!), once with value 1 and once with 2.
> > > 
> > > I changed the PDF so that the /Opt entry has "A" and "B" instead
> > > of "1" and "2" and then it works.
> > > 
> > > So I'd say it's a PDFBox bug. The good thing is that the
> > > copyright of that PDF would be with a government institution
> > > (BAFzA) so we can use it as a test.
> > > 
> > > Are you the creator of that PDF?
> > > 
> > > Tilman
> > > 
> > > 

Re: File size issue when we use arial-unicode-ms font with pdfbox

2024-05-30 Thread sahy...@fileaffairs.de

Hi,

Keep in mind that if you'd like the consumer of the PDF to be able to
change the prefilled data, a font subset will not do, as characters the
user would like to enter might be missing.
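
To make the caveat concrete: a subset only keeps what the prefilled values needed, so a later edit can ask for a character that was never embedded. Below is a minimal sketch, assuming the subset can be modelled as a plain set of code points (a simplification, since real subsets work on glyphs). SubsetCoverage and its methods are hypothetical names for illustration, not PDFBox API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of PDFBox.
final class SubsetCoverage {

    /** Collects the code points a subset would retain for the prefilled values. */
    static Set<Integer> codePointsOf(String... prefilledValues) {
        Set<Integer> used = new HashSet<>();
        for (String value : prefilledValues) {
            value.codePoints().forEach(used::add);
        }
        return used;
    }

    /** Returns the characters of newText that are not covered by the subset. */
    static String missing(Set<Integer> subset, String newText) {
        StringBuilder sb = new StringBuilder();
        newText.codePoints()
               .filter(cp -> !subset.contains(cp))
               .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prefilled values: "Mueller" with u-umlaut, plus two CJK characters.
        Set<Integer> subset = codePointsOf("M\u00fcller", "\u6771\u4eac");
        // The user later types "Meier"; the letter 'i' was never prefilled.
        System.out.println(missing(subset, "Meier")); // prints "i"
    }
}
```

If the form is meant to stay editable, a check like this only tells you which characters of a known input are missing; it cannot anticipate what a user will type later, which is the point of the warning above.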

BR
Maruan

On Thursday, 30.05.2024 at 20:05 +0200, Tilman Hausherr wrote:
> Hi,
> 
> It's not possible and yes, it's a weakness. I think I had some wild
> hack 
> years ago that needed a change in PDFBox itself but I can't find it
> anymore.
> 
> What might work if you can build from source: expand PDTextField so you
> can pass a font (which you create with the subset flag on); pass this
> font to AppearanceGeneratorHelper; there, change the line
> "PDFont font = defaultAppearance.getFont();"; when done with the font,
> call font.subset().
> 
> Tilman
> 
> On 30.05.2024 17:36, Anil Basavaraju wrote:
> > Hi Pdfbox team,
> > 
> > We have a requirement to prefill the pdf with different languages
> > including CJK (Chinese, Japanese, Korean) and some special
> > characters.
> > So we embedded arial-unicode-ms.ttf with pdfbox. The ttf file is
> > about 22.7 MB. We are able to prefill the pdf form with the
> > languages and special characters.
> > But the issue is the size of the pdf file becomes around 27 MB.
> > Before prefilling the pdf file size is around 2 MB.
> > We need to upload this pdf file after prefilling, which is taking
> > more time which is not a good user experience.
> > Is there any way where we can reduce the pdf file size using
> > pdfbox?
> > 
> > Note: If we use any other font/ttf file which is smaller in size,
> > it is not supporting CJK.
> > 
> > Thanks,
> > Anil
> > 
> 
> 

Re: Separating PDF Content in to Layers (OCGs)

2024-06-23 Thread sahy...@fileaffairs.de

Also note that text might be contained in interactive form fields and
annotations. Don't know if you'd like to treat that text content as a
text layer too. 

BR
Maruan

On Sunday, 23.06.2024 at 10:32 -0700, Kevin Day wrote:
> PS I should clarify: if your PDF generator actually specifies OCG
> layers,
> then you may be able to use that. Just know that all PDFs do *not*
> have OCG
> layers. So if you are creating a general purpose tool, you will need
> to
> handle the content stream operations.
> 
> I don't have experience with the OCG features in PDFBox, so I'll
> leave it
> to others to comment on how to do that if your source documents for
> sure
> have OCG data.
> 
> On Sun, Jun 23, 2024, 10:25 AM Kevin Day wrote:
> 
> > Certainly possible. Not simple, though - and I don't think you will
> > find
> > sample code... PDFs don't have "layers" like you are suggesting -
> > they just
> > have a sequence of operations. You will need to interpret those
> > operations.
> > 
> > As a general strategy, you'll want to process the operations in the
> > content stream. Anything between a BT and ET operator will be text
> > related.
> > Everything else will be image or vector operations.
> > 
> > It will probably be easiest to think of this as a filtering
> > operation. So
> > you will want to suppress every operation between BT and ET to
> > create your
> > image version of the PDF - but leave everything else alone.
> > 
> > Be aware that there can be multiple content streams for a page, so
> > you'll
> > need to check for that. But the PDF spec does not allow a BT in one
> > stream
> > to be closed by a ET in a different stream. So you should be able
> > to just
> > filter each stream individually.
> > 
> > Finally, for the text-only extraction, I'm pretty sure you will
> > need to
> > make sure you preserve any coordinate system operators outside of
> > the BT/ET
> > blocks. It's been a while since I've looked at this, so I might be
> > wrong
> > (i.e. it's possible that the text coordinate system is completely
> > independent of the regular coordinate system operators).
> > 
> > That should do it - you should plan on spending some time reading
> > the
> > coordinate system details in the PDF spec to figure out which
> > operators you
> > need to preserve.
> > 
> > K
> > 
> > 
> > On Sun, Jun 23, 2024, 3:47 AM PDF Developer wrote:
> > 
> > > Hello,
> > > I have been asked to process a large number of PDFs and, for
> > > reasons I can't go into, I need to separate the text from the
> > > graphics. I know I can create separate PDFs from the originals
> > > (using a variety of tools) but I prefer not to, mainly for speed
> > > reasons.
> > > So I thought it might be possible to use OCGs (aka Layers) for
> > > this, parsing the PDPageContentStream into two buckets, one for
> > > text and the other for graphics.
> > > If this is feasible, does anyone know of any sample code that
> > > might be
> > > relevant that I could use to kick start things?
> > > Thanks in advance.
> > > PDFDev/
> > > 
> > 



Re: What is the replacement of org.apache.pdfbox.cos.COSDocument.getObjects in 3.x?

2025-06-18 Thread sahy...@fileaffairs.de

it's online

https://pdfbox.apache.org/3.0/migration.html#changes-when-needing-all-objects

BR
Maruan

On Wednesday, 18.06.2025 at 09:59 +0200, Tilman Hausherr wrote:
> On 6/17/2025 10:31 PM, Jan Luehe wrote:
> > Perhaps this could be added to the migration guide?
> 
> Yes, good idea, just done, you can see it on
> 
> https://github.com/apache/pdfbox-docs/blob/master/content/3.0/migration.md
> 
> it will be in the website with the next documentation upload.
> 
> Tilman
> 
> 
> > 
> > On Tue, Jun 17, 2025 at 11:47 AM Jan Luehe wrote:
> > 
> > > Thank you, Tilman, for your quick response!
> > > 
> > > Jan
> > > 
> > > On Tue, Jun 17, 2025 at 11:38 AM Tilman Hausherr wrote:
> > > 
> > > > I didn't find anything at first, but then I remembered that
> > > > the WriteDecodedDoc  utility looked at all objects...
> > > > 
> > > > old:
> > > > 
> > > > for (COSObject cosObject : doc.getDocument().getObjects())
> > > > 
> > > > new:
> > > > 
> > > > COSDocument cosDocument = doc.getDocument();
> > > > cosDocument.getXrefTable().keySet().stream().forEach(o ->
> > > > processObject(cosDocument.getObjectFromPool(o), skipImages));
> > > > 
> > > > So you use the keys from
> > > > cosDocument.getXrefTable().keySet()
> > > > and pass these to
> > > > cosDocument.getObjectFromPool()
> > > > 
> > > > Tilman
> > > > 
> > > > On 6/17/2025 8:27 PM, Jan Luehe wrote:
> > > > > What is the replacement of
> > > > > org.apache.pdfbox.cos.COSDocument.getObjects
> > > > > (https://javadoc.io/static/org.apache.pdfbox/pdfbox/2.0.34/org/apache/pdfbox/cos/COSDocument.html#getObjects--)
> > > > > in 3.x?
> > > > > 
> > > > > We have the following code which works with 2.x that we need
> > > > > to port to 3.x:
> > > > > 
> > > > > org.apache.pdfbox.pdmodel.PDDocument doc = ...;
> > > > > for (Iterator<COSObject> i =
> > > > >         doc.getDocument().getObjects().iterator();
> > > > >         i.hasNext();) {
> > > > >     COSBase base = i.next().getObject();
> > > > >     if (base instanceof COSStream) {
> > > > >         COSStream cosStream = (COSStream) base;
> > > > >         ...
> > > > >     }
> > > > > }
> > > > > 
> > > > > In 3.x, I only see
> > > > > org.apache.pdfbox.cos.COSDocument.getObject.getObjectsByType.
> > > > > 
> > > > > How would we rewrite the above code to make it compile with
> > > > > 3.x? I don't
> > > > > see anything mentioned in
> > > > > https://pdfbox.apache.org/3.0/migration.html
> > > > > 
> > > > > Apologies if I am asking something very obvious, but I have
> > > > > never
> > > > worked on
> > > > > org.apache.pdfbox before ...
> > > > > 
> > > > > Thanks!
> > > > > 
> > > > 

Re: jbig2-imageio library version 3.0.5 does not seem to be available although it is listed as part of PDFBox release 3.0.5

2025-05-12 Thread sahy...@fileaffairs.de

@Tilman, @Andreas,

I've updated the site so that it no longer mentions jbig2 3.0.5.

BR
Maruan


On Monday, 12.05.2025 at 14:07 +0200, Tilman Hausherr wrote:
> I updated the HTML file but I'm unable to upload it => Maruan,
> please
> 
> Tilman
> 
> On 12.05.2025 13:44, Andreas Lehmkühler wrote:
> > Hi,
> > 
> > I accidentally updated the version of the jbig2 lib when releasing
> > PDFBox 3.0.5.
> > There is no new version of jbig2; 3.0.4 is still the most recent
> > version.
> > 
> > Sorry for the confusion, my bad.
> > 
> > @Tilman or @Maruan
> > Please fix the website if you have some cycles, I am not available
> > at the moment. Thanks in advance.
> > 
> > Andreas
> > 
> 
> 

Re: Same line calculation of PDFTextStripper

2025-05-13 Thread sahy...@fileaffairs.de

There is no public test suite other than files attached to Jira
tickets. There is a suite of some 1 files which we use for
regression tests prior to new releases and when making larger
changes, but it cannot be shared due to data privacy, licensing ...
(it used to be public, but with new regulations like the GDPR we removed
public access).

BR
Maruan

On Monday, 12.05.2025 at 12:37 -0700, Kevin Day wrote:
> Are there test files that exercise the superscript/subscript
> correction
> that the non-transitive comparator is supposed to address?  And is
> there
> some way that I can get access to the test suite that includes 2991? 
> I can
> copy the file down from the Jira ticket, but I hate to do a ton of
> development without being able to run proper tests...
> 
> This is becoming a pretty big issue for us (we now have hundreds of
> files
> where the sorted text extraction is producing very bad results), so
> I'm
> going to do some work on it in the next couple of weeks. 
> Unfortunately,
> this is probably not going to be a surgical change (i.e. a change to
> the sort algorithm).
> 
> I would like to make sure that any algorithm I come up with doesn't
> introduce big regressions.
> 
> One challenge I am seeing is that all of this logic depends heavily
> on
> whether two text positions are in the same word or not, so it will
> almost
> certainly become necessary to do word clustering and line break
> determination as part of the "sort" (not after the fact like things
> are
> now).  Whatever I come up with, it is probably going to be a clean
> implementation, which is always risky without lots of unit tests.
> 
> Thanks,
> 
> K
> 
> Kevin Day
> 
> *trumpet**p| *480.961.6003 x1002
> *e| *ke...@trumpetinc.com
> *www.trumpetinc.com  | *LinkedIn
> 
> 
> On Thu, Apr 10, 2025, 6:35 AM Kevin Day  wrote:
> 
> > I had one other thought on this.
> > 
> > Without question, the ordering of the TextPositions after the JRE
> > sort
> > completes is not consistent with the comparator. It should be easy
> > to just
> > loop the sorted TPs and check to ensure the comparator always
> > returns <=0.
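
The check proposed above (loop over the sorted elements and make sure the comparator never returns a positive value) could look like the sketch below. It is only an illustration under stated assumptions: integers stand in for text positions, FUZZY is a made-up tolerance comparator standing in for the superscript/subscript logic, and SortCheck is a hypothetical name. It also shows why an adjacent-pairs check alone can miss problems when the comparator is not transitive.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only; names and the FUZZY comparator are assumptions.
final class SortCheck {

    /** Values within 1 of each other compare as equal, so the relation is not transitive. */
    static final Comparator<Integer> FUZZY =
            (a, b) -> Math.abs(a - b) <= 1 ? 0 : Integer.compare(a, b);

    /** Index of the first adjacent pair that is out of order, or -1 if none. */
    static <T> int firstAdjacentInversion(List<T> sorted, Comparator<? super T> cmp) {
        for (int i = 0; i + 1 < sorted.size(); i++) {
            if (cmp.compare(sorted.get(i), sorted.get(i + 1)) > 0) {
                return i;
            }
        }
        return -1;
    }

    /** True if any pair (i < j) is out of order; O(n^2), so for diagnostics only. */
    static <T> boolean hasAnyInversion(List<T> sorted, Comparator<? super T> cmp) {
        for (int i = 0; i < sorted.size(); i++) {
            for (int j = i + 1; j < sorted.size(); j++) {
                if (cmp.compare(sorted.get(i), sorted.get(j)) > 0) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> order = Arrays.asList(2, 1, 0);
        // Each neighbour pair is "equal" under FUZZY, so the adjacent check passes ...
        System.out.println(firstAdjacentInversion(order, FUZZY)); // prints -1
        // ... but 2 and 0 are not equal, so the order is still inconsistent.
        System.out.println(hasAnyInversion(order, FUZZY));        // prints true
    }
}
```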
> > 
> > I'm wondering if the slower fallback sort would not have this
> > problem. If
> > not, then it may be faster to do the JRE sort as a pre-sort, then
> > just
> > always call the fallback sort. The hope would be that if elements
> > were
> > already "mostly" sorted, then the fallback sort would be efficient.
> > 
> > I have no idea if that will actually be the case, but it may be
> > something
> > to investigate.  The superscript/subscript algorithm I described in
> > my last
> > post is effectively a merge sort...
> > 
> > Another possibility would be to use the JRE sort without
> > subscript/superscript handling. Then use the fallback sort with
> > subscript/superscript handling.
> > 
> > K
> > 
> > Kevin Day
> > 
> > *trumpet**p| *480.961.6003 x1002
> > *e| *ke...@trumpetinc.com
> > *www.trumpetinc.com  | *LinkedIn
> > 
> > 
> > On Wed, Apr 9, 2025, 2:03 PM Kevin Day 
> > wrote:
> > 
> > > Thank you for directing me to the discussion.  This is pretty
> > > much what I
> > > expected (the reason for the fuzzy logic is superscript/subscript
> > > handling).
> > > 
> > > I am pretty confident that the problem is not with the
> > > comparator.  The
> > > problem is that we are trying to use a simple sort algorithm to
> > > do
> > > something that is not simple.
> > > 
> > > I think that a more robust approach would be something like this:
> > > 
> > > 1. Group TextPositions by direction (note: the current TextStripper
> > >    doesn't do this, which causes a ton of problems for extractions
> > >    that have watermarks, etc...)
> > > 2. Group TextPositions by Y-position. Because these are floats, we
> > >    will need to do some amount of rounding. So I think we use
> > >    Math.floor() to make the measurement a whole number
> > > 3. Sort each group by X-position
> > > 
> > > This gives us a baseline layout.  Now, we need to go through and
> > > identify
> > > superscript/subscript candidates and merge them into the
> > > appropriate line.
> > > This is not trivial, but it's a *lot* more deterministic than
> > > trying to
> > > tune the comparator.
> > > 
> > > 
> > > Here is a rough idea for a superscript/subscript merge algorithm
> > > (this
> > > would be applied to each "direction" group):
> > > 
> > > For superscripts:
> > > 
> > > Start at the first Y-position-group.  Scan each TextPosition and
> > > see if
> > > it is a superscript candidate to the next Y position group.  If
> > > it is a
> > > candidate, remove the TextPosition from the current Y-position-
> > > group and
> > > merge it into the correct location in the *next* Y-position-
> > > group.
> > > Repeat for the next Y-position-group.
> > > 
> > > The net effect should be that all superscript candidates have
> > > been p

Re: How to work around this error?

2025-06-03 Thread sahy...@fileaffairs.de

Hello 


On Tuesday, 03.06.2025 at 17:47 +0200, Ulf Dittmer wrote:
> Hello-
> 
> When trying to fill in values into a form, I'm encountering the error
> shown
> below, which means nothing to me. It's not just for the first field
> it
> encounters - if I comment that out, it happens for the next field.
> 
> The form (an official government PDF) doesn't seem different from any
> number of other ones we are successfully filling out; it can be found
> at
> https://ulfdittmer.com/pdfbox/Guide_BW.pdf

Is that the original form, or the form after setting the font?

BR
Maruan

> 
> The source code is also similar to other codes we use for other
> forms:
> https://ulfdittmer.com/pdfbox/StreamQRApplForm.java
> 
> It uses the following font, which is also standard
> https://ulfdittmer.com/pdfbox/Arial.ttf
> 
> Are we doing something wrong? Is the PDF weird? Is there a way around
> this
> issue?
> 
> Many thanks in advance for any help you can supply.
> 
> Ulf
> 
> Jun 03, 2025 5:21:00 PM org.apache.pdfbox.pdfparser.BaseParser parseCOSDictionaryNameValuePair
> WARNING: Empty COSName at offset 940
> Exception in thread "main" java.io.IOException: Could not process default appearance string ' 13 0 g' for field 'Name ggf Geburtsname': Missing operands for set non stroking color operator [COSInt{13}, COSInt{0}]
>     at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:123)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:261)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:209)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:218)
>     at StreamQRApplForm.setField(StreamQRApplForm.java:67)
>     at StreamQRApplForm.main(StreamQRApplForm.java:28)
> Caused by: java.io.IOException: Missing operands for set non stroking color operator [COSInt{13}, COSInt{0}]
>     at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processSetFontColor(PDDefaultAppearanceString.java:202)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processOperator(PDDefaultAppearanceString.java:133)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.processAppearanceStringOperators(PDDefaultAppearanceString.java:105)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:87)
>     at org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:105)
>     at org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:117)
>     ... 5 more

-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org

Re: How to work around this error?

2025-06-03 Thread sahy...@fileaffairs.de

Hello Ulf,

a typical default appearance string looks like this

/Helv 12 Tf 0 g

It contains two operators: Tf, which sets the font, and g, which sets
the color. The arguments for each operator are supplied before it:

/Helv 12 are the arguments for Tf and
0 is the argument for g

Your form has / 13 0 g

So 13 0 is treated as the arguments for the g operator. But there
are no colors with 2 arguments. It's either 1 (grayscale), 3 (RGB) or
4 (CMYK) arguments.

So IMHO the default appearance string is wrong.

It's also very unusual that it doesn't set a font.

A workaround would be to set the default appearance string to something
similar to the first sample above.
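
To spot such strings before filling a form, a check along the following
lines might help. It is only a sketch in plain Java, not PDFBox API: the
class and method names are made up for illustration, and it assumes that
a lone "/" (an empty name) is skipped, which is what the warning in the
log suggests. It applies the rule above (1, 3 or 4 color arguments) and
also reports a string that never sets a font.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: checks a default appearance (DA) string. Not PDFBox API. */
final class DaStringCheck {

    /** Returns null if the string looks fine, otherwise a short description of the problem. */
    public static String findProblem(String da) {
        List<String> operands = new ArrayList<>();
        boolean fontSet = false;
        for (String token : da.trim().split("\\s+")) {
            if (token.isEmpty() || token.equals("/")) {
                continue; // assumption: an empty name is ignored
            }
            switch (token) {
                case "Tf": // font operator: expects a font name and a size before it
                    if (operands.size() != 2 || !operands.get(0).startsWith("/")) {
                        return "Tf needs a font name and a size, got " + operands;
                    }
                    fontSet = true;
                    operands.clear();
                    break;
                case "g":  // grayscale
                case "rg": // RGB
                case "k":  // CMYK
                    int n = operands.size();
                    if (n != 1 && n != 3 && n != 4) {
                        return token + " needs 1, 3 or 4 operands, got " + operands;
                    }
                    operands.clear();
                    break;
                default:
                    operands.add(token); // collect operands until the next operator
            }
        }
        return fontSet ? null : "no font set";
    }

    public static void main(String[] args) {
        System.out.println(findProblem("/Helv 12 Tf 0 g")); // prints null
        System.out.println(findProblem("/ 13 0 g"));        // prints g needs 1, 3 or 4 operands, got [13, 0]
    }
}
```

For a field that fails the check, the replacement string would then be
set on the field before calling setValue, for instance with
setDefaultAppearance on the text field (please check the Javadoc of your
PDFBox version).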

BR
Maruan 





Am Dienstag, dem 03.06.2025 um 21:36 +0200 schrieb
sahy...@fileaffairs.de:
> Hello 
> 
> 
> Am Dienstag, dem 03.06.2025 um 17:47 +0200 schrieb Ulf Dittmer:
> > Hello-
> > 
> > When trying to fill in values into a form, I'm encountering the
> > error
> > shown
> > below, which means nothing to me. It's not just for the first field
> > it
> > encounters - if I comment that out, it happens for the next field.
> > 
> > The form (an official government PDF) doesn't seem different from
> > any
> > number of other ones we are successfully filling out; it can be
> > found
> > at
> > https://ulfdittmer.com/pdfbox/Guide_BW.pdf
> 
> is that the original form or after setting the font?
> 
> BR
> Maruan
> 
> > 
> > The source code is also similar to other codes we use for other
> > forms:
> > https://ulfdittmer.com/pdfbox/StreamQRApplForm.java
> > 
> > It uses the following font, which is also standard
> > https://ulfdittmer.com/pdfbox/Arial.ttf
> > 
> > Are we doing something wrong? Is the PDF weird? Is there a way
> > around
> > this
> > issue?
> > 
> > Many thanks in advance for any help you can supply.
> > 
> > Ulf
> > 
> > Jun 03, 2025 5:21:00 PM org.apache.pdfbox.pdfparser.BaseParser
> > parseCOSDictionaryNameValuePair
> > WARNING: Empty COSName at offset 940
> > Exception in thread "main" java.io.IOException: Could not process
> > default
> > appearance string ' 13 0 g' for field 'Name ggf Geburtsname':
> > Missing
> > operands for set non stroking color operator [COSInt{13},
> > COSInt{0}]
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelpe
> > r.
> > (AppearanceGeneratorHelper.java:123)
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructApp
> > ea
> > rances(PDTextField.java:261)
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyCha
> > ng
> > e(PDTerminalField.java:209)
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDT
> > ex
> > tField.java:218)
> > at StreamQRApplForm.setField(StreamQRApplForm.java:67)
> > at StreamQRApplForm.main(StreamQRApplForm.java:28)
> > Caused by: java.io.IOException: Missing operands for set non
> > stroking
> > color
> > operator [COSInt{13}, COSInt{0}]
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceStrin
> > g.
> > processSetFontColor(PDDefaultAppearanceString.java:202)
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceStrin
> > g.
> > processOperator(PDDefaultAppearanceString.java:133)
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceStrin
> > g.
> > processAppearanceStringOperators(PDDefaultAppearanceString.java:105
> > )
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceStrin
> > g.
> > (PDDefaultAppearanceString.java:87)
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaul
> > tA
> > ppearanceString(PDVariableText.java:105)
> > at
> > org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelpe
> > r.
> > (AppearanceGeneratorHelper.java:117)
> > ... 5 more
> 
> -
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: users-h...@pdfbox.apache.org


Re: How to work around this error?

2025-06-04 Thread sahy...@fileaffairs.de

You're welcome. Good that you found a solution.

We could handle the wrong color setting in PDFBox and use defaults
instead of throwing, but I'm not a big fan of that, as different users of
the library might want different defaults.

BR
Maruan

Am Mittwoch, dem 04.06.2025 um 09:54 +0200 schrieb Ulf Dittmer:
> Thank you, that put me on the right track. I had no idea about the
> inner
> workings of appearances, and your explanation taught me enough about
> it to
> substitute something in case a field has a weird one - kudos!
> 
> I'm guessing the form is mostly printed out, and filled in by hand,
> so that
> this issue hasn't surfaced before.
> 
> Cheers!
> Ulf
> 
> On Tue, Jun 3, 2025 at 10:27 PM sahy...@fileaffairs.de <
> sahy...@fileaffairs.de> wrote:
> 
> > Hello Ulf,
> > 
> > a typical default appearance string looks like this
> > 
> > /Helv 12 Tf 0 g
> > 
> > There are two operators Tf, to set the font, and g to set the color
> > with the arguments for the operators being supplied before.
> > 
> > Helv 12 being the arguments for Tf and
> > 0  being the argument for g
> > 
> > Your form has / 13 0 g
> > 
> > So 13 0 are being treated as arguments for the g operator. But
> > there
> > are no colors with 2 arguments. It's either 1 (Greyscale) , 3 (RGB)
> > or
> > 4 (CMYK) arguments.
> > 
> > So IMHO the default appearance string is wrong.
> > 
> > It's also very unusual that it doesn't set a font.
> > 
> > Workaround would be to set the default appearance string similar to
> > the
> > first sample above.
> > 
> > BR
> > Maruan
> > 
> > 
> > 
> > 
> > 
> > Am Dienstag, dem 03.06.2025 um 21:36 +0200 schrieb
> > sahy...@fileaffairs.de:
> > > Hello
> > > 
> > > 
> > > Am Dienstag, dem 03.06.2025 um 17:47 +0200 schrieb Ulf Dittmer:
> > > > Hello-
> > > > 
> > > > When trying to fill in values into a form, I'm encountering the
> > > > error
> > > > shown
> > > > below, which means nothing to me. It's not just for the first
> > > > field
> > > > it
> > > > encounters - if I comment that out, it happens for the next
> > > > field.
> > > > 
> > > > The form (an official government PDF) doesn't seem different
> > > > from
> > > > any
> > > > number of other ones we are successfully filling out; it can be
> > > > found
> > > > at
> > > > https://ulfdittmer.com/pdfbox/Guide_BW.pdf
> > > 
> > > is that the original form or after setting the font?
> > > 
> > > BR
> > > Maruan
> > > 
> > > > 
> > > > The source code is also similar to other codes we use for other
> > > > forms:
> > > > https://ulfdittmer.com/pdfbox/StreamQRApplForm.java
> > > > 
> > > > It uses the following font, which is also standard
> > > > https://ulfdittmer.com/pdfbox/Arial.ttf
> > > > 
> > > > Are we doing something wrong? Is the PDF weird? Is there a way
> > > > around
> > > > this
> > > > issue?
> > > > 
> > > > Many thanks in advance for any help you can supply.
> > > > 
> > > > Ulf
> > > > 
> > > > Jun 03, 2025 5:21:00 PM org.apache.pdfbox.pdfparser.BaseParser
> > > > parseCOSDictionaryNameValuePair
> > > > WARNING: Empty COSName at offset 940
> > > > Exception in thread "main" java.io.IOException: Could not
> > > > process
> > > > default
> > > > appearance string ' 13 0 g' for field 'Name ggf Geburtsname':
> > > > Missing
> > > > operands for set non stroking color operator [COSInt{13},
> > > > COSInt{0}]
> > > > at
> > > > org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorH
> > > > elpe
> > > > r.
> > > > (AppearanceGeneratorHelper.java:123)
> > > > at
> > > > org.apache.pdfbox.pdmodel.interactive.form.PDTextField.construc
> > > > tApp
> > > > ea
> > > > rances(PDTextField.java:261)
> > > > at
> > > > org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.appl
> > > > yCha
> > > > ng
> > > > e(PDTerminalField.java:209)
> > > > at
> > > > org.apache.pdfbox.pdmodel.intera

Re: How to work around this error?

2025-06-04 Thread sahy...@fileaffairs.de

Hello Marc,

it was my thought too that 13 is the font size. Just for the record, that
would be something like

/Helv 13 Tf 0 g and not /Helv Tf 13 0 g
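
If one follows that reading, a repair could be sketched as below. This is
plain Java, not PDFBox API, and the names are made up for illustration.
It assumes that the first number is the font size, and that the fallback
font name (Helv here) exists in the form's default resources. Neither is
guaranteed for a given form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: rebuilds a DA string that lacks the font name and the Tf operator. */
final class DaRepair {

    public static String repair(String da, String fallbackFont) {
        List<String> tokens = new ArrayList<>();
        for (String t : da.trim().split("\\s+")) {
            if (!t.isEmpty() && !t.equals("/")) { // drop the empty name
                tokens.add(t);
            }
        }
        if (tokens.contains("Tf") || tokens.size() < 3) {
            return da; // already sets a font, or too short to interpret
        }
        String operator = tokens.get(tokens.size() - 1);
        if (!Arrays.asList("g", "rg", "k").contains(operator) || !isNumber(tokens.get(0))) {
            return da; // not the pattern "<size> <color operands> <color operator>"
        }
        // Assumption: the first number is the font size, the rest belongs to the color operator.
        String size = tokens.get(0);
        String colorPart = String.join(" ", tokens.subList(1, tokens.size()));
        return "/" + fallbackFont + " " + size + " Tf " + colorPart;
    }

    private static boolean isNumber(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(repair("/ 13 0 g", "Helv")); // prints /Helv 13 Tf 0 g
    }
}
```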

BR
Maruan 

Am Mittwoch, dem 04.06.2025 um 09:34 -0700 schrieb Marc Kaufman:
> I would point out that the problem with this appearance setting is
> not 
> the color, but the missing Font specification.
> 
> / 13 0 g vs / Helv Tf 13 0 g. The 13 being the font size.
> 
> Marc
> 
> On 6/4/2025 1:09 AM, sahy...@fileaffairs.de wrote:
> > You're welcome. Good that you found a solution.
> > 
> > We could handle the wrong color setting in pdfbox and use defaults
> > instead of throwing but I'm not a big fan of that as different
> > users of
> > the lib might use different defaults.
> > 
> > BR
> > Maruan
> > 
> > Am Mittwoch, dem 04.06.2025 um 09:54 +0200 schrieb Ulf Dittmer:
> > > Thank you, that put me on the right track. I had no idea about
> > > the
> > > inner
> > > workings of appearances, and your explanation taught me enough
> > > about
> > > it to
> > > substitute something in case a field has a weird one - kudos!
> > > 
> > > I'm guessing the form is mostly printed out, and filled in by
> > > hand,
> > > so that
> > > this issue hasn't surfaced before.
> > > 
> > > Cheers!
> > > Ulf
> > > 
> > > On Tue, Jun 3, 2025 at 10:27 PM sahy...@fileaffairs.de <
> > > sahy...@fileaffairs.de> wrote:
> > > 
> > > > Hello Ulf,
> > > > 
> > > > a typical default appearance string looks like this
> > > > 
> > > > /Helv 12 Tf 0 g
> > > > 
> > > > There are two operators Tf, to set the font, and g to set the
> > > > color
> > > > with the arguments for the operators being supplied before.
> > > > 
> > > > Helv 12 being the arguments for Tf and
> > > > 0  being the argument for g
> > > > 
> > > > Your form has / 13 0 g
> > > > 
> > > > So 13 0 are being treated as arguments for the g operator. But
> > > > there
> > > > are no colors with 2 arguments. It's either 1 (Greyscale) , 3
> > > > (RGB)
> > > > or
> > > > 4 (CMYK) arguments.
> > > > 
> > > > So IMHO the default appearance string is wrong.
> > > > 
> > > > It's also very unusual that it doesn't set a font.
> > > > 
> > > > Workaround would be to set the default appearance string similar
> > > > to
> > > > the
> > > > first sample above.
> > > > 
> > > > BR
> > > > Maruan


Re: Another PDF with non-behaving form fields

2025-06-05 Thread sahy...@fileaffairs.de

Hello Ulf,

which pdfbox version are you using?

BR
Maruan


Am Donnerstag, dem 05.06.2025 um 10:08 +0200 schrieb Ulf Dittmer:
> Weird, macOS Preview considers them as fields, making it possible to
> enter data.
> 
> But happily, entering blanks, and then saving it (using macOS
> Preview), makes them proper fields, and then PrintFields can see
> those fields. I'll try that first next time :-)
> 
> Thanks for looking at it!
> 
> Ulf
> 
> On Thu, Jun 5, 2025 at 5:54 AM Tilman Hausherr wrote:
> > 
> > These fields are not in the acroform field list, that is why. This
> > is the acroform field list:
> > 
> > [screenshot not preserved]
> > 
> > The page 2 "fields" only exist as annotations:
> > 
> > [screenshot not preserved]
> > 
> > Something must have gone wrong when generating this PDF.
> > 
> > Tilman
> > 
> > On 6/4/2025 7:20 PM, Ulf Dittmer wrote:
> > 
> > > This one is https://ulfdittmer.com/pdfbox/Guide_HE.pdf
> > > 
> > > 
> > > I can use PDFBox to fill in the fields on page 1, but not some of
> > > those on
> > > page 2 - they're not changed. Specifically, "Wohnanschrift im
> > > Ausland:
> > > Staat" and "Adresse" as well as "E-Mail-Adresse" and
> > > "Telefonnummer"
> > > 
> > > The PDFBox Debugger shows those form fields, but the PrintFields
> > > example
> > > code (
> > > https://github.com/apache/pdfbox/blob/trunk/examples/src/main/java/org/apache/pdfbox/examples/interactive/form/PrintFields.java
> > > )
> > > does not. Those 4 that I mentioned are not in its output, and
> > > using the
> > > field names from PDFDebugger does not work.
> > > 
> > > Am I missing something, or is there more going on?


Re: Another PDF with non-behaving form fields

2025-06-05 Thread sahy...@fileaffairs.de

Hello Ulf,

PDFBox applies some fixups automatically when you call .getAcroForm().
One of them checks for widgets which don't have a field entry, as is the
case in your form, and creates field entries for them.

You can look up the code in AcroFormDefaultFixup.

The way it's currently done, this only kicks in if there are no
fields at all, which is not the case in your form.

But you can create your own fixup and call
.getAcroForm(PDDocumentFixup acroFormFixup)

In this fixup you can reuse AcroFormOrphanWidgetsProcessor, which
creates field entries from widgets.

I haven't tested that with your form, but when we created this mechanism
we had in mind that it should enable users of the library to create their
own handling, reusing some of the code we supply or creating it from
scratch.
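
To illustrate the difference between the two behaviours, here is a small
model in plain Java. It does not use the PDFBox classes at all: fields
and widgets are just names in lists, the class and method names are made
up, and "Page1Field" is a placeholder. It only shows why a fixup that
runs solely when there are no fields leaves your page 2 widgets alone,
while a custom fixup without that condition would pick them up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of an orphan widget fixup. Not PDFBox API. */
final class OrphanWidgetFixupModel {

    /**
     * Returns the field list after the fixup.
     * fieldNames:       fields listed in the AcroForm
     * widgetNames:      widgets found on the pages
     * onlyWhenNoFields: true models the default behaviour described above
     */
    public static List<String> applyFixup(List<String> fieldNames,
                                          List<String> widgetNames,
                                          boolean onlyWhenNoFields) {
        List<String> result = new ArrayList<>(fieldNames);
        if (onlyWhenNoFields && !fieldNames.isEmpty()) {
            return result; // there are fields already, so nothing is added
        }
        for (String widget : widgetNames) {
            if (!result.contains(widget)) {
                result.add(widget); // create a field entry for the orphan widget
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("Page1Field");
        List<String> widgets = Arrays.asList("Page1Field", "Adresse", "Telefonnummer");
        System.out.println(applyFixup(fields, widgets, true));  // prints [Page1Field]
        System.out.println(applyFixup(fields, widgets, false)); // prints [Page1Field, Adresse, Telefonnummer]
    }
}
```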

BR
Maruan 

Am Donnerstag, dem 05.06.2025 um 10:45 +0200 schrieb Ulf Dittmer:
> 3.0.5
> 
> On Thu, Jun 5, 2025 at 10:40 AM sahy...@fileaffairs.de <
> sahy...@fileaffairs.de> wrote:
> 
> > Hello Ulf,
> > 
> > which pdfbox version are you using?
> > 
> > BR
> > Maruan
> > 
> > 
> > 


Re: Forms with non-unique field names

2025-08-12 Thread sahy...@fileaffairs.de

Very interesting that such forms are being circulated without testing
form filling. One should have picked up the issues prior to publishing
the form.

Fingers crossed for the other form.

BR
Maruan

Am Dienstag, dem 12.08.2025 um 17:29 +0200 schrieb Ulf Dittmer:
> Thanks for the ideas. For that document I found that Acrobat Pro can
> rename
> fields, which then leads to them being stored separately, if a field
> was
> used more than once.
> 
> But I have another one to work with, which seems to have another set
> of
> issues. We'll see :-)
> 
> Thanks again for looking at it so quickly!
> 
> On Tue, 12 Aug 2025, 15:33 sahy...@fileaffairs.de wrote:
> 
> > Hi,
> > 
> > I also had a look. A somewhat simpler approach which should work,
> > at
> > least for the fields I've looked at, is to add a T entry to the
> > widgets
> > which don't have a T entry in the COSDictionary. This would - after
> > reloading - treat them as fields.
> > 
> > The approach Tilman suggested is more complete and gives you more
> > control.
> > 
> > BR
> > Maruan
> > 
> > Am Dienstag, dem 12.08.2025 um 15:14 +0200 schrieb Tilman Hausherr:
> > > Am 12.08.2025 um 14:50 schrieb Ulf Dittmer:
> > > > For OBJ2 that makes sense, as it is the same info on both
> > > > pages.
> > > > But
> > > > filling in any of OBJ4, OBJ9 or OBJ10 (to name just a few),
> > > > that
> > > > data
> > > > appears on both page 1 and 3, in fields that have nothing to do
> > > > with one
> > > > another.
> > > 
> > > OBJ4 is also on several pages (which is allowed). I opened the
> > > PDF in
> > > Adobe and entered my first name, and it then appeared on page 3
> > > at
> > > the
> > > "wrong" place, but that's the problem of whoever created (or
> > > altered)
> > > the PDF.
> > > 
> > > I think what you really want is to consider "Is there a way
> > > within
> > > the
> > > PDFBox API to dis-ambiguate those fields" as an isolated
> > > question,
> > > i.e.
> > > create a new field for each of the extra widgets.
> > > 
> > > Yes it would be possible. You'd have to create a new
> > > COSDictionary,
> > > copy
> > > all the key/values (except kids, except T and AP), then create a
> > > new
> > > PDField from that dictionary, add one of the widgets (and delete
> > > it
> > > from
> > > the original field), calculate a new "T" value (field name).
> > > 
> > > I don't know if there is a commercial tool for this. It can
> > > probably
> > > be
> > > done with PDFBox in less than a day. I might help for free but
> > > I'd
> > > prefer you try first.
> > > 
> > > Tilman
> > > 
> > > 
> > > > 
> > > > org.apache.pdfbox.examples.interactive.form.PrintFields only
> > > > lists
> > > > those
> > > > fields once, but they do appear to be used on multiple pages.
> > > > 
> > > > Ulf
> > > > 
> > > > > On Tue, Aug 12, 2025 at 2:41 PM Tilman Hausherr wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > I don't see how these field names are double. Some of the
> > > > > fields
> > > > > have
> > > > > several widgets, e.g. OBJ2 is on page 1 and page 3. This is
> > > > > done
> > > > > to have
> > > > > the content on several pages.
> > > > > 
> > > > > Tilman
> > > > > 
> > > > > Am 12.08.2025 um 14:14 schrieb Ulf Dittmer:
> > > > > > Hello-
> > > > > > 
> > > > > > I'm encountering PDFs with forms that have non-unique field
> > > > > > names.
> > > > > > Sometimes fields with the same names are used for the same
> > > > > > information (a
> > > > > > useful scenario, making filling them out programmatically
> > > > > > easier). But
> > > > > > sometimes the same names are used for entirely different
> > > > > > field
> > > > > > purposes.
> > > > > > 
> > > > > > Is there a way within the PDFBox API to dis-ambiguate those

Re: Forms with non-unique field names

2025-08-12 Thread sahy...@fileaffairs.de

Hi,

I also had a look. A somewhat simpler approach, which should work at
least for the fields I've looked at, is to add a T entry to the widgets
whose COSDictionary doesn't have one. After reloading, they would then
be treated as fields.

The approach Tilman suggested is more complete and gives you more
control.
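
The steps Tilman describes below can be sketched as follows. This is only
a model in plain Java: a Map stands in for the COSDictionary and strings
stand in for the widgets, so none of it is PDFBox API, and the class
name, method name and the new field name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of splitting one widget off a shared field. Not PDFBox API. */
final class SplitSharedField {

    /** Moves the widget at widgetIndex out of 'field' into a new field named newName. */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> splitOff(Map<String, Object> field,
                                               int widgetIndex,
                                               String newName) {
        // Copy all key/values except Kids, T and AP.
        Map<String, Object> copy = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : field.entrySet()) {
            String key = entry.getKey();
            if (!key.equals("Kids") && !key.equals("T") && !key.equals("AP")) {
                copy.put(key, entry.getValue());
            }
        }
        // Delete the widget from the original field and add it to the new one.
        List<String> kids = (List<String>) field.get("Kids");
        String widget = kids.remove(widgetIndex);
        List<String> newKids = new ArrayList<>();
        newKids.add(widget);
        copy.put("Kids", newKids);
        // Give the new field its own name.
        copy.put("T", newName);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> obj4 = new LinkedHashMap<>();
        obj4.put("T", "OBJ4");
        obj4.put("FT", "Tx");
        obj4.put("Kids", new ArrayList<String>(
                Arrays.asList("widget on page 1", "widget on page 3")));
        Map<String, Object> extra = splitOff(obj4, 1, "OBJ4_page3");
        System.out.println(obj4);
        System.out.println(extra);
    }
}
```

In a real document the new field would also have to be added to the
AcroForm's field list, and the appearance of the moved widget would have
to be regenerated.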

BR
Maruan

Am Dienstag, dem 12.08.2025 um 15:14 +0200 schrieb Tilman Hausherr:
> Am 12.08.2025 um 14:50 schrieb Ulf Dittmer:
> > For OBJ2 that makes sense, as it is the same info on both pages.
> > But
> > filling in any of OBJ4, OBJ9 or OBJ10 (to name just a few), that
> > data
> > appears on both page 1 and 3, in fields that have nothing to do
> > with one
> > another.
> 
> OBJ4 is also on several pages (which is allowed). I opened the PDF in
> Adobe and entered my first name, and it then appeared on page 3 at
> the 
> "wrong" place, but that's the problem of whoever created (or altered)
> the PDF.
> 
> I think what you really want is to consider "Is there a way within
> the 
> PDFBox API to dis-ambiguate those fields" as an isolated question,
> i.e. 
> create a new field for each of the extra widgets.
> 
> Yes it would be possible. You'd have to create a new COSDictionary,
> copy 
> all the key/values (except kids, except T and AP), then create a new 
> PDField from that dictionary, add one of the widgets (and delete it
> from 
> the original field), calculate a new "T" value (field name).
> 
> I don't know if there is a commercial tool for this. It can probably
> be 
> done with PDFBox in less than a day. I might help for free but I'd 
> prefer you try first.
> 
> Tilman
> 
> 
> > 
> > org.apache.pdfbox.examples.interactive.form.PrintFields only lists
> > those
> > fields once, but they do appear to be used on multiple pages.
> > 
> > Ulf
> > 
> > On Tue, Aug 12, 2025 at 2:41 PM Tilman Hausherr wrote:
> > 
> > > Hi,
> > > 
> > > I don't see how these field names are double. Some of the fields
> > > have
> > > several widgets, e.g. OBJ2 is on page 1 and page 3. This is done
> > > to have
> > > the content on several pages.
> > > 
> > > Tilman
> > > 
> > > Am 12.08.2025 um 14:14 schrieb Ulf Dittmer:
> > > > Hello-
> > > > 
> > > > I'm encountering PDFs with forms that have non-unique field
> > > > names.
> > > > Sometimes fields with the same names are used for the same
> > > > information (a
> > > > useful scenario, making filling them out programmatically
> > > > easier). But
> > > > sometimes the same names are used for entirely different field
> > > > purposes.
> > > > 
> > > > Is there a way within the PDFBox API to dis-ambiguate those
> > > > fields? Or
> > > are
> > > > there tools that can do this (we do have a budget, so payware
> > > > would be
> > > OK,
> > > > within limits)?
> > > > 
> > > > These are government PDFs, so we don't control their creation.
> > > > But if
> > > there
> > > > is a way to edit them that addresses this, that would also work
> > > > for us -
> > > > the forms do not change frequently.
> > > > 
> > > > http://ulfdittmer.com/Guide_TH.pdf is an example of such a PDF.
> > > > OBJ3,
> > > OBJ4,
> > > > OBJ10, OBJ18 and OBJ24 are field names that are used twice.
> > > > 
> > > > Any help would be appreciated.
> > > > 
> > > > Ulf
> > > > 
> > > 
> > > -
> > > 
> > > To unsubscribe, e-mail:users-unsubscr...@pdfbox.apache.org
> > > For additional commands, e-mail:users-h...@pdfbox.apache.org
> > > 
> > > 

