[jira] [Commented] (PDFBOX-4800) Parsing of numbers does not always terminate at actual end of number

2020-03-24 Thread Eckhart Pedersen (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065361#comment-17065361
 ] 

Eckhart Pedersen commented on PDFBOX-4800:
--

Ok, sounds good :)

> Parsing of numbers does not always terminate at actual end of number
> 
>
> Key: PDFBOX-4800
> URL: https://issues.apache.org/jira/browse/PDFBOX-4800
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.12, 2.0.15, 2.0.19
> Reporter: Eckhart Pedersen
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 2.0.20, 3.0.0 PDFBox
>
> Attachments: 1584634522723.txt, demobank_case_error_doc1.pdf, 
> demobank_case_ok_doc1.pdf
>
>
> *Short description:*
> The method *readStringNumber* in *BaseParser.java* reads more characters 
> than desired when parsing numbers in certain documents. We have internally 
> fixed the issue by adding the following line (the check for '/', marked 
> with a comment):
>     while( (lastByte = seqSource.read() ) != ASCII_SPACE &&
>            lastByte != ASCII_LF &&
>            lastByte != ASCII_CR &&
>            lastByte != 60 && //see sourceforge bug 1714707
>            lastByte != '[' && // PDFBOX-1845
>            lastByte != '(' && // PDFBOX-2579
>            lastByte != 0 && //See sourceforge bug 853328
>            lastByte != '/' && // added line
>            lastByte != -1 )
>     {
> *Background:*
>  Our customer ran into an issue with certain documents that were converted to 
> PDF/A2 format with Qoppa jPDFPreflight 
> ([https://www.qoppa.com/pdfpreflight/]). In some instances pdfbox would 
> afterwards fail to open the document.
> (It is possible that the Qoppa conversion tool does something wrong and that 
> the resulting PDF is invalid somehow, but all other tools seem to open the 
> converted documents without any problems. We are not PDF experts, so this is 
> difficult for us to judge. If you determine that the problematic PDF document 
> is incorrect somehow, please notify us so that we can create a bug report at 
> Qoppa also.)
> I am attaching both an original version of the document (which pdfbox can 
> open just fine) and the converted version (which pdfbox cannot parse 
> correctly).
> *Additional information*
> My colleague refers to ISO 32000-1 section 7.2.2, which describes all valid 
> white-space and delimiter characters for PDF.
> According to the list of delimiter/white-space characters, the following 
> characters should also be handled in the readStringNumber method: '%', '{', 
> ')', ']', '}', '>', FORM FEED, and HORIZONTAL TAB.
> Though again, as we are not experts on the PDF standard we recommend that you 
> check the mentioned standard documents yourself and determine what kind of 
> solution you want to implement (if any).
> *Final Note:*
> We are filing this bug report in the hope that you find it helpful. I have 
> tried to include all relevant information as well as I can. If you have 
> further questions, I would be happy to address them.
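
The character classes the report refers to can be written down as a small check. The following is only a sketch based on the white-space and delimiter lists in ISO 32000-1 section 7.2.2; the class and method names (PdfCharClasses, isWhitespace, isDelimiter, endsNumber) are made up for illustration and are not part of PDFBox.

```java
// Hypothetical helper, not PDFBox code: classifies bytes according to the
// white-space and delimiter lists of ISO 32000-1 section 7.2.2.
final class PdfCharClasses {

    // White-space characters: NUL, HORIZONTAL TAB, LINE FEED, FORM FEED,
    // CARRIAGE RETURN, SPACE.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // Delimiter characters: ( ) < > [ ] { } / %
    static boolean isDelimiter(int c) {
        switch (c) {
            case '(': case ')': case '<': case '>':
            case '[': case ']': case '{': case '}':
            case '/': case '%':
                return true;
            default:
                return false;
        }
    }

    // A number token ends at white-space, at a delimiter, or at end of
    // input (-1, the value a read() call returns when nothing is left).
    static boolean endsNumber(int c) {
        return c == -1 || isWhitespace(c) || isDelimiter(c);
    }
}
```

With such a check, the loop condition quoted above would shrink to a single call like `!PdfCharClasses.endsNumber(lastByte)`, and '/' would be covered along with the other characters listed in the report.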



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org

[jira] [Commented] (PDFBOX-4800) Parsing of numbers does not always terminate at actual end of number

2020-03-23 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064937#comment-17064937
 ] 

Andreas Lehmkühler commented on PDFBOX-4800:


[~cryptomathic_epe] that piece of code doesn't parse all kinds of numbers; it 
is limited to object numbers and offsets, which are positive numbers. As long 
as I didn't miss anything, we are safe here.


[jira] [Commented] (PDFBOX-4800) Parsing of numbers does not always terminate at actual end of number

2020-03-23 Thread Eckhart Pedersen (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064610#comment-17064610
 ] 

Eckhart Pedersen commented on PDFBOX-4800:
--

[~lehmi] 

Does your fix include the "-" sign as a possible first character? Should it? We 
were not entirely familiar with how numbers can be formatted in PDF documents, 
so we decided not to implement the fix as you described. But if you know that 
this is a safe thing to do, then that sounds good :)

I am happy to have helped you with this bug report. I appreciate the effort 
you guys and other open source projects put in, and it's the very least we can 
do to give back a bit.


[jira] [Commented] (PDFBOX-4800) Parsing of numbers does not always terminate at actual end of number

2020-03-21 Thread Jira



[ 
https://issues.apache.org/jira/browse/PDFBOX-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063905#comment-17063905
 ] 

Andreas Lehmkühler commented on PDFBOX-4800:


I've fixed the issue by simply treating every character other than a digit as 
a valid delimiter.

[~cryptomathic_epe] Your bug report was very helpful, especially as it was so 
detailed. Such reports are very much appreciated, as is any other report that 
provides all the information we might need :-)
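
For readers following along, the "anything but a digit ends the number" approach can be sketched as below. This is not the committed PDFBox code; it assumes a plain byte array in place of PDFBox's sequential source, and the names DigitRunReader, peek and readDigits are invented for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a parser's input source, not PDFBox code.
final class DigitRunReader {
    private final byte[] data;
    private int pos;

    DigitRunReader(String input) {
        this.data = input.getBytes(StandardCharsets.US_ASCII);
    }

    // Returns the next byte without consuming it, or -1 at end of input.
    int peek() {
        return pos < data.length ? (data[pos] & 0xFF) : -1;
    }

    // Consumes a run of ASCII digits and stops at the first byte that is
    // not a digit. That byte is left in place for the caller to handle.
    String readDigits() {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = peek()) >= '0' && b <= '9') {
            sb.append((char) b);
            pos++;
        }
        return sb.toString();
    }
}
```

For an input such as "12/Type", readDigits() yields "12" and the next byte seen by peek() is '/', so no explicit list of delimiters has to be maintained. As noted elsewhere in the thread, this only suits unsigned values such as object numbers and offsets, since a leading sign or a decimal point would also end the run.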


[jira] [Commented] (PDFBOX-4800) Parsing of numbers does not always terminate at actual end of number

2020-03-21 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063901#comment-17063901
 ] 

ASF subversion and git services commented on PDFBOX-4800:
-

Commit 1875492 from le...@apache.org in branch 'pdfbox/branches/issue45'
[ https://svn.apache.org/r1875492 ]

PDFBOX-4800: use any character but a digit as delimiter when parsing a number


[jira] [Commented] (PDFBOX-4800) Parsing of numbers does not always terminate at actual end of number

2020-03-21 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063898#comment-17063898
 ] 

ASF subversion and git services commented on PDFBOX-4800:
-

Commit 1875490 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1875490 ]

PDFBOX-4800: use any character but a digit as delimiter when parsing a number


[jira] [Commented] (PDFBOX-4800) Parsing of numbers does not always terminate at actual end of number

2020-03-21 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/PDFBOX-4800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063899#comment-17063899
 ] 

ASF subversion and git services commented on PDFBOX-4800:
-

Commit 1875491 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1875491 ]

PDFBOX-4800: use any character but a digit as delimiter when parsing a number

