[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17665545#comment-17665545
]
Tika User commented on TIKA-3952:
-
Got it. Thanks
> Content mismatch
> -
>
>
[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656740#comment-17656740
]
Tilman Hausherr commented on TIKA-3952:
---
This online OCR page has the same error: https://ocr.space/
[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656736#comment-17656736
]
Tilman Hausherr commented on TIKA-3952:
---
You are doing OCR or it's the wrong file. The attached file
[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656062#comment-17656062
]
Tika User commented on TIKA-3952:
-
We are not doing any OCR for this. Simple native file and getting all
[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656063#comment-17656063
]
Tika User commented on TIKA-3952:
-
FYI, I attached the PDF file for your reference.
[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656060#comment-17656060
]
Nick Burch commented on TIKA-3952:
--
Is the PDF a scan? Are you doing OCR?
[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656059#comment-17656059
]
Tika User commented on TIKA-3952:
-
[~nick] I ran this command:
java -jar pdfbox-app.2.0.27.jar
[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656049#comment-17656049
]
Nick Burch commented on TIKA-3952:
--
Can you try following the steps in