[GitHub] [tika] THausherr merged pull request #888: Bump aws.version from 1.12.380 to 1.12.381
THausherr merged PR #888:
URL: https://github.com/apache/tika/pull/888

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tika] dependabot[bot] opened a new pull request, #888: Bump aws.version from 1.12.380 to 1.12.381
dependabot[bot] opened a new pull request, #888:
URL: https://github.com/apache/tika/pull/888

Bumps `aws.version` from 1.12.380 to 1.12.381.

Updates `aws-java-sdk-s3` from 1.12.380 to 1.12.381

Changelog
Sourced from aws-java-sdk-s3's changelog (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).

1.12.381 2023-01-09

AWS Network Firewall
Features
- Network Firewall now supports the Suricata rule action reject, in addition to the actions pass, drop, and alert.

AWS Resource Access Manager
Features
- Enabled FIPS aws-us-gov endpoints in SDK.

Amazon Elastic Container Registry Public
Features
- This release for Amazon ECR Public makes several changes to bring the SDK into sync with the API.

Amazon Kendra Intelligent Ranking
Features
- Introducing Amazon Kendra Intelligent Ranking, a new set of Kendra APIs that leverages Kendra semantic ranking capabilities to improve the quality of search results from other search services (i.e. OpenSearch, ElasticSearch, Solr).

Amazon WorkSpaces Web
Features
- This release adds support for a new portal authentication type: AWS IAM Identity Center (successor to AWS Single Sign-On).

Commits
- facc645 AWS SDK for Java 1.12.381 (https://github.com/aws/aws-sdk-java/commit/facc64566cc3e9c59ad472ccbe03da14cb1f0115)
- 8095b21 Update GitHub version number to 1.12.381-SNAPSHOT (https://github.com/aws/aws-sdk-java/commit/8095b21a9a6228a60bf264bf2e2be5a3994319ba)
- See full diff in compare view (https://github.com/aws/aws-sdk-java/compare/1.12.380...1.12.381)

Updates `aws-java-sdk-transcribe` from 1.12.380 to 1.12.381

Changelog
Sourced from aws-java-sdk-transcribe's changelog (https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).

- See full diff in compare view (https://github.com/aws/aws-sdk-java/compare/1.12.380...1.12.381)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

---

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
[jira] [Commented] (TIKA-3952) Content mismatch
    [ https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656062#comment-17656062 ]

Tika User commented on TIKA-3952:
---------------------------------

We are not doing any OCR for this. It is a simple native file, and we are extracting all the metadata related to that document.

> Content mismatch
> ----------------
>
>                 Key: TIKA-3952
>                 URL: https://issues.apache.org/jira/browse/TIKA-3952
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Tika User
>            Priority: Major
>         Attachments: download.pdf
>
>
> While extracting the content of the attached file, we see the content mismatches below.
> Native file content : 95 (1972); Erznoznik v. City of Jacksonville
> Content we got from Tika : 95 (1972); Er{*}e{*}noznik v. City of Jacksonville
>
> Native file content : 438 U.S.\n726
> Content we got from Tika : 438 {*}U-S{*}.\n726

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (TIKA-3952) Content mismatch
    [ https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656063#comment-17656063 ]

Tika User commented on TIKA-3952:
---------------------------------

FYI, I attached the PDF file for your reference.
[jira] [Commented] (TIKA-3952) Content mismatch
    [ https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656060#comment-17656060 ]

Nick Burch commented on TIKA-3952:
----------------------------------

Is the PDF a scan? Are you doing OCR?
[jira] [Commented] (TIKA-3952) Content mismatch
    [ https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656059#comment-17656059 ]

Tika User commented on TIKA-3952:
---------------------------------

[~nick] I ran this command:

    java -jar pdfbox-app.2.0.27.jar ExtractText problematicPDF.pdf

The txt file was created in the same location, but it doesn't have any content in it.
[jira] [Commented] (TIKA-3952) Content mismatch
    [ https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656049#comment-17656049 ]

Nick Burch commented on TIKA-3952:
----------------------------------

Can you try following the steps in [https://cwiki.apache.org/confluence/display/TIKA/Troubleshooting+Tika#TroubleshootingTika-PDFTextProblems]?
[jira] [Created] (TIKA-3952) Content mismatch
Tika User created TIKA-3952:
-------------------------------

             Summary: Content mismatch
                 Key: TIKA-3952
                 URL: https://issues.apache.org/jira/browse/TIKA-3952
             Project: Tika
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Tika User
         Attachments: download.pdf


While extracting the content of the attached file, we see the content mismatches below.

Native file content : 95 (1972); Erznoznik v. City of Jacksonville
Content we got from Tika : 95 (1972); Er{*}e{*}noznik v. City of Jacksonville

Native file content : 438 U.S.\n726
Content we got from Tika : 438 {*}U-S{*}.\n726

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
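
A character-level diff of the expected text against the extracted text may help pin down mismatches like the two reported in TIKA-3952. The following is only a minimal sketch in Python, not part of Tika or PDFBox. It assumes the {*}...{*} markers in the report are Jira bold markup around the altered characters, so Tika's output is read here as "Erenoznik" and "U-S.". The helper name `mismatches` is hypothetical.

```python
import difflib


def mismatches(expected: str, actual: str):
    """Return (expected_fragment, actual_fragment) pairs where the two strings differ."""
    # autojunk=False keeps the matcher from discarding frequent characters on long inputs.
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    return [
        (expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Strings taken from the issue description; the "Tika" side reflects the
# assumption about the {*}...{*} markup stated above.
pairs = [
    ("95 (1972); Erznoznik v. City of Jacksonville",
     "95 (1972); Erenoznik v. City of Jacksonville"),
    ("438 U.S.\n726", "438 U-S.\n726"),
]

for native, extracted in pairs:
    for want, got in mismatches(native, extracted):
        print(f"expected {want!r}, got {got!r}")
```

Reporting only the differing fragments, rather than the whole line, makes it easier to see whether the substitutions look like glyph-mapping problems in the PDF or something else.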