[GitHub] [tika] THausherr merged pull request #891: Bump aws.version from 1.12.381 to 1.12.382

2023-01-10 Thread GitBox


THausherr merged PR #891:
URL: https://github.com/apache/tika/pull/891


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [tika] THausherr merged pull request #892: Bump reactor-core from 3.5.1 to 3.5.2

2023-01-10 Thread GitBox


THausherr merged PR #892:
URL: https://github.com/apache/tika/pull/892





[GitHub] [tika] THausherr merged pull request #889: Bump reactor.netty.version from 1.1.1 to 1.1.2

2023-01-10 Thread GitBox


THausherr merged PR #889:
URL: https://github.com/apache/tika/pull/889





[GitHub] [tika] THausherr merged pull request #890: Bump junit5.version from 5.9.1 to 5.9.2

2023-01-10 Thread GitBox


THausherr merged PR #890:
URL: https://github.com/apache/tika/pull/890





[GitHub] [tika] dependabot[bot] opened a new pull request, #892: Bump reactor-core from 3.5.1 to 3.5.2

2023-01-10 Thread GitBox


dependabot[bot] opened a new pull request, #892:
URL: https://github.com/apache/tika/pull/892

   Bumps [reactor-core](https://github.com/reactor/reactor-core) from 3.5.1 to 3.5.2.
   
   Release notes
   Sourced from [reactor-core's releases](https://github.com/reactor/reactor-core/releases).
   
   v3.5.2
   
   Reactor-Core 3.5.2 is part of 2022.0.2 Release Train.
   What's Changed
   :lady_beetle: Bug fixes
   
   - Add reflection hints for native-image support by [@violetagg](https://github.com/violetagg) in [#3325](https://github-redirect.dependabot.com/reactor/reactor-core/issues/3325)
   
   Full Changelog: https://github.com/reactor/reactor-core/compare/v3.5.1...v3.5.2
   
   Commits
   
   - [bc4d516](https://github.com/reactor/reactor-core/commit/bc4d516bafcaa55ad2dcf015c940825a100eb1ae) [release] Prepare and release 3.5.2
   - [9d47a04](https://github.com/reactor/reactor-core/commit/9d47a04b3accfed745f8a5e7acaaf3b1b66d67e8) Add reflection hints for native-image support ([#3325](https://github-redirect.dependabot.com/reactor/reactor-core/issues/3325))
   - [13f88e4](https://github.com/reactor/reactor-core/commit/13f88e4d05febe7026ee1a0b57bb41ac1a7983c9) [release] Next development version 3.5.2-SNAPSHOT
   - See full diff in [compare view](https://github.com/reactor/reactor-core/compare/v3.5.1...v3.5.2)
   
   
   
   
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





[GitHub] [tika] dependabot[bot] opened a new pull request, #891: Bump aws.version from 1.12.381 to 1.12.382

2023-01-10 Thread GitBox


dependabot[bot] opened a new pull request, #891:
URL: https://github.com/apache/tika/pull/891

   Bumps `aws.version` from 1.12.381 to 1.12.382.
   Updates `aws-java-sdk-s3` from 1.12.381 to 1.12.382
   
   Changelog
   Sourced from [aws-java-sdk-s3's changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md).
   
   1.12.382 2023-01-10
   
   Amazon Location Service
   
   Features
   
   - This release adds support for two new route travel models, Bicycle and Motorcycle which can be used with Grab data source.
   
   Amazon Relational Database Service
   
   Features
   
   - This release adds support for configuring allocated storage on the CreateDBInstanceReadReplica, RestoreDBInstanceFromDBSnapshot, and RestoreDBInstanceToPointInTime APIs.
   
   Commits
   
   - [e3db37b](https://github.com/aws/aws-sdk-java/commit/e3db37bf59199e8d63326cbbe2fd134cf96564c4) AWS SDK for Java 1.12.382
   - [96663a8](https://github.com/aws/aws-sdk-java/commit/96663a8a5ccb24727f030ef5eba1b86e2543c725) Update GitHub version number to 1.12.382-SNAPSHOT
   - See full diff in [compare view](https://github.com/aws/aws-sdk-java/compare/1.12.381...1.12.382)
   
   
   
   
   Updates `aws-java-sdk-transcribe` from 1.12.381 to 1.12.382
   
   





[GitHub] [tika] dependabot[bot] opened a new pull request, #890: Bump junit5.version from 5.9.1 to 5.9.2

2023-01-10 Thread GitBox


dependabot[bot] opened a new pull request, #890:
URL: https://github.com/apache/tika/pull/890

   Bumps `junit5.version` from 5.9.1 to 5.9.2.
   Updates `junit-bom` from 5.9.1 to 5.9.2
   
   Release notes
   Sourced from [junit-bom's releases](https://github.com/junit-team/junit5/releases).
   
   JUnit 5.9.2 = Platform 1.9.2 + Jupiter 5.9.2 + Vintage 5.9.2
   See [Release Notes](http://junit.org/junit5/docs/5.9.2/release-notes/).
   
   Commits
   
   - [8ed3c66](https://github.com/junit-team/junit5/commit/8ed3c66c7eb20b835cf92f50a7bf8830838c462e) Release 5.9.2
   - [742f99f](https://github.com/junit-team/junit5/commit/742f99fcce6d8b8fbd38c7f541c55bdda771e220) Prepare 5.9.2 release notes
   - [a9a3cf5](https://github.com/junit-team/junit5/commit/a9a3cf5fb75ad9adf8c197224981226db8f41181) Fix bug and polish contribution
   - [825ea38](https://github.com/junit-team/junit5/commit/825ea38857bff2dcbc200c6ceb7972dbc89482b0) Introduce new @MethodSource syntax to differentiate overloaded local factor...
   - [0c40f5e](https://github.com/junit-team/junit5/commit/0c40f5ef057c90a8d4b2249dd9a0b6e289426424) Polish Javadoc
   - [7d54016](https://github.com/junit-team/junit5/commit/7d54016421d611a13db8196ea9625dfe1d9036c8) Update codecov-action
   - [bfeeac4](https://github.com/junit-team/junit5/commit/bfeeac4d4142a3680737626a0ccdb9a708cabb2a) Remove duplicate copyright comment
   - [b0d9083](https://github.com/junit-team/junit5/commit/b0d9083315426b69d2bf38153987cdeb83460257) Format integration test projects with Spotless as well
   - [c4ed325](https://github.com/junit-team/junit5/commit/c4ed325cb2ff825ecd0bda3870cf8444bdf646f0) Update copyright
   - [0e3a1d3](https://github.com/junit-team/junit5/commit/0e3a1d32e504c63a09dbafc8c1926b66df417774) Update upload-artifact action
   - Additional commits viewable in [compare view](https://github.com/junit-team/junit5/compare/r5.9.1...r5.9.2)
   
   
   
   
   Updates `junit-jupiter-api` from 5.9.1 to 5.9.2
   
   
   
   
   
   Updates `junit-jupiter-engine` from 5.9.1 to 5.9.2
   
   

[GitHub] [tika] dependabot[bot] opened a new pull request, #889: Bump reactor.netty.version from 1.1.1 to 1.1.2

2023-01-10 Thread GitBox


dependabot[bot] opened a new pull request, #889:
URL: https://github.com/apache/tika/pull/889

   Bumps `reactor.netty.version` from 1.1.1 to 1.1.2.
   Updates `reactor-netty-core` from 1.1.1 to 1.1.2
   
   Release notes
   Sourced from [reactor-netty-core's releases](https://github.com/reactor/reactor-netty/releases).
   
   v1.1.2
   
   Reactor Netty 1.1.2 is part of 2022.0.2 Release Train.
   This is a recommended update for all Reactor Netty 1.1.x users.
   What's Changed
   :sparkles: New features and improvements
   
   - Depend on Reactor Core v3.5.2 by [@OlegDokuka](https://github.com/OlegDokuka) in e5b22740efe8b1335b823d27398c2774c752e196, see [release notes](https://github.com/reactor/reactor-core/releases/tag/v3.5.2).
   - Depend on Netty QUIC Codec v0.0.35.Final by [@violetagg](https://github.com/violetagg) in [#2612](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2612)
   - Improve logging when connection closed after cancel by [@pderop](https://github.com/pderop) in [#2585](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2585)
   - Support configurable DNS resolver cache by [@samueldlightfoot](https://github.com/samueldlightfoot) in [#2622](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2622)
   
   :lady_beetle: Bug fixes
   
   - Ensure CL/TE headers are handled correctly for 204/205/304 status codes when server configured with HTTP/2 by [@violetagg](https://github.com/violetagg) in [#2623](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2623)
   - Ensure HttpServer Active Connections metric is correct when connection closure before response write by [@violetagg](https://github.com/violetagg) in [#2633](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2633)
   
   :book: Documentation, Tests and Build
   
   Documentation:
   
   - Update ReactorNetty#IO_SELECT_COUNT javadoc by [@violetagg](https://github.com/violetagg) in [#2626](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2626)
   - Fix typos in comments of HttpClientConnect by [@sgc109](https://github.com/sgc109) in [#2627](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2627)
   - Update ReadMe to have consistent formatting by [@esivakumar18](https://github.com/esivakumar18) in [#2632](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2632)
   - Add Reactor Netty version of Telnet example by [@jchenga](https://github.com/jchenga) in [#2617](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2617)
   
   Build:
   
   - Use new GitHub Actions API for setting output by [@violetagg](https://github.com/violetagg) in [#2635](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2635)
   
   :up: Dependency Upgrades
   
   - Bump mockito-core to version 4.11.0 by [@dependabot](https://github.com/dependabot) in [#2629](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2629)
   - Bump build-info-extractor-gradle to version 4.31.0 by [@dependabot](https://github.com/dependabot) in [#2630](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2630)
   - Bump com.diffplug.spotless to version 6.12.1 by [@dependabot](https://github.com/dependabot) in [#2631](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2631)
   - Bump assertj-core to version 3.24.1 by [@dependabot](https://github.com/dependabot) in [#2637](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2637)
   - Bump netty-tcnative-boringssl-static to version 2.0.55.Final by [@violetagg](https://github.com/violetagg) in [#2638](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2638)
   
   New Contributors
   
   - [@sgc109](https://github.com/sgc109) made their first contribution in [#2627](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2627)
   - [@esivakumar18](https://github.com/esivakumar18) made their first contribution in [#2632](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2632)
   - [@jchenga](https://github.com/jchenga) made their first contribution in [#2617](https://github-redirect.dependabot.com/reactor/reactor-netty/issues/2617)
   
   Full Changelog: https://github.com/reactor/reactor-netty/compare/v1.1.1...v1.1.2
   
   Commits
   
   - [e5b2274](https://github.com/reactor/reactor-netty/commit/e5b22740efe8b1335b823d27398c2774c752e196) [release] Prepare and release 1.1.2
   - [ed8714d](https://github.com/reactor/reactor-netty/commit/ed8714d7d99bb62de9b83546fb8e65046c73b508) Merge-ignore release 1.0.27 into 1.1.2
   - [f781949](https://github.com/reactor/reactor-netty/commit/f781949e7b4664d68d932871788da0218da2f720) [release] Back to snapshots, next is 1.0.28-SNAPSHOT
   - [018693b](https://github.com/reactor/reactor-netty/commit/018693b99736a84f43b1516f2d79336925347dac) [release] Prepare and release 1.0.27
   

[jira] [Commented] (TIKA-3952) Content mismatch

2023-01-10 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656740#comment-17656740
 ] 

Tilman Hausherr commented on TIKA-3952:
---

This online OCR page has the same error: https://ocr.space/ (use engine3)

> Content mismatch
> ----------------
>
>                 Key: TIKA-3952
>                 URL: https://issues.apache.org/jira/browse/TIKA-3952
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Tika User
>            Priority: Major
>         Attachments: download.pdf
>
>
> While extracting the content of the attached file, we are seeing the content
> mismatch below.
> Native file content      : 95 (1972); Erznoznik v. City of Jacksonville
> Content we got from Tika : 95 (1972); Er{*}e{*}noznik v. City of Jacksonville
>
> Native file content      : 438 U.S.\n726
> Content we got from Tika : 438 {*}U-S{*}.\n726



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-3952) Content mismatch

2023-01-10 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656736#comment-17656736
 ] 

Tilman Hausherr commented on TIKA-3952:
---

Either you are doing OCR or it's the wrong file. The attached file does not
contain any text, only a bitmap.






NER parsers package for Tika Server

2023-01-10 Thread Julien Massiera
Hi Tim,

First, I would like to wish you all the best for 2023!

I am writing because I am using NER parsers with Tika Server, but to do so
I had to build the NER package myself from the Tika repository. For
Tika Server 2.x, I did not find any pre-made NER package to add to the
classpath in order to use the NER parsers (like the scientific-package or the
sqlite3-package). Is this expected? It would be great to have at least some
documentation explaining how to do this for people who want to use those
parsers with Tika Server 2.x, and to add the package to the downloadable
content.

Best regards,

Julien


[jira] [Comment Edited] (TIKA-3952) Content mismatch

2023-01-10 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656063#comment-17656063
 ] 

Tika User edited comment on TIKA-3952 at 1/10/23 12:43 PM:
---

[~nick]  FYI. I attached PDF file for your reference.


was (Author: vamsi452):
FYI. I attached PDF file for your reference.



