Gregory Lepore created TIKA-4208:
Summary: OOM error in SAS7BDATParser
Key: TIKA-4208
URL: https://issues.apache.org/jira/browse/TIKA-4208
Project: Tika
Issue Type: Bug
Affects Versions
Tim Allison created TIKA-4207:
-
Summary: PipesParser should have option to extract raw bytes of
embedded files
Key: TIKA-4207
URL: https://issues.apache.org/jira/browse/TIKA-4207
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824496#comment-17824496
]
ASF GitHub Bot commented on TIKA-3353:
--
lewismc commented on PR #429:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824328#comment-17824328
]
ASF GitHub Bot commented on TIKA-3353:
--
Opa- commented on PR #429:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jan Høydahl closed TIKA-496.
Resolution: Won't Do
Closing as this is only a problem for the original TIKA langid which is
superseded
[
https://issues.apache.org/jira/browse/TIKA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823740#comment-17823740
]
ASF GitHub Bot commented on TIKA-3353:
--
tballison commented on PR #429:
URL: https://github.com
[
https://issues.apache.org/jira/browse/TIKA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823734#comment-17823734
]
ASF GitHub Bot commented on TIKA-3353:
--
Opa- commented on PR #429:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823577#comment-17823577
]
Hudson commented on TIKA-4199:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1540 (See
Gregory Lepore created TIKA-4206:
Summary: Variation on Zip Bomb
Key: TIKA-4206
URL: https://issues.apache.org/jira/browse/TIKA-4206
Project: Tika
Issue Type: Bug
Affects Versions: 3.0.0
[
https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822078#comment-17822078
]
Hudson commented on TIKA-4166:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1535 (See
[
https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821773#comment-17821773
]
Hudson commented on TIKA-4202:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1533 (See
[
https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821772#comment-17821772
]
Hudson commented on TIKA-4204:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1533 (See
[
https://issues.apache.org/jira/browse/TIKA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821774#comment-17821774
]
Hudson commented on TIKA-4205:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1533 (See
[
https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4204.
---
Fix Version/s: 2.9.2
3.0.0
Resolution: Fixed
> ChmExtractor una
[
https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821735#comment-17821735
]
Tim Allison commented on TIKA-4204:
---
I just cherry-picked the fix(es) back to {{branch_2x
[
https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4202.
---
Fix Version/s: 3.0.0
Resolution: Fixed
> Add page count of OCR'd pages in metadata for
[
https://issues.apache.org/jira/browse/TIKA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4205.
---
Fix Version/s: 3.0.0
Resolution: Fixed
> Add more columns to profiles table in tika-e
[
https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821706#comment-17821706
]
ASF GitHub Bot commented on TIKA-4202:
--
tballison merged PR #1630:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821705#comment-17821705
]
ASF GitHub Bot commented on TIKA-4205:
--
tballison merged PR #1629:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821703#comment-17821703
]
Tim Allison commented on TIKA-4202:
---
The most recent commit actually increments the counter. I've also
[
https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821697#comment-17821697
]
Tim Allison commented on TIKA-4204:
---
Ugh. I accidentally pushed to main instead of a dev branch. Sorry
[
https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821698#comment-17821698
]
Tim Allison commented on TIKA-4204:
---
[~bossymr] thank you for opening this issue, thoroughly diagnosing
[
https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821692#comment-17821692
]
ASF GitHub Bot commented on TIKA-4202:
--
tballison opened a new pull request, #1630:
URL: https
[
https://issues.apache.org/jira/browse/TIKA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821690#comment-17821690
]
ASF GitHub Bot commented on TIKA-4205:
--
tballison opened a new pull request, #1629:
URL: https
Tim Allison created TIKA-4205:
-
Summary: Add more columns to profiles table in tika-eval Profile
mode
Key: TIKA-4205
URL: https://issues.apache.org/jira/browse/TIKA-4205
Project: Tika
Issue
[
https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reassigned TIKA-4204:
-
Assignee: Tim Allison
> ChmExtractor unable to decompress f
[
https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Fromholz updated TIKA-4204:
--
Attachment: 3HAC050917_TRM_RAPID_RW_6-en.chm
Environment: The file I am trying to parse
Robert Fromholz created TIKA-4204:
-
Summary: ChmExtractor unable to decompress file
Key: TIKA-4204
URL: https://issues.apache.org/jira/browse/TIKA-4204
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820366#comment-17820366
]
Hudson commented on TIKA-4203:
--
FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1528 (See
[
https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4203:
--
Fix Version/s: 3.0.0
> Add @deprecated annotation where nee
[
https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4203:
--
Affects Version/s: 3.0.0
> Add @deprecated annotation where nee
[
https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr resolved TIKA-4203.
---
Resolution: Fixed
> Add @deprecated annotation where nee
Tilman Hausherr created TIKA-4203:
-
Summary: Add @deprecated annotation where needed
Key: TIKA-4203
URL: https://issues.apache.org/jira/browse/TIKA-4203
Project: Tika
Issue Type: Task
[
https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820223#comment-17820223
]
Hudson commented on TIKA-4202:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1526 (See
[
https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820203#comment-17820203
]
ASF GitHub Bot commented on TIKA-4202:
--
tballison merged PR #1621:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820165#comment-17820165
]
ASF GitHub Bot commented on TIKA-4202:
--
tballison opened a new pull request, #1621:
URL: https
[
https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4202:
--
Summary: Add page count of OCR'd pages in metadata for PDF files (was: Add
page count of OCR'd pages
Tim Allison created TIKA-4202:
-
Summary: Add page count of OCR'd pages in PDF's metadata
Key: TIKA-4202
URL: https://issues.apache.org/jira/browse/TIKA-4202
Project: Tika
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819093#comment-17819093
]
Hudson commented on TIKA-4199:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1520 (See
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4199:
--
Fix Version/s: 2.9.2
3.0.0
> commons-compress 1.26.0 breaks Apache T
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818937#comment-17818937
]
Tilman Hausherr commented on TIKA-4199:
---
I tried another solution
{code:java
[
https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818916#comment-17818916
]
Hudson commented on TIKA-4201:
--
FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1518 (See
[
https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818907#comment-17818907
]
ASF GitHub Bot commented on TIKA-4201:
--
tballison merged PR #1608:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818905#comment-17818905
]
Hudson commented on TIKA-4199:
--
FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1517 (See
[
https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818906#comment-17818906
]
Hudson commented on TIKA-4198:
--
FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1517 (See
[
https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818882#comment-17818882
]
ASF GitHub Bot commented on TIKA-4198:
--
tballison merged PR #1607:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4198.
---
Fix Version/s: 3.0.0
Resolution: Fixed
> Skip blob fields in geopkg fi
[
https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818880#comment-17818880
]
ASF GitHub Bot commented on TIKA-4201:
--
tballison opened a new pull request, #1608:
URL: https
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818877#comment-17818877
]
Hudson commented on TIKA-4199:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1516 (See
[
https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818873#comment-17818873
]
Tilman Hausherr commented on TIKA-4201:
---
Yeah, makes sense.
> Add hard limit to stream read
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818871#comment-17818871
]
Tim Allison commented on TIKA-4199:
---
I opened TIKA-4201 to add a hard limit to the read
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867
]
Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 3:37 PM:
{quote}I'm
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867
]
Tilman Hausherr commented on TIKA-4199:
---
{quote}I'm not declaring this a problem with commons
Tim Allison created TIKA-4201:
-
Summary: Add hard limit to stream reading in
IWorksParser#detectType
Key: TIKA-4201
URL: https://issues.apache.org/jira/browse/TIKA-4201
Project: Tika
Issue Type
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818853#comment-17818853
]
Tim Allison commented on TIKA-4199:
---
As I look at the IWorkPackageParser and the detectType(), I think
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818846#comment-17818846
]
Tim Allison commented on TIKA-4199:
---
Thank you [~tilman] for working on this! I'm sorry I opened
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818823#comment-17818823
]
Tilman Hausherr commented on TIKA-4199:
---
After merging I discovered that the SevenZWrapper class
[
https://issues.apache.org/jira/browse/TIKA-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr closed TIKA-4200.
-
Resolution: Duplicate
Our CI is failing because of the CVE :-( Duplicate of TIKA-4199. I'm still
[
https://issues.apache.org/jira/browse/TIKA-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818790#comment-17818790
]
Tim Allison commented on TIKA-4200:
---
Argh. Sorry. [~tilman] is already working on this:
https
[
https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818781#comment-17818781
]
ASF GitHub Bot commented on TIKA-4198:
--
tballison opened a new pull request, #1607:
URL: https
Tim Allison created TIKA-4200:
-
Summary: Fix broken build after upgrade to commons-compress
Key: TIKA-4200
URL: https://issues.apache.org/jira/browse/TIKA-4200
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774
]
Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 11:57 AM:
-
I'm
[
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774
]
Tilman Hausherr commented on TIKA-4199:
---
I'm working on it
https://github.com/apache/pdfbox/pull
Alexander Veit created TIKA-4199:
Summary: commons-compress 1.26.0 breaks Apache Tika 2.9.1
Key: TIKA-4199
URL: https://issues.apache.org/jira/browse/TIKA-4199
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818069#comment-17818069
]
Gregory Lepore commented on TIKA-4198:
--
For this set of data from the Bureau of Land Management
[
https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818068#comment-17818068
]
Tim Allison commented on TIKA-4198:
---
[~g...@rhobard.com], based on your knowledge of the format, should
[
https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818018#comment-17818018
]
Gregory Lepore commented on TIKA-4198:
--
This would make a huge difference in my agency's ability
[
https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817988#comment-17817988
]
Tim Allison commented on TIKA-4198:
---
On one 130MB file, the processing time went from 320 seconds ->
[
https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817981#comment-17817981
]
Tim Allison commented on TIKA-4198:
---
Turns out there's a "geom" field and also a "
Tim Allison created TIKA-4198:
-
Summary: Skip blob fields in geopkg files
Key: TIKA-4198
URL: https://issues.apache.org/jira/browse/TIKA-4198
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817970#comment-17817970
]
Hudson commented on TIKA-4166:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1512 (See
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816896#comment-17816896
]
Lonzak commented on TIKA-3784:
--
One other possibility would be to combine both approaches:
Usage
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811
]
Lonzak edited comment on TIKA-3784 at 2/13/24 8:16 AM:
---
PKCS12 is not the easiest
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811
]
Lonzak edited comment on TIKA-3784 at 2/13/24 8:14 AM:
---
PKCS12 is not the easiest
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816827#comment-17816827
]
Tim Allison commented on TIKA-3784:
---
Well, sure, if you want to make it easy! Y, let's go with something
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811
]
Lonzak edited comment on TIKA-3784 at 2/12/24 11:16 PM:
PKCS12 is not the easiest
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811
]
Lonzak edited comment on TIKA-3784 at 2/12/24 11:15 PM:
PKCS12 is not the easiest
[
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lonzak updated TIKA-4194:
-
Description:
We use tika to detect the type of a file which is uploaded. In most cases this
works quite well
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811
]
Lonzak commented on TIKA-3784:
--
PKCS12 is not the easiest format :-|
The oid for pkcs12 starts
[
https://issues.apache.org/jira/browse/TIKA-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816804#comment-17816804
]
Hudson commented on TIKA-4191:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1505 (See
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816788#comment-17816788
]
Nick Burch commented on TIKA-3784:
--
From [https://datatracker.ietf.org/doc/rfc7292/] it looks l
[
https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816779#comment-17816779
]
Hudson commented on TIKA-4196:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See
[
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816781#comment-17816781
]
Hudson commented on TIKA-4194:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See
[
https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816780#comment-17816780
]
Hudson commented on TIKA-4195:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See
[
https://issues.apache.org/jira/browse/TIKA-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816715#comment-17816715
]
ASF GitHub Bot commented on TIKA-4191:
--
tballison merged PR #1575:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4197.
---
Fix Version/s: 2.9.2
Resolution: Fixed
> Downgrade jackrabbit in
Tim Allison created TIKA-4197:
-
Summary: Downgrade jackrabbit in 2.x
Key: TIKA-4197
URL: https://issues.apache.org/jira/browse/TIKA-4197
Project: Tika
Issue Type: Bug
Reporter: Tim
[
https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4195:
--
Description:
The JSoupParser runs encoding detection on the InputStream. If the result is
null
[
https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4195.
---
Fix Version/s: 3.0.0
Resolution: Fixed
> JSoupParser conceals null from the EncodingDetec
[
https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816694#comment-17816694
]
ASF GitHub Bot commented on TIKA-4195:
--
tballison merged PR #1591:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816689#comment-17816689
]
Tim Allison commented on TIKA-4194:
---
Merged and cherry-picked into branch_2x.
[~tom_1st] if you do have
[
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816687#comment-17816687
]
ASF GitHub Bot commented on TIKA-4194:
--
tballison merged PR #1589:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816680#comment-17816680
]
ASF GitHub Bot commented on TIKA-4196:
--
tballison merged PR #1590:
URL: https://github.com/apache
[
https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816679#comment-17816679
]
ASF GitHub Bot commented on TIKA-4195:
--
tballison opened a new pull request, #1591:
URL: https
[
https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4196:
--
Description: The ICU4j and the StandardHtmlEncodingDetector detectors
include a bom detector
[
https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816668#comment-17816668
]
ASF GitHub Bot commented on TIKA-4196:
--
tballison opened a new pull request, #1590:
URL: https
Tim Allison created TIKA-4196:
-
Summary: Add a BOM charset detector
Key: TIKA-4196
URL: https://issues.apache.org/jira/browse/TIKA-4196
Project: Tika
Issue Type: New Feature
Reporter
[
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816661#comment-17816661
]
Tim Allison commented on TIKA-4194:
---
Thank you for this! I'll try to take a look later today
Tim Allison created TIKA-4195:
-
Summary: JSoupParser conceals null from the EncodingDetector
Key: TIKA-4195
URL: https://issues.apache.org/jira/browse/TIKA-4195
Project: Tika
Issue Type
[
https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816607#comment-17816607
]
Lonzak edited comment on TIKA-4194 at 2/12/24 1:47 PM:
---
Interestingly