[
https://issues.apache.org/jira/browse/TIKA-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830867#comment-17830867
]
Nick Burch commented on TIKA-4223:
--
A lot of the early file extension allocations were taken from the
[
https://issues.apache.org/jira/browse/TIKA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827017#comment-17827017
]
Nick Burch commented on TIKA-4210:
--
The attached file seems to be an RTF file. I'm not sure what a ".mega
[
https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824965#comment-17824965
]
Nick Burch commented on TIKA-4208:
--
I would expect that the json output version would need a bit more
[
https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824874#comment-17824874
]
Nick Burch commented on TIKA-4208:
--
How much heap size do you have allocated?
The error suggests that
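On the heap question: a JVM can report its own limits through the standard {{Runtime}} API. Below is a minimal sketch that prints the maximum heap (roughly the value set by {{-Xmx}}) along with the currently committed and free heap. The class name {{HeapReport}} is hypothetical.

```java
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    // Maximum heap the JVM will try to use, in MiB (roughly the -Xmx value).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Max heap:            " + maxHeapMiB() + " MiB");
        System.out.println("Committed heap:      " + rt.totalMemory() / MIB + " MiB");
        System.out.println("Free (of committed): " + rt.freeMemory() / MIB + " MiB");
    }
}
```

If the reported maximum is small compared with the documents being parsed, restarting the JVM with a larger {{-Xmx}} value is the usual first step.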
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816788#comment-17816788
]
Nick Burch commented on TIKA-3784:
--
From [https://datatracker.ietf.org/doc/rfc7292/] it looks like
[
https://issues.apache.org/jira/browse/TIKA-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787608#comment-17787608
]
Nick Burch commented on TIKA-4148:
--
For detection of the OLE2 based files, we don't need to find unique
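For reference, every OLE2 (Compound File Binary) file begins with the same 8-byte signature, {{D0 CF 11 E0 A1 B1 1A E1}}. This is true of .doc, .xls and .msg files alike, so the signature identifies the container but not the specific format inside it. The following is a minimal sketch of that check. The class and method names are hypothetical and are not Tika's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Ole2Signature {
    // Shared 8-byte header signature of all OLE2 / Compound File Binary files.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // True if the buffer starts with the OLE2 signature.
    static boolean isOle2(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    public static void main(String[] args) {
        byte[] docLike = Arrays.copyOf(OLE2_MAGIC, 512); // signature + zero padding
        byte[] pdf = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isOle2(docLike)); // true
        System.out.println(isOle2(pdf));     // false
    }
}
```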
[
https://issues.apache.org/jira/browse/TIKA-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch updated TIKA-4119:
-
Component/s: mime
> Return media type "text/javascript" instead of "application/javascript" to
> follow
[
https://issues.apache.org/jira/browse/TIKA-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch updated TIKA-4119:
-
Labels: tika-3x (was: )
> Return media type "text/javascript" instead of "application/javascript" to
>
[
https://issues.apache.org/jira/browse/TIKA-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759921#comment-17759921
]
Nick Burch commented on TIKA-4119:
--
I wonder if this is a big enough change around Detection that we
[
https://issues.apache.org/jira/browse/TIKA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750344#comment-17750344
]
Nick Burch commented on TIKA-4062:
--
Between holidays and the length of time needed for regression runs +
[
https://issues.apache.org/jira/browse/TIKA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748454#comment-17748454
]
Nick Burch commented on TIKA-4064:
--
Depends if anyone else on the PMC has the time to be release manager
[
https://issues.apache.org/jira/browse/TIKA-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748452#comment-17748452
]
Nick Burch commented on TIKA-3948:
--
[~solomax] I think the first task is to identify any other areas of
[
https://issues.apache.org/jira/browse/TIKA-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741578#comment-17741578
]
Nick Burch commented on TIKA-4098:
--
The more bytes beyond the start we check for the PDF marker, the more
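This comment concerns how far past the start of a file to look for the {{%PDF-}} marker. Below is a minimal sketch of a bounded scan. It assumes the window size is a caller-supplied parameter, which is not Tika's actual setting, and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class PdfMarkerScan {
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Offset of the first "%PDF-" that starts within the first `window` bytes,
    // or -1 if there is none. `window` is a hypothetical tuning parameter.
    static int findMarker(byte[] data, int window) {
        int lastStart = Math.min(window - 1, data.length - MARKER.length);
        for (int i = 0; i <= lastStart; i++) {
            int j = 0;
            while (j < MARKER.length && data[i + j] == MARKER[j]) {
                j++;
            }
            if (j == MARKER.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "junk\n%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findMarker(data, 1024)); // 5
        System.out.println(findMarker(data, 4));    // -1: marker starts beyond the window
    }
}
```

A larger window tolerates more leading junk before the marker, at the cost of scanning more bytes per file.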
[
https://issues.apache.org/jira/browse/TIKA-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730728#comment-17730728
]
Nick Burch commented on TIKA-4060:
--
I'm a muppet... had forgotten to escape the hex characters in the
[
https://issues.apache.org/jira/browse/TIKA-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-4060.
--
Fix Version/s: 2.8.1
Resolution: Fixed
> Add magic to audio/aac in tika-mimetypes.xml
>
[
https://issues.apache.org/jira/browse/TIKA-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730649#comment-17730649
]
Nick Burch commented on TIKA-4060:
--
0x494433 is the string ID3, which I think ought to be at the start.
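An ID3v2 tag, when present, normally sits at the very start of the file and begins with the ASCII bytes {{I}}, {{D}}, {{3}} (0x49 0x44 0x33). Below is a minimal sketch of that check, using hypothetical names. It only shows that an ID3 tag is present. It does not show which audio format follows the tag, because MP3 files carry the same tag.

```java
final class Id3Check {
    // An ID3v2 tag header starts with the ASCII bytes 'I' 'D' '3'.
    static boolean startsWithId3(byte[] data) {
        return data.length >= 3
            && data[0] == 0x49   // 'I'
            && data[1] == 0x44   // 'D'
            && data[2] == 0x33;  // '3'
    }

    public static void main(String[] args) {
        System.out.println(startsWithId3(new byte[] {0x49, 0x44, 0x33, 0x04, 0x00}));  // true
        System.out.println(startsWithId3(new byte[] {(byte) 0xFF, (byte) 0xF1, 0x50})); // false (raw ADTS frame, no tag)
    }
}
```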
[
https://issues.apache.org/jira/browse/TIKA-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730304#comment-17730304
]
Nick Burch commented on TIKA-4060:
--
I have created some small test AAC files using ffmpeg, and then had a
[
https://issues.apache.org/jira/browse/TIKA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728992#comment-17728992
]
Nick Burch commented on TIKA-4051:
--
Last time I asked the MPXJ project they weren't interested in
[
https://issues.apache.org/jira/browse/TIKA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725561#comment-17725561
]
Nick Burch commented on TIKA-3999:
--
Oh, this brings back memories... good memories :)
Unless we can
[
https://issues.apache.org/jira/browse/TIKA-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724302#comment-17724302
]
Nick Burch commented on TIKA-4045:
--
I guess this could also apply for other row-based formats like SQLite
[
https://issues.apache.org/jira/browse/TIKA-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718674#comment-17718674
]
Nick Burch commented on TIKA-4025:
--
Would a video metadata specification's frame count be a better home?
[
https://issues.apache.org/jira/browse/TIKA-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693140#comment-17693140
]
Nick Burch commented on TIKA-3981:
--
Is this happening for all executables on your machine, or just some?
[
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689199#comment-17689199
]
Nick Burch commented on TIKA-3973:
--
If you only care about container-aware detection for Ogg based
[
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689176#comment-17689176
]
Nick Burch commented on TIKA-3973:
--
For all container formats you want {{tika-parsers}} or
[
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689161#comment-17689161
]
Nick Burch edited comment on TIKA-3973 at 2/15/23 2:38 PM:
---
For container-based
[
https://issues.apache.org/jira/browse/TIKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689161#comment-17689161
]
Nick Burch commented on TIKA-3973:
--
For container-based detection (such as the Ogg container format), you
[
https://issues.apache.org/jira/browse/TIKA-3960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17682352#comment-17682352
]
Nick Burch commented on TIKA-3960:
--
If possible, please include a small test file and update
[
https://issues.apache.org/jira/browse/TIKA-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677364#comment-17677364
]
Nick Burch commented on TIKA-3703:
--
I guess we could include a data package metadata file to better
[
https://issues.apache.org/jira/browse/TIKA-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677326#comment-17677326
]
Nick Burch commented on TIKA-3703:
--
A zip file gives you compression, and most clients won't accidentally
[
https://issues.apache.org/jira/browse/TIKA-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675914#comment-17675914
]
Nick Burch commented on TIKA-3955:
--
The Tika App is intended as a "batteries included" standalone app.
[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656060#comment-17656060
]
Nick Burch commented on TIKA-3952:
--
Is the PDF a scan? Are you doing OCR?
> Content mismatch
>
[
https://issues.apache.org/jira/browse/TIKA-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17656049#comment-17656049
]
Nick Burch commented on TIKA-3952:
--
Can you try following the steps in
[
https://issues.apache.org/jira/browse/TIKA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627638#comment-17627638
]
Nick Burch commented on TIKA-2536:
--
We can only depend on versions in maven central, we can't depend on
[
https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620633#comment-17620633
]
Nick Burch commented on TIKA-3890:
--
DOCX files are compressed XML. Text compresses very well. Already
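As a rough illustration of how well repetitive XML text compresses, the sketch below compresses a synthetic, WordprocessingML-like string with {{java.util.zip.Deflater}}. DEFLATE is the compression method normally used for the parts inside a DOCX (ZIP) package. The sample text is an assumption and is far more repetitive than a real document, so real compression ratios will be lower.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class XmlCompression {
    // DEFLATE-compresses the input and returns the compressed size in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    public static void main(String[] args) {
        StringBuilder xml = new StringBuilder("<w:document><w:body>");
        for (int i = 0; i < 2000; i++) {
            xml.append("<w:p><w:r><w:t>Some repeated paragraph text.</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        byte[] raw = xml.toString().getBytes(StandardCharsets.UTF_8);
        int packed = deflatedSize(raw);
        System.out.printf("raw=%d bytes, deflated=%d bytes (%.1fx smaller)%n",
                raw.length, packed, (double) raw.length / packed);
    }
}
```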
[
https://issues.apache.org/jira/browse/TIKA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620610#comment-17620610
]
Nick Burch commented on TIKA-3890:
--
The only way to be sure of how many pages are in a Word document is
[
https://issues.apache.org/jira/browse/TIKA-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603483#comment-17603483
]
Nick Burch commented on TIKA-3850:
--
The kind of statistical language model used in Tika struggles with
[
https://issues.apache.org/jira/browse/TIKA-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603038#comment-17603038
]
Nick Burch commented on TIKA-3308:
--
Our HTML mime type has both root-XML tags for well-formed documents,
[
https://issues.apache.org/jira/browse/TIKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575814#comment-17575814
]
Nick Burch commented on TIKA-3832:
--
Any chance you could try with Apache PDFBox directly? They've got a
[
https://issues.apache.org/jira/browse/TIKA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-3830.
--
Resolution: Duplicate
> Kaspersky identified a file as riskware
>
[
https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574656#comment-17574656
]
Nick Burch commented on TIKA-3829:
--
Can you share a file that triggers this bug?
The method in question
[
https://issues.apache.org/jira/browse/TIKA-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566991#comment-17566991
]
Nick Burch commented on TIKA-3814:
--
I have a feeling that the Text content handler might rely on these
[
https://issues.apache.org/jira/browse/TIKA-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch updated TIKA-3814:
-
Priority: Trivial (was: Blocker)
> Extracted text from HTML file does not exclude newline chars from
[
https://issues.apache.org/jira/browse/TIKA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562599#comment-17562599
]
Nick Burch commented on TIKA-3811:
--
Maybe [~tallison] has an idea on the config part, he's been working
[
https://issues.apache.org/jira/browse/TIKA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562537#comment-17562537
]
Nick Burch commented on TIKA-3811:
--
You should not be using Apache Tika's detection for anything security
[
https://issues.apache.org/jira/browse/TIKA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-3810.
--
Fix Version/s: 2.4.2
Resolution: Fixed
> Vtt file (encoding UTF-8 with BOM) seen as text/plain
>
[
https://issues.apache.org/jira/browse/TIKA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562532#comment-17562532
]
Nick Burch commented on TIKA-3810:
--
Looks like we had detection magic for the UTF16 variant BOMs but not
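The issue title describes a WebVTT file that begins with a UTF-8 byte order mark ({{EF BB BF}}) before the {{WEBVTT}} signature. Below is a simplified sketch that accepts the signature with or without that BOM. The names are hypothetical, and this is not Tika's magic definition. The sketch also skips the WebVTT spec's further check on the character that follows {{WEBVTT}}.

```java
import java.nio.charset.StandardCharsets;

final class VttSniff {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    private static final byte[] SIGNATURE = "WEBVTT".getBytes(StandardCharsets.US_ASCII);

    // True if data starts with "WEBVTT", optionally preceded by a UTF-8 BOM.
    static boolean looksLikeVtt(byte[] data) {
        int offset = hasPrefixAt(data, UTF8_BOM, 0) ? UTF8_BOM.length : 0;
        return hasPrefixAt(data, SIGNATURE, offset);
    }

    // True if `prefix` occurs in `data` starting at `offset`.
    private static boolean hasPrefixAt(byte[] data, byte[] prefix, int offset) {
        if (data.length - offset < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = "WEBVTT\n\n00:00.000 --> 00:01.000\nHi\n".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = "\uFEFFWEBVTT\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeVtt(plain));   // true
        System.out.println(looksLikeVtt(withBom)); // true
        System.out.println(looksLikeVtt("Hello".getBytes(StandardCharsets.UTF_8))); // false
    }
}
```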
[
https://issues.apache.org/jira/browse/TIKA-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562484#comment-17562484
]
Nick Burch commented on TIKA-3809:
--
If the uncompressed XML is 250mb, then you're going to need a heap a
[
https://issues.apache.org/jira/browse/TIKA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557343#comment-17557343
]
Nick Burch commented on TIKA-3798:
--
With no file, no thread dump and no stack trace, it won't be easy to
[
https://issues.apache.org/jira/browse/TIKA-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17557319#comment-17557319
]
Nick Burch commented on TIKA-3798:
--
Do you have a sample file that shows the problem? A thread dump
[
https://issues.apache.org/jira/browse/TIKA-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552078#comment-17552078
]
Nick Burch commented on TIKA-3768:
--
If we can put something into a properly typed + structured metadata
[
https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550223#comment-17550223
]
Nick Burch commented on TIKA-3784:
--
We don't currently have any Mime Magic for PKCS12 files
Based on
[
https://issues.apache.org/jira/browse/TIKA-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550216#comment-17550216
]
Nick Burch commented on TIKA-3768:
--
I wouldn't expect to find those in the textual content after parsing,
[
https://issues.apache.org/jira/browse/TIKA-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539993#comment-17539993
]
Nick Burch commented on TIKA-3771:
--
The PNG magic is priority 50, which is also what our EML min-match 2
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539594#comment-17539594
]
Nick Burch commented on TIKA-3710:
--
As a "normal" html file wouldn't start with these snippets, and
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539582#comment-17539582
]
Nick Burch commented on TIKA-3710:
--
I was thinking we'd do (open)h1(close) or (open)h1(space) to cover
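Below is a minimal, string-level sketch of the two prefixes suggested, {{<h1>}} and {{<h1 }} (with a trailing space). Requiring a {{>}} or a space straight after {{h1}} stops the check from matching a longer tag name that merely starts with those letters. The names are hypothetical. Leading whitespace and upper-case tags are deliberately not handled.

```java
final class HtmlHeadingPrefix {
    // True if the text starts with "<h1>" or "<h1 " (h1 followed by attributes).
    static boolean startsWithH1(String text) {
        return text.startsWith("<h1>") || text.startsWith("<h1 ");
    }

    public static void main(String[] args) {
        System.out.println(startsWithH1("<h1>Title</h1>"));              // true
        System.out.println(startsWithH1("<h1 class=\"t\">Title</h1>"));  // true
        System.out.println(startsWithH1("<h10>not a real tag"));         // false
        System.out.println(startsWithH1("<html><h1>Title</h1>"));        // false
    }
}
```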
[
https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538896#comment-17538896
]
Nick Burch commented on TIKA-3710:
--
The h1 isn't quite as unique as we might like, and maybe not as good
[
https://issues.apache.org/jira/browse/TIKA-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529977#comment-17529977
]
Nick Burch commented on TIKA-3571:
--
Some formats support the concept of pages and we can pass that along
[
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529918#comment-17529918
]
Nick Burch commented on TIKA-3742:
--
Sure! Potentially easiest is if you create your own fork of Tika on
[
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529417#comment-17529417
]
Nick Burch commented on TIKA-3742:
--
I believe {{readNBytes}} only came in with Java 9, and the particular
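For code that must stay on Java 8, a plain read loop can stand in for Java 9's {{InputStream.readNBytes(byte[], int, int)}}. Below is a minimal sketch with hypothetical names. It keeps reading until the requested length is filled or the stream ends, then returns the number of bytes actually read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadUpTo {
    // Java 8-compatible stand-in for InputStream.readNBytes(b, off, len) (Java 9+):
    // calls read() repeatedly until len bytes are read or the stream ends.
    static int readUpTo(InputStream in, byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[8];
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        System.out.println(readUpTo(in, buf, 0, buf.length)); // 5 (stream ended early)
    }
}
```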
[
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529101#comment-17529101
]
Nick Burch commented on TIKA-3742:
--
Assuming we just want type=17 text elements of a DGNv7 file (as per
[
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529038#comment-17529038
]
Nick Burch commented on TIKA-3742:
--
In theory you shouldn't need any java code at all if you don't want,
[
https://issues.apache.org/jira/browse/TIKA-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529029#comment-17529029
]
Nick Burch commented on TIKA-3742:
--
If it can just be run standalone and then {{ExternalParser}} +
[
https://issues.apache.org/jira/browse/TIKA-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528157#comment-17528157
]
Nick Burch commented on TIKA-3731:
--
We already do a prefix for several other formats for custom metadata
[
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527158#comment-17527158
]
Nick Burch commented on TIKA-3719:
--
Linux and Mac will need quotes around arguments containing spaces. As
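When a command line is assembled as one string and run through a shell on Linux or macOS, a common way to quote each argument is to wrap it in single quotes. Any embedded single quote is then rewritten as {{'\''}}. Below is a minimal sketch with hypothetical names.

```java
final class ShellQuote {
    // Quote a single argument for a POSIX shell (sh/bash on Linux and macOS):
    // wrap it in single quotes and turn each embedded ' into '\''.
    static String quote(String arg) {
        return "'" + arg.replace("'", "'\\''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(quote("/home/me/My Documents/config.xml")); // '/home/me/My Documents/config.xml'
        System.out.println(quote("it's"));                             // 'it'\''s'
    }
}
```

Passing each argument as a separate element to {{ProcessBuilder}} avoids shell parsing entirely, so no quoting is needed in that case.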
[
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526776#comment-17526776
]
Nick Burch commented on TIKA-3721:
--
We already have a few file types which we send to {{OfficeParser}}
[
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526352#comment-17526352
]
Nick Burch commented on TIKA-3721:
--
The mime types mentioned at
[
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526336#comment-17526336
]
Nick Burch commented on TIKA-3721:
--
We've had the OK from the author of the tika-dgn-detector
I'd
[
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526324#comment-17526324
]
Nick Burch commented on TIKA-3721:
--
That detector is written in Kotlin, but should be pretty easy to
[
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525747#comment-17525747
]
Nick Burch commented on TIKA-3719:
--
Those look like the steps needed. I'd suggest we create ours as
[
https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525588#comment-17525588
]
Nick Burch commented on TIKA-3725:
--
Something like OAuth would be pretty different to basic auth, due to
[
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525578#comment-17525578
]
Nick Burch commented on TIKA-3719:
--
For testing it, I'd be tempted to create a self-signed certificate
[
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17524718#comment-17524718
]
Nick Burch commented on TIKA-3721:
--
After a quick look, I can't spot any free tools or libraries for
[
https://issues.apache.org/jira/browse/TIKA-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517818#comment-17517818
]
Nick Burch commented on TIKA-3571:
--
It has been quite a while since I last used jodconverter, but the
[
https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516459#comment-17516459
]
Nick Burch commented on TIKA-3711:
--
I'd lean towards putting the file name as an attribute of the img
[
https://issues.apache.org/jira/browse/TIKA-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504378#comment-17504378
]
Nick Burch commented on TIKA-3696:
--
Shouldn't it be more like {{application/x-wacz}} since it isn't a
[
https://issues.apache.org/jira/browse/TIKA-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504150#comment-17504150
]
Nick Burch commented on TIKA-3684:
--
Same as Tika 2.x - pass a {{--config}} flag when you start the server
[
https://issues.apache.org/jira/browse/TIKA-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-3694.
--
Fix Version/s: 2.3.1
Resolution: Fixed
> Tika Server endpoint to return more details on a mime
[
https://issues.apache.org/jira/browse/TIKA-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502627#comment-17502627
]
Nick Burch commented on TIKA-3694:
--
I've added new HTML and JSON endpoints {{/mime-types/type/subtype}}
Nick Burch created TIKA-3694:
Summary: Tika Server endpoint to return more details on a mime type
Key: TIKA-3694
URL: https://issues.apache.org/jira/browse/TIKA-3694
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500804#comment-17500804
]
Nick Burch commented on TIKA-3686:
--
Detecting types of text-based files with magic is always going to
[
https://issues.apache.org/jira/browse/TIKA-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489597#comment-17489597
]
Nick Burch commented on TIKA-3676:
--
As long as we provide sensible instructions on what to do, I'm happy
[
https://issues.apache.org/jira/browse/TIKA-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480955#comment-17480955
]
Nick Burch commented on TIKA-3656:
--
That POM is your problem, you aren't including any of the container
[
https://issues.apache.org/jira/browse/TIKA-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479981#comment-17479981
]
Nick Burch commented on TIKA-3656:
--
How are you calling Tika? And do you have the office parsers on your
[
https://issues.apache.org/jira/browse/TIKA-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475269#comment-17475269
]
Nick Burch commented on TIKA-3646:
--
I think this is probably the same issue as TIKA-2935 - the same work
[
https://issues.apache.org/jira/browse/TIKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444644#comment-17444644
]
Nick Burch commented on TIKA-3590:
--
[~salmira] Are you able to create us a few sample dmg files to test
[
https://issues.apache.org/jira/browse/TIKA-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434493#comment-17434493
]
Nick Burch commented on TIKA-3582:
--
Bit fiddly, but how about a config option on the server for the
[
https://issues.apache.org/jira/browse/TIKA-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428246#comment-17428246
]
Nick Burch commented on TIKA-3570:
--
[~delmaestro_l] Does that sample file load in the program that
[
https://issues.apache.org/jira/browse/TIKA-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427920#comment-17427920
]
Nick Burch commented on TIKA-3570:
--
Do you have a small sample file that you can share with us, ideally
[
https://issues.apache.org/jira/browse/TIKA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418531#comment-17418531
]
Nick Burch commented on TIKA-3559:
--
I'm not sure if the example in the spec is under a suitable license.
[
https://issues.apache.org/jira/browse/TIKA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418505#comment-17418505
]
Nick Burch commented on TIKA-3559:
--
As we get more JSON-based formats, I wonder if we should do a
[
https://issues.apache.org/jira/browse/TIKA-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418157#comment-17418157
]
Nick Burch commented on TIKA-3558:
--
That seems to be a vulnerability in the libflac C code, so shouldn't
[
https://issues.apache.org/jira/browse/TIKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416044#comment-17416044
]
Nick Burch commented on TIKA-3554:
--
Just to emphasise what Tim has written, file type detection in Apache
[
https://issues.apache.org/jira/browse/TIKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416009#comment-17416009
]
Nick Burch commented on TIKA-3554:
--
If possible, wrap your {{InputStream}} as a {{TikaInputStream}}
[
https://issues.apache.org/jira/browse/TIKA-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415782#comment-17415782
]
Nick Burch commented on TIKA-3555:
--
Doesn't that make us look more dodgy, and more likely to trigger an
[
https://issues.apache.org/jira/browse/TIKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415460#comment-17415460
]
Nick Burch commented on TIKA-3554:
--
If you want Apache Tika to do detection only on the file contents
[
https://issues.apache.org/jira/browse/TIKA-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415439#comment-17415439
]
Nick Burch commented on TIKA-3555:
--
See TIKA-259
This file will make an underpowered computer unhappy if
[
https://issues.apache.org/jira/browse/TIKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411814#comment-17411814
]
Nick Burch commented on TIKA-3544:
--
Apache POI provides the DataFormatter class which attempts to turn
[
https://issues.apache.org/jira/browse/TIKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411774#comment-17411774
]
Nick Burch commented on TIKA-3544:
--
You need to be aware that Excel itself only stored numbers-as-numbers
[
https://issues.apache.org/jira/browse/TIKA-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402788#comment-17402788
]
Nick Burch commented on TIKA-3534:
--
This class is used by the bits of Apache Tika (mostly parsers) that
[
https://issues.apache.org/jira/browse/TIKA-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400934#comment-17400934
]
Nick Burch commented on TIKA-3528:
--
The specification document from Microsoft documents the following