On Fri, 7 Jan 2022, Josh Burchard wrote:
I wrote to Tim about making a small update to
https://cwiki.apache.org/confluence/display/TIKA/TikaServerEndpointsCompared
and he suggested that I email this dev list to see if someone could grant
me editor access. Is that a possibility?
Can you sign up
On Wed, 15 Dec 2021, Tim Allison wrote:
Sounds good, Nick. Unless there are objections, I'll add an EOL
September 30, 2022 for the 1.x branch on our github README and maybe our
site somewhere?
Maybe just mention it in the news section at the end of any 1.x fix releases?
Nick
On Wed, 15 Dec 2021, Tim Allison wrote:
I think we should keep the 1.x branch open for security upgrades for a
bit...middle of next year? I have _not_ been adding new features or
even some bug fixes to 1.x, and I encourage people to migrate to 2.x.
We've seen quite a few queries from people
[
https://issues.apache.org/jira/browse/TIKA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444644#comment-17444644
]
Nick Burch commented on TIKA-3590:
--
[~salmira] Are you able to create us a few sample dmg files to test
[
https://issues.apache.org/jira/browse/TIKA-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434493#comment-17434493
]
Nick Burch commented on TIKA-3582:
--
Bit fiddly, but how about a config option on the server
[
https://issues.apache.org/jira/browse/TIKA-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428246#comment-17428246
]
Nick Burch commented on TIKA-3570:
--
[~delmaestro_l] Does that sample file load in the program
[
https://issues.apache.org/jira/browse/TIKA-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427920#comment-17427920
]
Nick Burch commented on TIKA-3570:
--
Do you have a small sample file that you can share with us, ideally
[
https://issues.apache.org/jira/browse/TIKA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418531#comment-17418531
]
Nick Burch commented on TIKA-3559:
--
I'm not sure if the example in the spec is under a suitable license
[
https://issues.apache.org/jira/browse/TIKA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418505#comment-17418505
]
Nick Burch commented on TIKA-3559:
--
As we get more JSON-based formats, I wonder if we should do
[
https://issues.apache.org/jira/browse/TIKA-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418157#comment-17418157
]
Nick Burch commented on TIKA-3558:
--
That seems to be a vulnerability in the libflac C code, so shouldn't
[
https://issues.apache.org/jira/browse/TIKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416044#comment-17416044
]
Nick Burch commented on TIKA-3554:
--
Just to emphasise what Tim has written, file type detection in Apache
[
https://issues.apache.org/jira/browse/TIKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416009#comment-17416009
]
Nick Burch commented on TIKA-3554:
--
If possible, wrap your {{InputStream}} as a {{TikaInputStream
[
https://issues.apache.org/jira/browse/TIKA-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415782#comment-17415782
]
Nick Burch commented on TIKA-3555:
--
Doesn't that make us look more dodgy, and more likely to trigger
[
https://issues.apache.org/jira/browse/TIKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415460#comment-17415460
]
Nick Burch commented on TIKA-3554:
--
If you want Apache Tika to do detection only on the file contents
[
https://issues.apache.org/jira/browse/TIKA-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415439#comment-17415439
]
Nick Burch commented on TIKA-3555:
--
See TIKA-259
This file will make an underpowered computer unhappy
[
https://issues.apache.org/jira/browse/TIKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411814#comment-17411814
]
Nick Burch commented on TIKA-3544:
--
Apache POI provides the DataFormatter class which attempts to turn
[
https://issues.apache.org/jira/browse/TIKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411774#comment-17411774
]
Nick Burch commented on TIKA-3544:
--
You need to be aware that Excel itself only stored numbers-as-numbers
[
https://issues.apache.org/jira/browse/TIKA-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402788#comment-17402788
]
Nick Burch commented on TIKA-3534:
--
This class is used by the bits of Apache Tika (mostly parsers
[
https://issues.apache.org/jira/browse/TIKA-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400934#comment-17400934
]
Nick Burch commented on TIKA-3528:
--
The specification document from Microsoft documents the following
[
https://issues.apache.org/jira/browse/TIKA-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400920#comment-17400920
]
Nick Burch commented on TIKA-3528:
--
Currently we detect the video format based on the overall
On Wed, 11 Aug 2021, Tim Allison wrote:
A) I think we should maintain the 1.x branch and continue to put out
bug fixes for a bit. Any objections to nominally calling the next
release 1.27.1 on JIRA at least?
I agree we should probably try to keep 1.x going for at least a few
months, to
Hi All
I came across Kaitai - http://kaitai.io/ - yesterday. Based on the
experiences documented in this twitter thread on understanding + parsing
an embedded filesystem:
https://twitter.com/wrongbaud/status/1424380510671880198
Looks like it might be worth a look if we need to write our
On Mon, 26 Jul 2021, Tim Allison wrote:
Currently the OpenSearch emitter works with the 7.x version of
Elasticsearch. Going forward, when the projects diverge:
a) do we want to support Elasticsearch and
I think we should try, but I'm not sure if it should be "we = Apache Tika"
or "we = Tika
[
https://issues.apache.org/jira/browse/TIKA-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386428#comment-17386428
]
Nick Burch commented on TIKA-3496:
--
If there is no timezone stored in the original file, I don't think we
[
https://issues.apache.org/jira/browse/TIKA-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17385514#comment-17385514
]
Nick Burch commented on TIKA-3489:
--
I'm not keen on us throwing away information we can easily return
[
https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377953#comment-17377953
]
Nick Burch commented on TIKA-3466:
--
[~psakkanan] You really need to be doing some xml parsing
[
https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377317#comment-17377317
]
Nick Burch commented on TIKA-3466:
--
I'm happy to add the xmlns version as a match, that seems pretty
[
https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376689#comment-17376689
]
Nick Burch commented on TIKA-3466:
--
I've never seen a file like that before, but I'm sure Tim will pop
[
https://issues.apache.org/jira/browse/TIKA-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363054#comment-17363054
]
Nick Burch commented on TIKA-3445:
--
I think that's an email file, Tika thinks that's an email file, seems
[
https://issues.apache.org/jira/browse/TIKA-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362930#comment-17362930
]
Nick Burch commented on TIKA-3445:
--
This file does seem to be a series of emails. Checking
[
https://issues.apache.org/jira/browse/TIKA-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355770#comment-17355770
]
Nick Burch commented on TIKA-3431:
--
Could this be a PDF where there is a scan + already-OCR'd text
[
https://issues.apache.org/jira/browse/TIKA-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355185#comment-17355185
]
Nick Burch commented on TIKA-3429:
--
Most bits of Tika need the mime entries loading, even if you
[
https://issues.apache.org/jira/browse/TIKA-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352069#comment-17352069
]
Nick Burch commented on TIKA-3421:
--
For the obsolete part, how about we follow the pattern of
{{text
[
https://issues.apache.org/jira/browse/TIKA-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352015#comment-17352015
]
Nick Burch commented on TIKA-3421:
--
For the specific case of {{message/news}} I think we probably need
[
https://issues.apache.org/jira/browse/TIKA-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352009#comment-17352009
]
Nick Burch commented on TIKA-3421:
--
If a type used to be used, I think we should keep it in Tika. Though
[
https://issues.apache.org/jira/browse/TIKA-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349322#comment-17349322
]
Nick Burch commented on TIKA-3411:
--
The current Tika matching logic is:
* If we have only a filename
[
https://issues.apache.org/jira/browse/TIKA-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349149#comment-17349149
]
Nick Burch commented on TIKA-3408:
--
Ah, I wonder if that's a bug in the version of the mp4 library used
[
https://issues.apache.org/jira/browse/TIKA-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17349123#comment-17349123
]
Nick Burch commented on TIKA-3411:
--
The 10 byte magic should be fine, even though it's mostly text
[
https://issues.apache.org/jira/browse/TIKA-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17348193#comment-17348193
]
Nick Burch commented on TIKA-3408:
--
I'm not sure what you mean by an epoch date here, and I can't see any
[
https://issues.apache.org/jira/browse/TIKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347750#comment-17347750
]
Nick Burch commented on TIKA-3409:
--
I'm not sure if we'd want to put this on MediaTypeRegistry
[
https://issues.apache.org/jira/browse/TIKA-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347741#comment-17347741
]
Nick Burch commented on TIKA-3408:
--
What date do you think is in the MP3 that you aren't getting? The ID3
[
https://issues.apache.org/jira/browse/TIKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347494#comment-17347494
]
Nick Burch edited comment on TIKA-3409 at 5/19/21, 11:34 AM:
-
As well
[
https://issues.apache.org/jira/browse/TIKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347494#comment-17347494
]
Nick Burch commented on TIKA-3409:
--
As well as the primary type that Tika detects, also check the aliases
[
https://issues.apache.org/jira/browse/TIKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347418#comment-17347418
]
Nick Burch edited comment on TIKA-3409 at 5/19/21, 8:47 AM:
Do you want
[
https://issues.apache.org/jira/browse/TIKA-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347418#comment-17347418
]
Nick Burch commented on TIKA-3409:
--
Do you want to know if Apache Tika can parse the file? Or if you
[
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343485#comment-17343485
]
Nick Burch commented on TIKA-3392:
--
[~tallison] What about the other Tika "own" XML files
[
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342898#comment-17342898
]
Nick Burch commented on TIKA-3392:
--
Not sure how easy / possible / user friendly this would be, but... my
[
https://issues.apache.org/jira/browse/TIKA-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333295#comment-17333295
]
Nick Burch commented on TIKA-3373:
--
You can't override a built-in type. For now, just grab the updated
[
https://issues.apache.org/jira/browse/TIKA-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333178#comment-17333178
]
Nick Burch commented on TIKA-3373:
--
Thanks for that SO post, very helpful to see what people are commonly
[
https://issues.apache.org/jira/browse/TIKA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17330827#comment-17330827
]
Nick Burch commented on TIKA-3364:
--
I'm not sure if we already have outlines/bookmarks elsewhere in other
[
https://issues.apache.org/jira/browse/TIKA-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304115#comment-17304115
]
Nick Burch commented on TIKA-3331:
--
Almost certainly a GUI bug from what you describe, but possibly also
[
https://issues.apache.org/jira/browse/TIKA-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304102#comment-17304102
]
Nick Burch commented on TIKA-3328:
--
We give a file starting %PDF a high magic priority, but one starting
[
https://issues.apache.org/jira/browse/TIKA-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304092#comment-17304092
]
Nick Burch commented on TIKA-3331:
--
Many parsers will return
[http://tika.apache.org/1.25/api/org/apache
[
https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-3310.
--
Fix Version/s: 1.26
2.0
Resolution: Fixed
> MP4 video detected as applicat
[
https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17301280#comment-17301280
]
Nick Burch commented on TIKA-3310:
--
Thanks for all your help on this [~peterkronenberg] !
> MP4 vi
[
https://issues.apache.org/jira/browse/TIKA-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17301278#comment-17301278
]
Nick Burch commented on TIKA-3316:
--
Mimetype wise, my view, for what it's worth...
It depends on how
[
https://issues.apache.org/jira/browse/TIKA-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17301276#comment-17301276
]
Nick Burch commented on TIKA-3318:
--
MP3 parser (+tests) updated
[
https://issues.apache.org/jira/browse/TIKA-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-3318.
--
Fix Version/s: 1.26
2.0
Resolution: Fixed
> MP3 parser using wr
Nick Burch created TIKA-3318:
Summary: MP3 parser using wrong xmpDM:duration units (which aren't
clearly documented)
Key: TIKA-3318
URL: https://issues.apache.org/jira/browse/TIKA-3318
Project: Tika
On Tue, 9 Mar 2021, Tim Allison wrote:
Would this be better?
tika-parsers-basic
tika-parsers-complex
tika-parsers-¯\_(ツ)_/¯
GStreamer has 4 levels of plugins, Base, Good, Ugly and Bad. Descriptions
of what qualifies for what at https://gstreamer.freedesktop.org/modules/ .
I can see
[
https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297714#comment-17297714
]
Nick Burch commented on TIKA-3310:
--
Yup, I'm happy with that, thanks for all the work and the revisions
[
https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296103#comment-17296103
]
Nick Burch commented on TIKA-3310:
--
I think we need to do the loop twice though, once checking major
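
The rest of this comment is cut off in the archive. As a rough sketch only of what a two-pass brand check could look like, assuming the first pass considers just the major brand and the second falls back to the compatible brands: the brand table, the fallback type and the function name below are illustrative assumptions, not Tika's actual code.

```python
# Hypothetical brand table: the entries are illustrative, not Tika's real mappings.
KNOWN_BRANDS = [
    ("M4A ", "audio/mp4"),
    ("M4V ", "video/x-m4v"),
    ("3g2c", "video/3gpp2"),
]
FALLBACK = "application/mp4"  # assumed generic type when no brand is recognised


def detect_mp4_type(major, compatible):
    """Pick a media type from the major brand first, then the compatible brands."""
    # First pass: only the major brand is allowed to decide.
    for brand, media_type in KNOWN_BRANDS:
        if brand == major:
            return media_type
    # Second pass: nothing matched on major, so try the compatible brands.
    for brand, media_type in KNOWN_BRANDS:
        if brand in compatible:
            return media_type
    return FALLBACK
```

Running the loop twice means a recognised major brand always wins, even when a compatible brand appears earlier in the table.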
[
https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296076#comment-17296076
]
Nick Burch commented on TIKA-3310:
--
My worry is, though I don't know if it could happen, is eg major=3g2c
Hi All
For those who don't follow dev@commons, there's yet another fuzzing tool
on the block! Details below. Looks pretty neat, and is now being used on a
few Apache Commons projects, including Commons Compress which we use
What do people think about more fuzzing? Worth doing? Or just too
[
https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295342#comment-17295342
]
Nick Burch commented on TIKA-3310:
--
Could there be a situation where both a major and a compatible brand
[
https://issues.apache.org/jira/browse/TIKA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17295333#comment-17295333
]
Nick Burch commented on TIKA-3310:
--
FYI There's a few unrelated changes in the pull request, including
[
https://issues.apache.org/jira/browse/TIKA-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290041#comment-17290041
]
Nick Burch commented on TIKA-3290:
--
[~Vamsi452] You do appear to have mistaken a free open source project
On Tue, 9 Feb 2021, Tim Allison wrote:
Would we just swap to throwing an Exception if a parser can't be found /
loaded?
Y, that'd be my inclination.
Seems ok to me
what do we do if someone gives us a Tika Config
that references a Parser that doesn't exist?
My preference would be to throw
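
The reply is truncated here. Purely as an illustration of the two behaviours being discussed (fail loudly versus silently ignore a parser that cannot be loaded), here is a small sketch. The `load_parsers` function, the `ConfigError` class and the `on_error` values are hypothetical names made up for this example; they are not the TikaConfig or LoadErrorHandler API.

```python
import importlib


class ConfigError(Exception):
    """Hypothetical error raised when a configured class cannot be loaded."""


def load_parsers(class_names, on_error="throw"):
    """Load classes by dotted name; on failure either raise or skip."""
    loaded = []
    for name in class_names:
        module_name, _, attr = name.rpartition(".")
        try:
            module = importlib.import_module(module_name)
            loaded.append(getattr(module, attr))
        except (ImportError, AttributeError, ValueError) as exc:
            if on_error == "ignore":
                continue  # silence the complaint, as an "ignore" handler would
            raise ConfigError(f"cannot load parser {name!r}") from exc
    return loaded
```

With `on_error="throw"` a typo in the config surfaces immediately; with `on_error="ignore"` the missing entry is dropped without comment.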
On Mon, 8 Feb 2021, Tim Allison wrote:
Do we still need the LoadErrorHandler for TikaConfig 2.x? IIRC, we
added that so that folks who didn't want a dependency could prevent
the loading of the dependency and then silence complaints -- if set to
ignore.
Would we just swap to throwing an
[
https://issues.apache.org/jira/browse/TIKA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17279750#comment-17279750
]
Nick Burch commented on TIKA-3294:
--
This code is reading something that someone else has already
[
https://issues.apache.org/jira/browse/TIKA-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277257#comment-17277257
]
Nick Burch commented on TIKA-3290:
--
We did some work fairly recently to increase the chances of real
[
https://issues.apache.org/jira/browse/TIKA-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276999#comment-17276999
]
Nick Burch commented on TIKA-3290:
--
At first glance, this does seem to be a series of emails, so
[
https://issues.apache.org/jira/browse/TIKA-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272774#comment-17272774
]
Nick Burch commented on TIKA-3282:
--
Perfect, thanks for checking!
[~tallison] any chance you feel like
[
https://issues.apache.org/jira/browse/TIKA-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272754#comment-17272754
]
Nick Burch commented on TIKA-3282:
--
Thanks for this patch and test file
If you have a copy of OneNote
On Mon, 18 Jan 2021, Tim Allison wrote:
I did only minimal updates to our site so that there's still mostly info
about 1.25, javadocs, etc. are still 1.25. I want to make it clear that
that is the "production" release. If desired, I can do the full suite
of updates for 2.0.0-ALPHA. Let me
[
https://issues.apache.org/jira/browse/TIKA-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264705#comment-17264705
]
Nick Burch commented on TIKA-3274:
--
Two possible issues with the move spring to mind:
* It becomes
On Mon, 11 Jan 2021, Tim Allison wrote:
Thanks to a recommendation from a user and the developer of datasette, I
configured the proxy correctly so that this now works:
https://corpora.tika.apache.org/datasette/
Yey, thanks for tracking that down and getting to the fix!
Nick
[
https://issues.apache.org/jira/browse/TIKA-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260705#comment-17260705
]
Nick Burch commented on TIKA-3267:
--
I have a feeling this may be due to the magic reflection-based stuff
[
https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17260440#comment-17260440
]
Nick Burch commented on TIKA-3258:
--
I can see beginner users, especially non-Java ones using Tika via
[
https://issues.apache.org/jira/browse/TIKA-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258822#comment-17258822
]
Nick Burch commented on TIKA-3260:
--
If we can make a script that's valid python 2 + 3, that'd be ideal
[
https://issues.apache.org/jira/browse/TIKA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253643#comment-17253643
]
Nick Burch commented on TIKA-3255:
--
The 6MB MP3 file seems to be 2.75MB of ID3 tags, which seems pretty
[
https://issues.apache.org/jira/browse/TIKA-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253511#comment-17253511
]
Nick Burch commented on TIKA-3254:
--
Tika tries to give you clean, semantically meaningful XHTML
Hi All
I'm having some issues with the datasette instance on the vm. The main
table pages are working, but csv/json/queries seem to be giving a 404.
Happy - https://corpora.tika.apache.org/datasette/file_profiles/file_profiles
Unhappy -
On Thu, 19 Nov 2020, Nick Burch wrote:
On Thu, 19 Nov 2020, Tim Allison wrote:
Looks like 'scale' needs to be taken into consideration? See 1.2.6.9
https://www.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMPSDKReleasecc-2020/XMPSpecificationPart2.pdf
Ah, yes, check the spec!
1.2.6.5
On Mon, 30 Nov 2020, Tim Allison wrote:
Now that 1.25 is released, I'm going to work on refactoring tika-eval and
tika-server shortly. Then add back in the osgi bundle. After that, shall
we go with 2.0.0-ALPHA?
Seems ok to me, assuming you're happy to do the work! :)
Thanks
Nick
, even though it's a
breaking change.
Thoughts?
Nick
On Wed, Nov 18, 2020 at 3:26 PM Nick Burch wrote:
Hi All
This question prompted by https://stackoverflow.com/q/64888488/685641
Is there / should there be fixed units on the xmpDM:duration metadata
property? And if so, what?
Currently
Hi All
This question prompted by https://stackoverflow.com/q/64888488/685641
Is there / should there be fixed units on the xmpDM:duration metadata
property? And if so, what?
Currently, MP3 seems to use milliseconds via
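
The message is cut off at this point. To make the units question concrete, here is a minimal sketch that assumes a duration is carried as a numeric value plus a scale giving seconds per unit, so milliseconds would be a scale of 1/1000. The function name and the value/scale pairing are assumptions for illustration, not how any Tika parser currently reports the property.

```python
from fractions import Fraction


def duration_seconds(value, scale):
    """Convert a (value, scale) pair to seconds; scale is a string such as "1/1000"."""
    # Fraction keeps the arithmetic exact until the final conversion to float.
    return float(Fraction(value) * Fraction(scale))
```

For example, `duration_seconds(2500, "1/1000")` treats 2500 as milliseconds, while `duration_seconds(90, "1")` treats 90 as whole seconds. Without an agreed scale, a consumer cannot tell those two cases apart.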
[
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229349#comment-17229349
]
Nick Burch commented on TIKA-1735:
--
It has been a while since I last looked at this parser, and I'd
[
https://issues.apache.org/jira/browse/TIKA-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226820#comment-17226820
]
Nick Burch commented on TIKA-3218:
--
I think the idea of this was so that eg Parsers would have user
[
https://issues.apache.org/jira/browse/TIKA-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218859#comment-17218859
]
Nick Burch commented on TIKA-3211:
--
7zip is mostly LGPL, so we wouldn't be able to include
On Wed, 21 Oct 2020, Alexander Klimetschek wrote:
Regarding xmpcore: I would love to help but it's a different department :-)
If you can use internal contacts to find the people we need to prod /
lobby / smile at, that'd be a big help! And/or if you can try to bribe
that team with sending
[
https://issues.apache.org/jira/browse/TIKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216827#comment-17216827
]
Nick Burch commented on TIKA-3209:
--
I've taken a look at the code in POI and Tika today, and back when
On Tue, 13 Oct 2020, Tim Allison wrote:
Ha, y, this file exercises those bits of code:
https://github.com/apache/tika/blob/main/tika-parser-modules/tika-parser-microsoft-module/src/test/resources/test-documents/testPPT_oleWorkbook.ppt
Nick, does this match the features of the SO question?
On Fri, 9 Oct 2020, Tim Allison wrote:
Do you think we should follow up on the Tika side? Do we know if we can
handle this?
I thought we did, but checking POIFSContainerDetector I can't actually see
that case covered
I think we (Tika) can handle it in a similar way to CompObj
Over on
Hey All
Just a quick heads-up that for TIKA-3205 I generated a few new small
private keys (RSA, DSA, EC) and added them to the parser test documents
folder, for unit testing the new mime magics for keys and certificates.
They're not protecting or using anything.
One automated security
[
https://issues.apache.org/jira/browse/TIKA-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204054#comment-17204054
]
Nick Burch commented on TIKA-3205:
--
Magic added for PEM and DER encoded certificates, and public/private
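
The comment is truncated in the archive. As a hedged illustration of what content-based magic for these formats can key on: PEM files start with a `-----BEGIN <label>-----` line, and DER structures start with the ASN.1 SEQUENCE tag byte 0x30. The function below is a hypothetical sketch, not the magic that was added to Tika, and a lone 0x30 byte is only a weak hint.

```python
import re

# PEM armour line, e.g. "-----BEGIN CERTIFICATE-----"
PEM_HEADER = re.compile(rb"-----BEGIN ([A-Z0-9 ]+)-----")


def sniff_key_or_cert(data):
    """Return a rough guess at the encoding from the first bytes, or None."""
    match = PEM_HEADER.match(data.lstrip())
    if match:
        return "pem:" + match.group(1).decode("ascii")
    if data[:1] == b"\x30":
        # ASN.1 SEQUENCE tag: consistent with DER, but far from conclusive.
        return "der?"
    return None
```

The PEM label distinguishes certificates from private or public keys, whereas the DER case would need further parsing to tell them apart.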
[
https://issues.apache.org/jira/browse/TIKA-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195296#comment-17195296
]
Nick Burch commented on TIKA-3195:
--
Currently, Tika has container-based detection for OLE2, Zip, Ogg
[
https://issues.apache.org/jira/browse/TIKA-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194176#comment-17194176
]
Nick Burch commented on TIKA-3195:
--
We need to turn those non-stream types into an InputStream, and use
[
https://issues.apache.org/jira/browse/TIKA-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194067#comment-17194067
]
Nick Burch commented on TIKA-3195:
--
This is expected behaviour. Ogg is a container format. It isn't
[
https://issues.apache.org/jira/browse/TIKA-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192276#comment-17192276
]
Nick Burch commented on TIKA-3193:
--
That is one interesting blog post! Probably the best I've come across