[
https://issues.apache.org/jira/browse/TIKA-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176261#comment-17176261
]
Nick Burch commented on TIKA-3159:
--
That wikipedia page states _Office documents that conform
[
https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175415#comment-17175415
]
Nick Burch edited comment on TIKA-3155 at 8/11/20, 9:50 AM:
If we can use
[
https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175415#comment-17175415
]
Nick Burch commented on TIKA-3155:
--
If we can use quote mode we should, it will make the output from Tika
[
https://issues.apache.org/jira/browse/TIKA-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175043#comment-17175043
]
Nick Burch commented on TIKA-3153:
--
We talked about using a regex for simplifying the matching of non
On Mon, 3 Aug 2020, Peter Lee wrote:
I've been working on TIKA-3141 recently and pushed a PR on GitHub. As Keith
suggested in the PR, maybe we should add Commons Lang to tika-core, as
it seems Commons Lang is being used elsewhere in Tika but not in
tika-core.
Historically, we have tried to keep
[
https://issues.apache.org/jira/browse/TIKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167819#comment-17167819
]
Nick Burch commented on TIKA-3144:
--
Generally you need to use the {{x-}} prefix on the subtype to mark
[
https://issues.apache.org/jira/browse/TIKA-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166439#comment-17166439
]
Nick Burch commented on TIKA-3141:
--
Unsetting the environment variable seems like the right way to handle
[
https://issues.apache.org/jira/browse/TIKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166300#comment-17166300
]
Nick Burch commented on TIKA-3144:
--
After a quick google, I can't seem to find any canonical or even
[
https://issues.apache.org/jira/browse/TIKA-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154600#comment-17154600
]
Nick Burch commented on TIKA-3121:
--
Don't think so, I think we need to ask infra to make the change
[
https://issues.apache.org/jira/browse/TIKA-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153235#comment-17153235
]
Nick Burch commented on TIKA-3115:
--
The Avro metadata files seem to be JSON, so not much hope
On Wed, 24 Jun 2020, Tim Allison wrote:
Thank you, Maruan!
I’ll open a ticket w datasette.
Would a ProxyPassReverse work for this?
Nick
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143776#comment-17143776
]
Nick Burch commented on TIKA-3104:
--
Any chance you could create / find a small XML Memgraph file for us
On Mon, 22 Jun 2020, Vegard Stikbakke wrote:
I would like to update outdated installation instructions here:
https://cwiki.apache.org/confluence/display/TIKA/TikaOCR Specifically,
installation on Mac.
So I'm kindly requesting access to edit!
Can you please create yourself an account on our
Hi All
As I understand it (which might be wrong!), Tim is generating a bunch of
reports on things in the corpora / how different tools analyse the corpora /
how Tika works on the stuff there, mostly as SQL databases.
Those databases are then available to anyone who is interested to download
and
[
https://issues.apache.org/jira/browse/TIKA-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17135820#comment-17135820
]
Nick Burch commented on TIKA-3113:
--
Any ideas on this scientific-looking format [~lewismc
On Mon, 15 Jun 2020, Maruan Sahyoun wrote:
browsing is now available from https://corpora.tika.apache.org/base/
Let me know what you think or if it doesn't work for you.
Is it worth adding a header and/or footer to the auto-index pages, to
explain what is there + where to get more details?
[
https://issues.apache.org/jira/browse/TIKA-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133944#comment-17133944
]
Nick Burch commented on TIKA-3113:
--
I'm not sure what this is, but I'm fairly sure it isn't latex
Hi All
At the moment, to detect RFC822 emails, we try and check for a bunch of
common header lines right at the start. If not, we check for a few "could
be an unusual header, could be some text", followed by checking for common
headers in a larger area of text below.
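A rough sketch of the two-stage check described above, in Python for brevity. This is not Tika's actual mime-magic; the header list, the `looks_like_rfc822` name and the 1024-character window are assumptions made up for illustration only.

```python
import re

# Hypothetical list of "common" headers, lower-cased; not Tika's real magic list.
COMMON_HEADERS = ("from:", "to:", "subject:", "date:", "received:",
                  "message-id:", "return-path:")

# Anything shaped like "Some-Name: value" might be an unusual header, or just text.
HEADER_SHAPED = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:\s")


def looks_like_rfc822(text, window=1024):
    """Guess whether text is an RFC822 email, using a two-stage header check."""
    lines = text.splitlines()
    if not lines:
        return False
    first = lines[0]
    # Stage 1: a common header right at the start is accepted straight away.
    if first.lower().startswith(COMMON_HEADERS):
        return True
    # Stage 2: the first line must at least be header-shaped...
    if not HEADER_SHAPED.match(first):
        return False
    # ...and a common header must then show up in a larger area below it.
    for line in text[:window].splitlines()[1:]:
        if line.lower().startswith(COMMON_HEADERS):
            return True
    return False
```

With this sketch, a message that opens with an unfamiliar header only counts as email if a familiar header follows within the window, so plain text that merely starts with something like "Note: ..." is rejected.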
For example, starts
On Thu, 4 Jun 2020, Dupinder Singh wrote:
My project is Gradle-based, so I was trying to resolve the build as you
described in your documentation, but this is not resolving the dependency.
dependencies {
    runtime 'org.apache.tika:tika-parsers:1.24.1'
}
That looks like it ought to be fine,
On Thu, 4 Jun 2020, Tim Allison wrote:
Following guidance from https://issues.apache.org/jira/browse/INFRA-20376,
I've requested a corpora-...@tika.apache.org mail list. If we need
separate user/private, we can request those. Let me know.
I don't think we need user or private at this stage -
[
https://issues.apache.org/jira/browse/TIKA-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126443#comment-17126443
]
Nick Burch commented on TIKA-3106:
--
You ought to be able to point your gradle build at the snapshots repo
[
https://issues.apache.org/jira/browse/TIKA-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126324#comment-17126324
]
Nick Burch commented on TIKA-3107:
--
This is a bug in Apache POI, one of the libraries that Tika depends
[
https://issues.apache.org/jira/browse/TIKA-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125545#comment-17125545
]
Nick Burch commented on TIKA-3106:
--
This email starts with a series of long {{ARC-}} headers, which means
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125536#comment-17125536
]
Nick Burch commented on TIKA-3104:
--
A mimetype of {{application/x-itunes-bplist}} seems a sensible choice
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124936#comment-17124936
]
Nick Burch commented on TIKA-3104:
--
Yup!
https://github.com/apache/tika/blob/master/tika-parsers/src
[
https://issues.apache.org/jira/browse/TIKA-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124934#comment-17124934
]
Nick Burch commented on TIKA-3105:
--
At a quick glance, those first 4 bytes aren't unique enough. There look
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124614#comment-17124614
]
Nick Burch commented on TIKA-3104:
--
At this point, volunteer-permitting, I think we could now also write
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119272#comment-17119272
]
Nick Burch commented on TIKA-3104:
--
There's an unmaintained but suitably licensed bplist parser in Java
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch updated TIKA-3104:
-
Attachment: memgraph.xml
> Detection of memgraph files exported from Xc
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118364#comment-17118364
]
Nick Burch commented on TIKA-3104:
--
You can't currently parse the files, only detect them. Parsing
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118359#comment-17118359
]
Nick Burch commented on TIKA-3104:
--
{{bplist}} is an Apple file format for storing property listings
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118352#comment-17118352
]
Nick Burch commented on TIKA-3104:
--
At some point we might want to add a dedicated bplist detector
[
https://issues.apache.org/jira/browse/TIKA-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118332#comment-17118332
]
Nick Burch commented on TIKA-3104:
--
Looks like these are based off the bplist format. Not sure if we can
[
https://issues.apache.org/jira/browse/TIKA-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109825#comment-17109825
]
Nick Burch commented on TIKA-2961:
--
Based on
[https://developer.apple.com/library/archive/documentation
On Wed, 15 Apr 2020, hans.mei...@avident-it.se wrote:
I have encountered an issue with Tika running locally on a box, where the
Java runtime goes up to over 200% CPU after running a bulk load of
documents over a couple of days; it is more than 3 million documents.
Can you do a thread dump to
[
https://issues.apache.org/jira/browse/TIKA-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083096#comment-17083096
]
Nick Burch commented on TIKA-3089:
--
Since several parsers need changing... Maybe a new kind of `Config
On Mon, 6 Apr 2020, Eric Pugh wrote:
Maybe this needs better documentation, however this is a “works as
designed” feature!
To avoid the build failing, run mvn package -Dossindex.fail=false
Should we maybe have this set to false by default, and only enabled
on release builds?
(We shouldn't
[
https://issues.apache.org/jira/browse/TIKA-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060277#comment-17060277
]
Nick Burch commented on TIKA-3072:
--
I have just tried your file with the latest version of Apache Tika
[
https://issues.apache.org/jira/browse/TIKA-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057039#comment-17057039
]
Nick Burch commented on TIKA-2714:
--
Seems good to me.
Since we know the magic for v4, we can add
[
https://issues.apache.org/jira/browse/TIKA-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057022#comment-17057022
]
Nick Burch commented on TIKA-2714:
--
From [https://www.rarlab.com/technote.htm]
h3. RAR
[
https://issues.apache.org/jira/browse/TIKA-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054431#comment-17054431
]
Nick Burch commented on TIKA-3063:
--
Based on the error message, it looks like the file is either
[
https://issues.apache.org/jira/browse/TIKA-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036421#comment-17036421
]
Nick Burch commented on TIKA-3043:
--
If you are building an all-in-one jar, you need to merge certain
[
https://issues.apache.org/jira/browse/TIKA-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-3023.
--
Fix Version/s: 1.24
Resolution: Fixed
> Text files starting with MOVI are detected as X-
[
https://issues.apache.org/jira/browse/TIKA-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031507#comment-17031507
]
Nick Burch commented on TIKA-3023:
--
The FFMpeg project have some sample SGI Movie files at
[https
[
https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030753#comment-17030753
]
Nick Burch commented on TIKA-3037:
--
The website is generated with Maven, source code at
https
[
https://issues.apache.org/jira/browse/TIKA-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030686#comment-17030686
]
Nick Burch commented on TIKA-3034:
--
We tend to do 3ish releases a year. Last release was in December, so
[
https://issues.apache.org/jira/browse/TIKA-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029736#comment-17029736
]
Nick Burch commented on TIKA-3034:
--
Mathematica does have a fairly unusual start-of-comment structure, so
[
https://issues.apache.org/jira/browse/TIKA-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027621#comment-17027621
]
Nick Burch commented on TIKA-3034:
--
Can you try and pass the filename along with the contents when you
[
https://issues.apache.org/jira/browse/TIKA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025810#comment-17025810
]
Nick Burch commented on TIKA-3031:
--
This looks like an underlying Apache PDFBox bug to me
[
https://issues.apache.org/jira/browse/TIKA-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025586#comment-17025586
]
Nick Burch commented on TIKA-3030:
--
Pretty sure we've got a test file in Apache POI like this - some
[
https://issues.apache.org/jira/browse/TIKA-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024895#comment-17024895
]
Nick Burch commented on TIKA-3028:
--
The formatting of the raw values into nice strings is handled
On Mon, 27 Jan 2020, Saurabh Bhardwaj wrote:
Currently, Tika is able to figure out whether a given file is an AMR file or
not, but doesn't return one of the most useful pieces of information for an
AMR file, i.e. its duration.
Generally that means we have mime-magic for detection, but don't have a
parser for
[
https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018947#comment-17018947
]
Nick Burch commented on TIKA-2294:
--
For fully accurate OOXML (and other zip-subtype) detection, you need
[
https://issues.apache.org/jira/browse/TIKA-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011634#comment-17011634
]
Nick Burch commented on TIKA-3023:
--
Assuming that the byte after MOVI is part of a version or length
[
https://issues.apache.org/jira/browse/TIKA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998213#comment-16998213
]
Nick Burch commented on TIKA-3007:
--
See
[https://cwiki.apache.org/confluence/display/TIKA
[
https://issues.apache.org/jira/browse/TIKA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998112#comment-16998112
]
Nick Burch commented on TIKA-3007:
--
There is currently no Parser for HEIC files, only mime detection
[
https://issues.apache.org/jira/browse/TIKA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995525#comment-16995525
]
Nick Burch commented on TIKA-3007:
--
Mime magic detection is all in Tika Core, so there shouldn't be any
[
https://issues.apache.org/jira/browse/TIKA-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994553#comment-16994553
]
Nick Burch commented on TIKA-3009:
--
That sounds like a "fun" WebLogic bug... Would calling r
[
https://issues.apache.org/jira/browse/TIKA-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989907#comment-16989907
]
Nick Burch commented on TIKA-2929:
--
At the moment, Apache Tika needs to be on the Java Classpath
[
https://issues.apache.org/jira/browse/TIKA-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989856#comment-16989856
]
Nick Burch commented on TIKA-2912:
--
See also
https://github.com/protobufjs/protobuf.js/wiki/How
[
https://issues.apache.org/jira/browse/TIKA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988981#comment-16988981
]
Nick Burch commented on TIKA-2830:
--
I think we might have solved some of this with TIKA-2942, would you
On Sun, 24 Nov 2019, Nicholas DiPiazza wrote:
Basically I just need some help understanding some of the finer details of
the OneNote format and how to extract info from it.
https://stackoverflow.com/questions/59008205/onenote-parsing-how-to-get-to-the-text-blobs-in-the-document
On Thu, 21 Nov 2019, Oleg Tikhonov wrote:
My question is more pragmatic.
What do we put inside the Dockerfile, and which image will it be based on
(say Ubuntu) ...
What will the entrypoint contain? Tika Server? Should we "install"
tesseract? Anything more?
If we want to be trendy, then Sergey
On Wed, 20 Nov 2019, Tim Allison wrote:
Eric Pugh recently asked on another channel if we had any plans to
release an official docker image for 1.23.
Depending on what we put in the container, we do need to be a little
careful. There's "platform dependencies" under non-compatible licenses
[
https://issues.apache.org/jira/browse/TIKA-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978265#comment-16978265
]
Nick Burch commented on TIKA-2992:
--
Most likely you have an older version of ASM on your classpath which
[
https://issues.apache.org/jira/browse/TIKA-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977661#comment-16977661
]
Nick Burch commented on TIKA-2986:
--
Based on the current {{Detector}} and {{DefaultDetector
[
https://issues.apache.org/jira/browse/TIKA-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977615#comment-16977615
]
Nick Burch commented on TIKA-2986:
--
How do we know which ones are a _must_ though? Many we expect
[
https://issues.apache.org/jira/browse/TIKA-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977613#comment-16977613
]
Nick Burch commented on TIKA-2988:
--
If it's not an official one, I believe we're supposed to prefix
[
https://issues.apache.org/jira/browse/TIKA-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977314#comment-16977314
]
Nick Burch commented on TIKA-2986:
--
Maybe we could add a second mode to Detect for this case? Current
[
https://issues.apache.org/jira/browse/TIKA-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976851#comment-16976851
]
Nick Burch commented on TIKA-2986:
--
Based on [https://cwiki.apache.org/confluence/display/TIKA
[
https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976528#comment-16976528
]
Nick Burch edited comment on TIKA-2224 at 11/18/19 4:22 PM:
The Tika Parsers
[
https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch updated TIKA-2224:
-
Summary: OneNote formats support - Mime Magic and Parser (was: Mime magic
for OneNote formats
[
https://issues.apache.org/jira/browse/TIKA-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976670#comment-16976670
]
Nick Burch commented on TIKA-2986:
--
I seem to recall that we allow the filename only to win for things
[
https://issues.apache.org/jira/browse/TIKA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2942.
--
Fix Version/s: 1.23
Resolution: Fixed
> HEIC files are detected as "video/quicktime"
[
https://issues.apache.org/jira/browse/TIKA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976609#comment-16976609
]
Nick Burch commented on TIKA-2942:
--
Nokia have produced a Java library for the file format -
[https
[
https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976528#comment-16976528
]
Nick Burch commented on TIKA-2224:
--
The Tika Parsers project depends on Guava, currently `28.1-jre`
Feel
[
https://issues.apache.org/jira/browse/TIKA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973054#comment-16973054
]
Nick Burch commented on TIKA-2982:
--
If it's at the bottom of the if block, I think it's fine to drop
[
https://issues.apache.org/jira/browse/TIKA-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971490#comment-16971490
]
Nick Burch commented on TIKA-2942:
--
Do you have a small sample file that you can share with us?
We
[
https://issues.apache.org/jira/browse/TIKA-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965278#comment-16965278
]
Nick Burch commented on TIKA-2972:
--
I see the "send the results to a remote network service"
On Wed, 30 Oct 2019, Eric Pugh wrote:
I’ve been going through the Wiki a lot over the past three months, and
I’d love to go through and clean out/update the old content.
Wonderful, thanks!
In case you're also feeling keen, the source for the website is
[
https://issues.apache.org/jira/browse/TIKA-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963956#comment-16963956
]
Nick Burch commented on TIKA-2972:
--
It doesn't quite feel like a perfect solution, but I can't think
On Tue, 29 Oct 2019, Tim Allison wrote:
Does anyone object if I grant write access to our wiki to Eric Pugh? He
slacked me a request.
I'd almost be tempted to say that we should grant access to all ASF
Committers to our wiki. (Note - not all confluence users, as that includes
fresh spammy
On Wed, 18 Sep 2019, Dan Becker wrote:
I am trying to build the master branch from Ubuntu 18.04, but I am getting
the following error:
[ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
1.409 s <<< FAILURE! - in org.apache.tika.server.UnpackerResourceTest
[ERROR]
On Wed, 18 Sep 2019, Tim Allison wrote:
I'm good w '\n'.
I think the issue is that the mvn tooling might not be if you're on
something other than linux/bsd. It seems, as best as I can tell, to create
everything in native line endings no matter what the input files are in.
(I can't spot any
[
https://issues.apache.org/jira/browse/TIKA-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932475#comment-16932475
]
Nick Burch commented on TIKA-2947:
--
So, it turns out that that page is auto-generated, which is why I
[
https://issues.apache.org/jira/browse/TIKA-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2947.
--
Fix Version/s: 1.23
Resolution: Fixed
> Following Tika documentation results in a build of T
Hi All
I've just done a build of the website for TIKA-2947, and most of the files
changed. From a quick look, it seems to just be line endings though
Currently, the source APT files and the output HTML files don't have any
line endings set in svn. I'm tempted to set the eol style on all
[
https://issues.apache.org/jira/browse/TIKA-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932424#comment-16932424
]
Nick Burch commented on TIKA-2947:
--
I'd be tempted to update that link (and the same in older versions
On Tue, 4 Dec 2018, Tim Allison wrote:
I had to revoke my signing key: EF0CF38A. I have a couple of leads, but
if you know of anyone in the Washington, DC region who might be
interested in signing my new key (944FFD51), let me know.
Send a message to party@ and suggest an after-work Apache
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680404#comment-16680404
]
Nick Burch commented on TIKA-2771:
--
I'm not sure we do. We have documents along with the encoding
[
https://issues.apache.org/jira/browse/TIKA-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662467#comment-16662467
]
Nick Burch commented on TIKA-2765:
--
Oracle hid all the useful Zip security stuff in recent Java releases
[
https://issues.apache.org/jira/browse/TIKA-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655229#comment-16655229
]
Nick Burch commented on TIKA-2744:
--
Nope, it doesn't work that way. All RSS files are XML files
[
https://issues.apache.org/jira/browse/TIKA-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655084#comment-16655084
]
Nick Burch commented on TIKA-2744:
--
{{application/rss+xml}} is a subtype of {{application/xml}} so
[
https://issues.apache.org/jira/browse/TIKA-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653808#comment-16653808
]
Nick Burch commented on TIKA-2744:
--
I've added a test RSS 2.0 file to Tika's test documents, and it's
[
https://issues.apache.org/jira/browse/TIKA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653804#comment-16653804
]
Nick Burch commented on TIKA-2543:
--
Great find Tim! Looks like an excellent resource on this.
Assuming
[
https://issues.apache.org/jira/browse/TIKA-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648881#comment-16648881
]
Nick Burch commented on TIKA-2752:
--
Based on https://wiki.apache.org/tika/ErrorsAndExceptions , I'd say
[
https://issues.apache.org/jira/browse/TIKA-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639748#comment-16639748
]
Nick Burch commented on TIKA-2747:
--
We'll certainly need a sample file with some of these properties
[
https://issues.apache.org/jira/browse/TIKA-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch updated TIKA-2747:
-
Priority: Minor (was: Blocker)
> Expose custom MAPI properties as a result of the OutlookExtrac
[
https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627544#comment-16627544
]
Nick Burch commented on TIKA-2734:
--
That looks like the print page footer, could it be that?
> T
On Mon, 24 Sep 2018, Tim Allison wrote:
Aside from the problem with users and non-standard XML parsers, were
there any other show-stoppers in POI 4.0.0? Is there a reason to wait
for POI 4.0.1?
I think, in terms of Tika affecting bugs, it was the xml parser stuff, and
commons compress