[
https://issues.apache.org/jira/browse/TIKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678151#comment-15678151
]
Nick Burch commented on TIKA-1804:
--
Ted Dunning has produced a hopefully drop-in replacement (based
[
https://issues.apache.org/jira/browse/TIKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch reopened TIKA-1804:
--
The ASF legal team have recently changed their mind on the license (see
https://lists.apache.org
[
https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651535#comment-15651535
]
Nick Burch commented on TIKA-2159:
--
Given that we don't control all the parsers, I'm worried things may
[
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636102#comment-15636102
]
Nick Burch commented on TIKA-2146:
--
My guess is it's about 2-3 weeks of work at the POI level to add
[
https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614327#comment-15614327
]
Nick Burch commented on TIKA-2146:
--
As per https://poi.apache.org/encryption.html, there's no support
[
https://issues.apache.org/jira/browse/TIKA-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610867#comment-15610867
]
Nick Burch commented on TIKA-2144:
--
Do you know how the file in question was generated? It seems to have
[
https://issues.apache.org/jira/browse/TIKA-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579692#comment-15579692
]
Nick Burch commented on TIKA-2122:
--
I'm not sure if we want to be dumping these raw into the Tika metadata
On Wed, 12 Oct 2016, Simone Tripodi wrote:
While upgrading the system I've been working on, I updated Apache POI
to version 3.15, then Tika (currently tika-parsers-1.7, I am
testing tika-parsers-1.14-SNAPSHOT)
You can't just upgrade one jar. You need to use all of the POI jars
together
On Wed, 5 Oct 2016, Apache Jenkins Server wrote:
The Apache Jenkins build system has built tika-2.x (build #156)
Check console output at https://builds.apache.org/job/tika-2.x/156/ to view the
results.
Another one for our Jenkins experts. Looks like it needs a bit more memory
for the job,
On Wed, 5 Oct 2016, Apache Jenkins Server wrote:
The Apache Jenkins build system has built tika-2.x-windows (build #60)
Check console output at https://builds.apache.org/job/tika-2.x-windows/60/ to
view the results.
Anyone with Jenkins-foo able to fix our Windows Jenkins builds? This failed
[
https://issues.apache.org/jira/browse/TIKA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548427#comment-15548427
]
Nick Burch commented on TIKA-2107:
--
The attached file is an old Word 2 file, not supported by POI
[
https://issues.apache.org/jira/browse/TIKA-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542115#comment-15542115
]
Nick Burch commented on TIKA-2107:
--
What error are you getting? How are you calling Tika? Are you really
[
https://issues.apache.org/jira/browse/TIKA-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534266#comment-15534266
]
Nick Burch commented on TIKA-2099:
--
This patch removes some special handling put in place for COMPRESS-117
On Mon, 19 Sep 2016, Bob Paulin wrote:
I think it's a good thing to discuss. I know there are other features
that are targeted for 2.0. Do we have a general sense of where those
features are at?
I think the big one we need to crack is allowing multiple parsers to run
against a file. OCR is
[
https://issues.apache.org/jira/browse/TIKA-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509659#comment-15509659
]
Nick Burch commented on TIKA-2087:
--
This is an invalid XML file. You need to fix it so
[
https://issues.apache.org/jira/browse/TIKA-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509655#comment-15509655
]
Nick Burch commented on TIKA-2086:
--
How are you calling Apache Tika? Is this happening for all files
[
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504023#comment-15504023
]
Nick Burch commented on TIKA-1997:
--
Running your file through the openssl tool {{ asn1parse }}, it shows
[
https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493924#comment-15493924
]
Nick Burch commented on TIKA-2069:
--
Yes! If you wrote a VB Script, and zipped it up, it'd be a {{text/x
[
https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490844#comment-15490844
]
Nick Burch commented on TIKA-2058:
--
The code posted above isn't calling {{close}} on the {{MAPIMessage
[
https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490717#comment-15490717
]
Nick Burch commented on TIKA-2058:
--
Looking at the patch, I'm not sure how it will help? When
On Wed, 14 Sep 2016, Allison, Timothy B. wrote:
Would it be as much of a disaster to require the user to allow the
fileUrl capability on the commandline at server startup? We could add
some menacing "all bets are off, we hope you know what you're doing"
warning.
With a special switch, and a
On Sun, 11 Sep 2016, Bob Paulin wrote:
I'd like to propose a new Tika App for the 2.0 branch. One of the
reasons we broke apart the Tika parsers into modules was due to the
complexity of having to deal with all the parser dependencies and
transitive dependencies. Now developers can use just
[
https://issues.apache.org/jira/browse/TIKA-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488175#comment-15488175
]
Nick Burch commented on TIKA-2064:
--
Are you happy to dual-license it as Apache License, Version 2.0? We
[
https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487662#comment-15487662
]
Nick Burch commented on TIKA-2069:
--
I think the idea of a Macro is probably general enough across a range
[
https://issues.apache.org/jira/browse/TIKA-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487585#comment-15487585
]
Nick Burch commented on TIKA-2064:
--
Magic added in 3c0abc8eb. No unit tests yet though, we can add them
On Tue, 13 Sep 2016, John Dougrez-Lewis wrote:
Surely the security vulnerability could have been fixed by disallowing
"file://" variants in the URL rather than removing the feature altogether?
Or were there other implementation issues relating to the fileUrl feature
that meant it was best
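To illustrate the question above: blocking "file://" variants one by one is fragile, because the same local-file access can be spelled in several ways. A minimal sketch of the opposite approach, an allow-list of schemes, is below. It is not how Tika Server handles fileUrl; the function name `is_fetchable_url` and the choice of allowed schemes are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical policy for this sketch: only plain web URLs may be fetched.
ALLOWED_SCHEMES = {"http", "https"}


def is_fetchable_url(url: str) -> bool:
    """Return True only for URLs whose scheme is explicitly allowed.

    urlparse lower-cases the scheme, so "FILE:" and "file:" are treated
    the same, and wrapper schemes such as "jar:" are rejected because
    they are simply not on the list.
    """
    parsed = urlparse(url)
    # Require a host as well, so bare paths and "http:" with no host fail.
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/report.pdf",
        "file:///etc/passwd",
        "FILE:/etc/passwd",
        "jar:file:///tmp/x.jar!/entry",
        "/etc/passwd",
    ):
        print(candidate, is_fetchable_url(candidate))
```

With an allow-list, a new way of spelling a local path is rejected by default rather than needing its own rule.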
[
https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484834#comment-15484834
]
Nick Burch commented on TIKA-2069:
--
I think that, given both how big macros can get and how they logically
[
https://issues.apache.org/jira/browse/TIKA-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484812#comment-15484812
]
Nick Burch commented on TIKA-2064:
--
If you could, that would be most helpful!
> Document type detec
[
https://issues.apache.org/jira/browse/TIKA-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446089#comment-15446089
]
Nick Burch commented on TIKA-2064:
--
From a quick google, `application/x-stata-dta` seems to be what ot
On Thu, 11 Aug 2016, Bob Paulin wrote:
I know it's been a little bit since we talked about 2.0. We had
discussed holding off while some API changes that were under
consideration. Has any progress been made on this?
I think we're still trying to come up with a plan for how to allow
multiple
[
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409311#comment-15409311
]
Nick Burch commented on TIKA-1367:
--
The code exists and you can check out the more modular parsers already
[
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409272#comment-15409272
]
Nick Burch commented on TIKA-1367:
--
This should be largely fixed on the 2.x branch, which has more modular
[
https://issues.apache.org/jira/browse/TIKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401757#comment-15401757
]
Nick Burch commented on TIKA-2046:
--
As per the troubleshooting guide, if one of your files doesn't work
[
https://issues.apache.org/jira/browse/TIKA-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401089#comment-15401089
]
Nick Burch commented on TIKA-2046:
--
Can you try following the steps in
https://wiki.apache.org/tika
[
https://issues.apache.org/jira/browse/TIKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397499#comment-15397499
]
Nick Burch commented on TIKA-2045:
--
Sounds like it's checking permissions then skipping extraction, so
[
https://issues.apache.org/jira/browse/TIKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397442#comment-15397442
]
Nick Burch commented on TIKA-2045:
--
As per https://wiki.apache.org/tika/Troubleshooting%20Tika
[
https://issues.apache.org/jira/browse/TIKA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396477#comment-15396477
]
Nick Burch commented on TIKA-2044:
--
Are you able to reproduce this in a simple junit unit test case
[
https://issues.apache.org/jira/browse/TIKA-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395282#comment-15395282
]
Nick Burch commented on TIKA-2041:
--
Running "git log" and "git diff" on the file s
[
https://issues.apache.org/jira/browse/TIKA-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394458#comment-15394458
]
Nick Burch commented on TIKA-2041:
--
We added the {{EBCDIC_500_}} family of detectors into our own copy
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393617#comment-15393617
]
Nick Burch commented on TIKA-2042:
--
Fixed in {{72d2d88b381ba75942ae791042ef54af33ee1f38}} - your test file
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2042.
--
Resolution: Fixed
Fix Version/s: 1.14
> MBOX file detected wrongly as text/h
[
https://issues.apache.org/jira/browse/TIKA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2037.
--
Resolution: Fixed
Fix Version/s: 1.14
Fixed in 952fb54 along with a simpler unit test inspired
[
https://issues.apache.org/jira/browse/TIKA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386159#comment-15386159
]
Nick Burch commented on TIKA-2037:
--
I've just tried with a 1.14 snapshot build, and both are detected
[
https://issues.apache.org/jira/browse/TIKA-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386093#comment-15386093
]
Nick Burch commented on TIKA-2032:
--
It is contained in the {{tika-langdetect}} module, which is optional
[
https://issues.apache.org/jira/browse/TIKA-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357125#comment-15357125
]
Nick Burch commented on TIKA-2025:
--
We could always test the formatted value for {{E+}} (or {{E
[
https://issues.apache.org/jira/browse/TIKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347398#comment-15347398
]
Nick Burch commented on TIKA-2017:
--
The server ought to be pushing the XML out to the client
[
https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344390#comment-15344390
]
Nick Burch commented on TIKA-1358:
--
The OOXML stuff uses a {{.version suffix}}, so if we followed
[
https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343896#comment-15343896
]
Nick Burch commented on TIKA-1358:
--
Commons Compress 1.12 is out, with our required snappy support
[
https://issues.apache.org/jira/browse/TIKA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338814#comment-15338814
]
Nick Burch commented on TIKA-2015:
--
Fixed on the POI side in r1749213, will be included in POI 3.15 beta 2
[
https://issues.apache.org/jira/browse/TIKA-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331585#comment-15331585
]
Nick Burch commented on TIKA-2004:
--
Wikipedia claims - https://en.wikipedia.org/wiki
[
https://issues.apache.org/jira/browse/TIKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327727#comment-15327727
]
Nick Burch commented on TIKA-2003:
--
Looks like David hasn't added his GPG key fingerprint to his profile
[
https://issues.apache.org/jira/browse/TIKA-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322404#comment-15322404
]
Nick Burch commented on TIKA-2001:
--
What's the output of `--detect` on the problematic file?
> Pars
[
https://issues.apache.org/jira/browse/TIKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-1989.
--
Resolution: Fixed
Fix Version/s: 1.14
Markup fixed in r1745867, and deployed to the site
[
https://issues.apache.org/jira/browse/TIKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305313#comment-15305313
]
Nick Burch commented on TIKA-1989:
--
There's more text in the src apt file - I'll have to work out why
On Fri, 27 May 2016, Rahul Khandelwal wrote:
I am using the detect/stream API to retrieve the MIME type of a file, but
it's not returning the exact MIME type for some documents if I pass only
1KB of that file's data.
That's expected
For example - For open office document it's returning
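One reason a short prefix can be insufficient for container formats: many office formats are ZIP files, and a ZIP's index (the central directory) sits at the end of the file. The sketch below is not Tika's detection code; it uses Python's zipfile module and made-up entry names to show that two different ZIP-based files look identical from their first bytes, and that the entries cannot be listed from a 1KB prefix.

```python
import io
import zipfile


def build_zip(entries):
    """Build an in-memory ZIP (stored, not compressed) from (name, bytes) pairs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in entries:
            zf.writestr(name, data)
    return buf.getvalue()


def sniff_prefix(data):
    """Magic-byte check only: all a detector can do with a bare prefix."""
    if data[:4] == b"PK\x03\x04":
        return "application/zip"
    return "application/octet-stream"


def can_list_entries(data):
    """True if the ZIP central directory is present and readable."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


payload = b"x" * 4096  # large enough that 1KB cuts the file short

# Hypothetical stand-ins: an office-style container and an ordinary archive.
office_like = build_zip([("[Content_Types].xml", b"<Types/>"),
                         ("word/document.xml", payload)])
plain_zip = build_zip([("notes.txt", payload)])

if __name__ == "__main__":
    print(sniff_prefix(office_like[:1024]), sniff_prefix(plain_zip[:1024]))
    print(can_list_entries(office_like), can_list_entries(office_like[:1024]))
```

Both prefixes report only the generic container type, and listing the entries works on the whole file but not on the truncated copy.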
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300737#comment-15300737
]
Nick Burch commented on TIKA-1513:
--
I haven't read much on the format, but I'd be tempted to maybe have
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300737#comment-15300737
]
Nick Burch edited comment on TIKA-1513 at 5/25/16 7:52 PM:
---
I haven't read much
[
https://issues.apache.org/jira/browse/TIKA-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296225#comment-15296225
]
Nick Burch commented on TIKA-1979:
--
For production use, I'd suggest you switch to the Tika Server
On Fri, 20 May 2016, Joseph Naegele wrote:
I introduced a regression in the HtmlParser in TIKA-1938, which added the
ability to emit parsed tags found in the HTML .