[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207598#comment-16207598
]
Nick Burch commented on TIKA-2478:
--
Following the outlook parser model seems likely to deliver "
[
https://issues.apache.org/jira/browse/TIKA-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195579#comment-16195579
]
Nick Burch commented on TIKA-2473:
--
I've added some test files, mime magic and detection. The magic
On Fri, 29 Sep 2017, Giuseppe Totaro wrote:
To sum up, I would like to quickly discuss the following aspects:
- As you all mentioned, the HTTP headers for configuring the
ContentHandler to be used are better suited for the dynamic cases.
Specifically, a ContentHandler can be given through
On Thu, 28 Sep 2017, Giuseppe Totaro wrote:
if I am not wrong, currently you cannot configure a specific ContentHandler
while using tika-server. I mean that you can configure your own parser [0]
but you cannot control which ContentHandler the parser leverages to extract
text and metadata (e.g.,
[
https://issues.apache.org/jira/browse/TIKA-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167672#comment-16167672
]
Nick Burch commented on TIKA-2466:
--
Thanks [~rombert]. I'll give it a day or so for people to ponder
[
https://issues.apache.org/jira/browse/TIKA-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167557#comment-16167557
]
Nick Burch commented on TIKA-2466:
--
[~talli...@mitre.org] The methods not being static on {{ParseContext
[
https://issues.apache.org/jira/browse/TIKA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167063#comment-16167063
]
Nick Burch commented on TIKA-2462:
--
I've just had a quick try with the library, against a test SAS file
[
https://issues.apache.org/jira/browse/TIKA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167063#comment-16167063
]
Nick Burch edited comment on TIKA-2462 at 9/14/17 10:37 PM:
I've just had
[
https://issues.apache.org/jira/browse/TIKA-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166949#comment-16166949
]
Nick Burch commented on TIKA-2466:
--
If we're going to use {{DocumentBuilderFactory}}, then we need to make
[
https://issues.apache.org/jira/browse/TIKA-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158736#comment-16158736
]
Nick Burch commented on TIKA-2461:
--
This may be tricky - I've just tried with our test Quattro Pro 7/8
[
https://issues.apache.org/jira/browse/TIKA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158695#comment-16158695
]
Nick Burch commented on TIKA-2460:
--
I'd tweak the comment to {{System property to set a path
[
https://issues.apache.org/jira/browse/TIKA-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158632#comment-16158632
]
Nick Burch commented on TIKA-2461:
--
Assuming you have the Tika App jar to hand, you can just run
[
https://issues.apache.org/jira/browse/TIKA-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158539#comment-16158539
]
Nick Burch commented on TIKA-2461:
--
Could you try running {{org.apache.poi.poifs.dev.POIFSLister}} against
[
https://issues.apache.org/jira/browse/TIKA-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156703#comment-16156703
]
Nick Burch commented on TIKA-2460:
--
For $DAYJOB, we've configured Tomcat to have
{{$\{catalina.base
[
https://issues.apache.org/jira/browse/TIKA-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147322#comment-16147322
]
Nick Burch commented on TIKA-2450:
--
In Windows, right click on a folder, New then Word Document
[
https://issues.apache.org/jira/browse/TIKA-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2447.
--
Resolution: Fixed
Fix Version/s: 1.17
> PSDParser creates unnecessary large byte ar
[
https://issues.apache.org/jira/browse/TIKA-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2445.
--
Resolution: Fixed
Fix Version/s: 1.17
Both are now detected as {{application/x-bat}} , which
Nick Burch created TIKA-2445:
Summary: Windows BAT / CMD detection
Key: TIKA-2445
URL: https://issues.apache.org/jira/browse/TIKA-2445
Project: Tika
Issue Type: Bug
Components: mime
[
https://issues.apache.org/jira/browse/TIKA-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138374#comment-16138374
]
Nick Burch commented on TIKA-2443:
--
Tika doesn't care where you put the file, as long as the classloader
[
https://issues.apache.org/jira/browse/TIKA-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130726#comment-16130726
]
Nick Burch commented on TIKA-2443:
--
It doesn't matter what priority we put on the Date magic
[
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-1367.
--
Resolution: Invalid
Glad to hear it's sorted! Based on the Stack Overflow post, it's a tricky
artifact
[
https://issues.apache.org/jira/browse/TIKA-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106121#comment-16106121
]
Nick Burch commented on TIKA-2436:
--
In a similar way to how we handle WMZ files, I've added a new mime
[
https://issues.apache.org/jira/browse/TIKA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100166#comment-16100166
]
Nick Burch commented on TIKA-2433:
--
As it's in a deprecated part of the codebase, I'm not sure we'd do
[
https://issues.apache.org/jira/browse/TIKA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2433.
--
Resolution: Fixed
Fix Version/s: 1.17
I can reproduce the problem, hopefully fixed
[
https://issues.apache.org/jira/browse/TIKA-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100021#comment-16100021
]
Nick Burch commented on TIKA-2433:
--
What are the arguments you are passing to the Tika App?
> Tika 1
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089013#comment-16089013
]
Nick Burch commented on TIKA-2042:
--
[~mcaruanagalizia] I've added some more rfc822 magic, which I think
[
https://issues.apache.org/jira/browse/TIKA-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085747#comment-16085747
]
Nick Burch commented on TIKA-2042:
--
[~mcaruanagalizia] I've added some more patterns
[
https://issues.apache.org/jira/browse/TIKA-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2422.
--
Resolution: Fixed
Fix Version/s: 1.16
Thanks for this! Patch applied
> Improve detect
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074825#comment-16074825
]
Nick Burch commented on TIKA-2399:
--
We can properly fix this in 2.x when we sort out how to have multiple
[
https://issues.apache.org/jira/browse/TIKA-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074806#comment-16074806
]
Nick Burch commented on TIKA-2419:
--
One fix might be to drop the priority of the XML magic to 40 to match
Nick Burch created TIKA-2419:
Summary: Try HTML mime magic on broken XML files
Key: TIKA-2419
URL: https://issues.apache.org/jira/browse/TIKA-2419
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074649#comment-16074649
]
Nick Burch commented on TIKA-2418:
--
Hopefully fixed in 0815b2144cf013e1a0803cee72d8076e8c544716 - I've
[
https://issues.apache.org/jira/browse/TIKA-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2418.
--
Resolution: Fixed
Fix Version/s: 1.16
> English ASCII text classified as video/quickt
[
https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074648#comment-16074648
]
Nick Burch commented on TIKA-1367:
--
[~talli...@mitre.org] I'm not sure there is - we've fixed it in Tika
On Mon, 3 Jul 2017, Allison, Timothy B. wrote:
To help a user configure a parameter in the PDFParser, I just started:
https://wiki.apache.org/tika/TikaConfig. I realize, though, that I
probably should update: https://tika.apache.org/1.15/configuring.html
instead.
Preferences,
[
https://issues.apache.org/jira/browse/TIKA-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2409.
--
Resolution: Not A Problem
> Tar has different mime type by name vs conte
[
https://issues.apache.org/jira/browse/TIKA-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071387#comment-16071387
]
Nick Burch commented on TIKA-2409:
--
This is as expected. GTar is a specialisation of tar. Not all tar
[
https://issues.apache.org/jira/browse/TIKA-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070055#comment-16070055
]
Nick Burch commented on TIKA-2407:
--
You'd be best off reporting this to the Apache PDFBox project, which
[
https://issues.apache.org/jira/browse/TIKA-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055588#comment-16055588
]
Nick Burch commented on TIKA-2399:
--
The latest gradle has an experimental plugin for generating a maven
[
https://issues.apache.org/jira/browse/TIKA-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050220#comment-16050220
]
Nick Burch commented on TIKA-2394:
--
PST support is provided by libjava-pst. It looks like we're
[
https://issues.apache.org/jira/browse/TIKA-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043571#comment-16043571
]
Nick Burch commented on TIKA-1945:
--
I don't know exactly what Tim'll do, but assuming it's similar to what
[
https://issues.apache.org/jira/browse/TIKA-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042865#comment-16042865
]
Nick Burch commented on TIKA-1945:
--
A small sample file we can use for unit testing is needed, one per
[
https://issues.apache.org/jira/browse/TIKA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2388.
--
Resolution: Fixed
Fix Version/s: 1.16
> Problem in Tika().detect for ODB (Open Office datab
[
https://issues.apache.org/jira/browse/TIKA-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025536#comment-16025536
]
Nick Burch commented on TIKA-2378:
--
This looks to be a bug in Jackcess, the underlying Java library
[
https://issues.apache.org/jira/browse/TIKA-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024671#comment-16024671
]
Nick Burch commented on TIKA-2376:
--
Tika Parsers already has a dependency on both {{com.googlecode.json
[
https://issues.apache.org/jira/browse/TIKA-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024433#comment-16024433
]
Nick Burch commented on TIKA-2376:
--
I seem to recall that someone (Ted Dunning perhaps?) has written
On 2017-05-18 17:02 (-0400), Nick Burch wrote:
> Hi All
> I've just been caught out by the Tika App's -z on a PDF not extracting the
> embedded images. I think we probably shouldn't tweak the default config
> for the other Tika App modes, but what about extract? Any reason why we
> sh
On Mon, 22 May 2017, Allison, Timothy B. wrote:
Last I remember, Tyler had some detailed notes...anyone remember where
those are?
https://wiki.apache.org/tika/ReleaseProcess
Nick
Hi All
I've just been caught out by the Tika App's -z on a PDF not extracting the
embedded images. I think we probably shouldn't tweak the default config
for the other Tika App modes, but what about extract? Any reason why we
shouldn't turn on the PDF Parser option "extractInlineImages" when
[
https://issues.apache.org/jira/browse/TIKA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016324#comment-16016324
]
Nick Burch commented on TIKA-2372:
--
For a GPL licensed library, catacombae
<https://sourceforge.ne
Nick Burch created TIKA-2372:
Summary: OSX DMG support
Key: TIKA-2372
URL: https://issues.apache.org/jira/browse/TIKA-2372
Project: Tika
Issue Type: Improvement
Components: parser
[
https://issues.apache.org/jira/browse/TIKA-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014065#comment-16014065
]
Nick Burch commented on TIKA-2365:
--
Looks like Batik may have inlined some or all of Commons IO. I'd
[
https://issues.apache.org/jira/browse/TIKA-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014059#comment-16014059
]
Nick Burch commented on TIKA-2362:
--
Which format(s) are you having that problem with? Is that all
On Tue, 16 May 2017, Eric Pugh wrote:
It was great to read through
http://events.linuxfoundation.org/sites/events/files/slides/WhatsNewWithApacheTika_1.pdf…
Wow there is a lot in Tika.
And I think that might be the one challenge with the talk structure,
there is SOO much information.
The
Hi All
Last year in Seville, I gave a talk on Tika entitled "Apache Tika - What’s
new with 2.0?". For ApacheCon Miami next week, I've been roped into giving
an updated version...
https://apachecon2017.sched.com/event/9zvD/apache-tika-whats-new-with-20-nick-burch-apache-software-foun
[
https://issues.apache.org/jira/browse/TIKA-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005441#comment-16005441
]
Nick Burch commented on TIKA-1867:
--
I've just tried with your config file and the Tika App. I'm seeing
[
https://issues.apache.org/jira/browse/TIKA-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2353.
--
Resolution: Invalid
{{grep}} ?
However, please don't use JIRA for asking usage questions. Please direct
[
https://issues.apache.org/jira/browse/TIKA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992896#comment-15992896
]
Nick Burch commented on TIKA-2351:
--
Just {{java -jar tika-app-1.15-snapshot.jar --text problem.doc
[
https://issues.apache.org/jira/browse/TIKA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992845#comment-15992845
]
Nick Burch commented on TIKA-2351:
--
I've just tried with a recent nightly build, and no error was reported
[
https://issues.apache.org/jira/browse/TIKA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992821#comment-15992821
]
Nick Burch commented on TIKA-2351:
--
Can you attach the failing document?
If not, could you try grabbing
[
https://issues.apache.org/jira/browse/TIKA-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988951#comment-15988951
]
Nick Burch commented on TIKA-2346:
--
Thanks Tim! I think we probably don't want it for PPT / PPTX otherwise
[
https://issues.apache.org/jira/browse/TIKA-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2346.
--
Resolution: Fixed
Implemented in aa4954fb44f707779693faea785acc219739ccd5
Nick Burch created TIKA-2346:
Summary: Allow Office format parsers to exclude parsing shapes
Key: TIKA-2346
URL: https://issues.apache.org/jira/browse/TIKA-2346
Project: Tika
Issue Type
[
https://issues.apache.org/jira/browse/TIKA-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2345.
--
Resolution: Fixed
Implemented
Note that ExecutorService still requires serialisation to have a complete
Nick Burch created TIKA-2345:
Summary: TikaConfigSerializer should expose EncodingDetector
details
Key: TIKA-2345
URL: https://issues.apache.org/jira/browse/TIKA-2345
Project: Tika
Issue Type
[
https://issues.apache.org/jira/browse/TIKA-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984479#comment-15984479
]
Nick Burch commented on TIKA-2099:
--
[~talli...@mitre.org] has been doing some work on Commons Compress
[
https://issues.apache.org/jira/browse/TIKA-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch resolved TIKA-2327.
--
Resolution: Information Provided
Step 1 - upgrade to a more recent version of Apache Tika
Step 2
On Wed, 29 Mar 2017, Konstantin Gribov wrote:
I've been surprised by such separation, what was the reason to separate
them?
I think partly history (we split in 1.x), partly how the split was done
(osgi folks amongst the most keen), and partly a desire not to have
non-OSGi users getting a
[
https://issues.apache.org/jira/browse/TIKA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945477#comment-15945477
]
Nick Burch commented on TIKA-2313:
--
That check may well not be correct for the older formats. I'd start
[
https://issues.apache.org/jira/browse/TIKA-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945193#comment-15945193
]
Nick Burch commented on TIKA-2313:
--
Opening the document in OpenOffice, it looks to be in French, complete
[
https://issues.apache.org/jira/browse/TIKA-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943238#comment-15943238
]
Nick Burch commented on TIKA-2311:
--
How about we have package parser say "if no mimetype set or cu
[
https://issues.apache.org/jira/browse/TIKA-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937437#comment-15937437
]
Nick Burch commented on TIKA-1772:
--
Thanks for the test file! I've committed it, along with a similar
[
https://issues.apache.org/jira/browse/TIKA-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Burch reopened TIKA-2253:
--
Sadly http://tika.apache.org/1.14/miredot/ and friends remain broken. Could
someone who understands miredot
[
https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904849#comment-15904849
]
Nick Burch commented on TIKA-2294:
--
That way of calling Tika doesn't pass in the filename, so it'll
[
https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902906#comment-15902906
]
Nick Burch commented on TIKA-2294:
--
To correctly detect the OOXML sub-type, you either need the filename
On Thu, 9 Mar 2017, Avtar Singh Mehra wrote:
I am new to Apache Tika but have plenty of experience with other Apache
software like Apache Solr, Apache Lucene, Apache Velocity, etc. I would
like to start contributing to the Apache Tika community. It would be a great
help if someone could guide me
[
https://issues.apache.org/jira/browse/TIKA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900140#comment-15900140
]
Nick Burch commented on TIKA-2288:
--
I've got a feeling that this was partly because we didn't have as-good
On Tue, 7 Mar 2017, Thejan Wijesinghe wrote:
I have already used the Tess4j API to rewrite the TesseractOCRParser class.
Although it successfully extracts content from most of the file types, it
fails some particular unit tests in the TesseractOCRParserTest class. I can
solve that. However, I
[
https://issues.apache.org/jira/browse/TIKA-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878174#comment-15878174
]
Nick Burch commented on TIKA-2271:
--
Why are you setting a character limit on your ContentHandler if you
[
https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870398#comment-15870398
]
Nick Burch commented on TIKA-1332:
--
Unless we really need a Lucene 6 feature, for now to avoid surprises
[
https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868028#comment-15868028
]
Nick Burch commented on TIKA-1332:
--
Apache Ignite seems to use H2, and a google of H2 + apache.org shows
[
https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826453#comment-15826453
]
Nick Burch commented on TIKA-2241:
--
Support added in git in {{320a1f1ede36cf1f62f6f2b8cab468cd78094606
[
https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826325#comment-15826325
]
Nick Burch commented on TIKA-2241:
--
Can you please open a fresh bug for the grobid issue? That's unrelated
[
https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825786#comment-15825786
]
Nick Burch commented on TIKA-2241:
--
To get the list of mime types listed as supported by each parser
[
https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825729#comment-15825729
]
Nick Burch commented on TIKA-2241:
--
You only need to specify a mimetype for a parser if you want to bind
[
https://issues.apache.org/jira/browse/TIKA-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825700#comment-15825700
]
Nick Burch commented on TIKA-2241:
--
The example is no longer the recommended way to generate or test
[
https://issues.apache.org/jira/browse/TIKA-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820902#comment-15820902
]
Nick Burch edited comment on TIKA-2194 at 1/12/17 12:38 PM:
Ah, I've found
[
https://issues.apache.org/jira/browse/TIKA-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820902#comment-15820902
]
Nick Burch commented on TIKA-2194:
--
Ah, I've found the problem with your filename case. In the tika
[
https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781782#comment-15781782
]
Nick Burch commented on TIKA-2224:
--
They very much are on github! See
https://github.com/apache/tika
[
https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771750#comment-15771750
]
Nick Burch commented on TIKA-2224:
--
Thanks for the test file, I've added it to git and created a unit test
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768945#comment-15768945
]
Nick Burch commented on TIKA-1946:
--
Ideally different file formats would have different mimetypes
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768929#comment-15768929
]
Nick Burch commented on TIKA-1946:
--
I believe it's only normal to have non-ASF headers for code that we're
[
https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768699#comment-15768699
]
Nick Burch commented on TIKA-2224:
--
Mime magic now added for `.one` and `.onetoc`. `.onepkg` is actually
Nick Burch created TIKA-2224:
Summary: Mime magic for OneNote formats
Key: TIKA-2224
URL: https://issues.apache.org/jira/browse/TIKA-2224
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15753837#comment-15753837
]
Nick Burch commented on TIKA-2208:
--
I wonder if we need to put an extra catch
[
https://issues.apache.org/jira/browse/TIKA-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15748104#comment-15748104
]
Nick Burch commented on TIKA-2208:
--
Rather than doing it in code, what happens if you specify a Tika
[
https://issues.apache.org/jira/browse/TIKA-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15741077#comment-15741077
]
Nick Burch commented on TIKA-2194:
--
Matlab files lack a unique magic pattern at the start, which makes
On Wed, 30 Nov 2016, Allison, Timothy B. wrote:
ApacheCon and Apache Big Data will be held at the Intercontinental in
Miami, Florida, May 16-18, 2017
I plan to attend.
Who's in? Any idea if there will be another "content" track like we had
in Austin?
If we want a Content track, then we'd
[
https://issues.apache.org/jira/browse/TIKA-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690279#comment-15690279
]
Nick Burch commented on TIKA-2183:
--
Ping [~chrismattmann] (he's the maintainer of those bindings at
https
[
https://issues.apache.org/jira/browse/TIKA-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690022#comment-15690022
]
Nick Burch commented on TIKA-2183:
--
How are you calling Tika? I'd guess some sort of Python wrapper? If so