[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160099#comment-15160099 ]
Tim Allison commented on TIKA-1866:
---
Not TikaInputStream's fault.
This looks to be a bug deep within
[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160022#comment-15160022 ]
Tim Allison commented on TIKA-1866:
---
Strike that...image handling is not the problem. If I save the
[ https://issues.apache.org/jira/browse/TIKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159659#comment-15159659 ]
Chris A. Mattmann commented on TIKA-1696:
---
[~lewistre] FYI
> Language Identification with Text Processing Toolkit
Thanks, Ken.
We are working on bringing in Text.jl and, at this point, prefer
to work on the 1.x branch (aka master). I’ve asked Trevor to take a look
at the 1.x branch and at pulling your code for the tika-detect module from
2.x into 1.x, and then to look at adding Text.jl from MIT-LL as a
corresponding
[ https://issues.apache.org/jira/browse/TIKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann reassigned TIKA-1696:
---
Assignee: Chris A. Mattmann
> Language Identification with Text Processing Toolkit
Hi Trevor,
1. I assume the benchmark was using a pre-2.0 version of Tika, yes?
It would be great to try out the current support in the 2.0 branch, as a
comparison with what we had previously.
Also, details on the test corpus used would be useful.
2. I started using the ServiceLoader pattern
Hi all,
I am Trevor, a grad student at USC currently working with Prof.
Chris Mattmann and Paul Ramirez on integrating Tika with MIT Lincoln Lab’s
Text.jl library for language detection.
https://issues.apache.org/jira/browse/TIKA-1696
Since Text.jl is written in Julia, I have created a
[ https://issues.apache.org/jira/browse/TIKA-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159042#comment-15159042 ]
Nick Burch commented on TIKA-1867:
---
You should be able to exclude the CompositeExternalParser with a ~5
Roman Kratochvil created TIKA-1867:
---
Summary: Tika external parsers cannot be turned off without patching the tika-app-XX.jar
Key: TIKA-1867
URL: https://issues.apache.org/jira/browse/TIKA-1867
[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158894#comment-15158894 ]
Tim Allison commented on TIKA-1866:
---
Looks like something in the image handling is causing problems.
[ https://issues.apache.org/jira/browse/TIKA-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158871#comment-15158871 ]
Tim Allison commented on TIKA-1866:
---
That's exciting. I'll take a look.
> Out of memory error on Word