[jira] [Commented] (TIKA-2830) Detect Media type of HEIF file correctly

2019-12-05 Thread Christian (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989467#comment-16989467 ] Christian commented on TIKA-2830: Seems like the HEIF detection is as complete as currently possible with

[jira] [Commented] (TIKA-2830) Detect Media type of HEIF file correctly

2019-12-05 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988981#comment-16988981 ] Nick Burch commented on TIKA-2830: I think we might have solved some of this with TIKA-2942, would you

Re: Wrong language detection in tika server 1.22

2019-12-05 Thread Tim Allison
I just updated our wiki. Please let me know if we can improve it further.
https://cwiki.apache.org/confluence/display/TIKA/TikaJAXRS#TikaJAXRS-LanguageResource
On Thu, Dec 5, 2019 at 10:44 AM Tim Allison wrote:
> In looking at the source code for this (for the first time?)...it looks
> like

Re: Wrong language detection in tika server 1.22

2019-12-05 Thread Tim Allison
In looking at the source code for this (for the first time?)...it looks like that endpoint expects UTF-8 text. It does not parse the file and then run lang id on the parsed text.
On Thu, Dec 5, 2019 at 6:43 AM Juan Elosua wrote:
> Hi all,
>
> Since this is my first email allow me to give some
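
If the language endpoint only accepts UTF-8 text, one possible workaround is to do it in two steps: extract the text first, then send that text for language id. The Python sketch below illustrates the idea and is not taken from the thread. It assumes a tika-server at http://localhost:9998, that PUT /tika with "Accept: text/plain" returns UTF-8 text, and that PUT /language/string accepts a UTF-8 body. The helper names are hypothetical.

```python
import urllib.request

# Assumption: tika-server is running locally on its usual port.
TIKA_SERVER = "http://localhost:9998"


def build_extract_request(file_bytes, server=TIKA_SERVER):
    """Build a PUT to /tika that asks for the parsed text as text/plain."""
    return urllib.request.Request(
        f"{server}/tika",
        data=file_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def build_language_request(text, server=TIKA_SERVER):
    """Build a PUT to /language/string carrying already-extracted UTF-8 text."""
    return urllib.request.Request(
        f"{server}/language/string",
        data=text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def detect_language(file_bytes, server=TIKA_SERVER):
    """Parse the file first, then run language id on the parsed text."""
    # Step 1: let the server parse the binary file into text.
    with urllib.request.urlopen(build_extract_request(file_bytes, server)) as resp:
        text = resp.read().decode("utf-8")  # assumed to be UTF-8
    # Step 2: send only the extracted text to the language endpoint.
    with urllib.request.urlopen(build_language_request(text, server)) as resp:
        return resp.read().decode("utf-8").strip()
```

With a server running, something like detect_language(open("report.pdf", "rb").read()) would return whatever language code the server reports; the file name here is only a placeholder.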

Wrong language detection in tika server 1.22

2019-12-05 Thread Juan Elosua
Hi all, Since this is my first email allow me to give some context: my name is Juan Elosua and I have come across Tika for document parsing for an information security project we are working on. First of all, sorry if this is not the way to send potential issues along, but I was unsure how to