[jira] [Commented] (TIKA-1363) .mat files not parsing

2014-07-08 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054966#comment-14054966 ] Ken Krugler commented on TIKA-1363: --- For issues like these (where it could be a problem w

[jira] [Commented] (TIKA-1368) Improve the modularity of tika-parsers

2014-07-15 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062024#comment-14062024 ] Ken Krugler commented on TIKA-1368: --- While I also wish there was a better way to get only

[jira] [Commented] (TIKA-1365) Incorrectly MimeType detection for Apache Lucene web site

2014-07-16 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063467#comment-14063467 ] Ken Krugler commented on TIKA-1365: --- Turning the XML parser into a fuzzy parser (like wha

[jira] [Commented] (TIKA-1365) Incorrectly MimeType detection for Apache Lucene web site

2014-07-16 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063744#comment-14063744 ] Ken Krugler commented on TIKA-1365: --- Hi Tyler - the response from fetching http://lucene

[jira] [Commented] (TIKA-1365) Incorrectly MimeType detection for Apache Lucene web site

2014-07-16 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063750#comment-14063750 ] Ken Krugler commented on TIKA-1365: --- Hi Matthias - I agree that HTML's priority should be

[jira] [Commented] (TIKA-1302) Let's run Tika against a large batch of docs nightly

2014-10-20 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176900#comment-14176900 ] Ken Krugler commented on TIKA-1302: --- Andrew - that sounds amazing! Could you provide an e

[jira] [Assigned] (TIKA-1296) Add case insensitive matching for text/html mime type

2014-11-07 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-1296: - Assignee: Ken Krugler > Add case insensitive matching for text/html mime type > --

[jira] [Commented] (TIKA-1484) Boilerpipe dependency is evil

2014-11-19 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218553#comment-14218553 ] Ken Krugler commented on TIKA-1484: --- 1. I assume you can exclude the Boilerpipe jar from

[jira] [Commented] (TIKA-1551) Building the Tika source code using Java 1.8 causes Build Failure of OSGi Bundle on Windows 8.1 and Ubuntu 14.10.

2015-02-17 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324639#comment-14324639 ] Ken Krugler commented on TIKA-1551: --- Hi Abhinav, Thanks for the report. In order to debu

[jira] [Commented] (TIKA-1551) Building the Tika source code using Java 1.8 causes Build Failure of OSGi Bundle on Windows 8.1 and Ubuntu 14.10.

2015-02-17 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324922#comment-14324922 ] Ken Krugler commented on TIKA-1551: --- The specific error is "java.lang.ClassNotFoundExcept

[jira] [Commented] (TIKA-539) Encoding detection is too biased by encoding in meta tag

2015-03-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342524#comment-14342524 ] Ken Krugler commented on TIKA-539: -- Hi Tyler - I see you closed this as fixed, but I don't

[jira] [Commented] (TIKA-465) LanguageIdentifier API enhancements

2015-03-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342525#comment-14342525 ] Ken Krugler commented on TIKA-465: -- I'm actually working on a new language detector, so I t

[jira] [Closed] (TIKA-465) LanguageIdentifier API enhancements

2015-03-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler closed TIKA-465. Resolution: Won't Fix The change to the API to return more information about the detected languages is still

[jira] [Commented] (TIKA-354) ProfilingHandler should take a length-limiting parameter

2015-03-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342529#comment-14342529 ] Ken Krugler commented on TIKA-354: -- Better speed is still important, as a 2x improvement fr

[jira] [Commented] (TIKA-369) Improve accuracy of language detection

2015-03-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342531#comment-14342531 ] Ken Krugler commented on TIKA-369: -- Hi Tyler - detection speed is an issue, but Tika also s

[jira] [Reopened] (TIKA-539) Encoding detection is too biased by encoding in meta tag

2015-03-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reopened TIKA-539: -- > Encoding detection is too biased by encoding in meta tag > -

[jira] [Updated] (TIKA-539) Encoding detection is too biased by encoding in meta tag

2015-03-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-539: - Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) > Encoding detection is too biased b

[jira] [Commented] (TIKA-456) Support timeouts for parsers

2015-03-18 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367172#comment-14367172 ] Ken Krugler commented on TIKA-456: -- Re killing a thread - yes, that's not possible to do sa

[jira] [Comment Edited] (TIKA-456) Support timeouts for parsers

2015-03-18 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367172#comment-14367172 ] Ken Krugler edited comment on TIKA-456 at 3/18/15 2:18 PM: --- Re kil

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371397#comment-14371397 ] Ken Krugler commented on TIKA-1581: --- This is used by the source code parser that was rece

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371415#comment-14371415 ] Ken Krugler commented on TIKA-1581: --- BTW, I've contacted i...@uwyn.com to see if they can

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371435#comment-14371435 ] Ken Krugler commented on TIKA-1581: --- Hi Steve - good call, I've created an issue at the G

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371458#comment-14371458 ] Ken Krugler commented on TIKA-1581: --- I heard back from Geert - he says: bq. It's been ag

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371659#comment-14371659 ] Ken Krugler commented on TIKA-1581: --- Turns out the fork at GitHub wasn't based on the lat

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-26 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382819#comment-14382819 ] Ken Krugler commented on TIKA-1581: --- See https://github.com/codelibs/jhighlight/issues/4

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-26 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383227#comment-14383227 ] Ken Krugler commented on TIKA-1581: --- Also it seems like we'll need to do some extra work

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-26 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383228#comment-14383228 ] Ken Krugler commented on TIKA-1581: --- Hi Hong-Thai, As per my comment below, can you take

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-28 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385331#comment-14385331 ] Ken Krugler commented on TIKA-1581: --- Hi Tyler, JHighlight has been updated in Central, a

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-28 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385346#comment-14385346 ] Ken Krugler commented on TIKA-1581: --- Based on what I see in other projects (e.g. the Luce

[jira] [Commented] (TIKA-456) Support timeouts for parsers

2015-04-07 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484026#comment-14484026 ] Ken Krugler commented on TIKA-456: -- Hi Tim, Yes, I'm interested in integrating Tika into C

[jira] [Commented] (TIKA-1519) Don't allow whatever is in http-equiv Content-Type to overwrite actual Content-Type in HtmlParser

2015-04-09 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487696#comment-14487696 ] Ken Krugler commented on TIKA-1519: --- After thinking about this more, I don't think it's a

[jira] [Created] (TIKA-1599) Switch from TagSoup to JSoup

2015-04-08 Thread Ken Krugler (JIRA)
Ken Krugler created TIKA-1599: - Summary: Switch from TagSoup to JSoup Key: TIKA-1599 URL: https://issues.apache.org/jira/browse/TIKA-1599 Project: Tika Issue Type: Improvement Component

[jira] [Commented] (TIKA-1599) Switch from TagSoup to JSoup

2015-04-08 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486293#comment-14486293 ] Ken Krugler commented on TIKA-1599: --- It also might be interesting to try both TagSoup and

[jira] [Created] (TIKA-1606) Tika has a dependency on a very old version of Guava

2015-04-15 Thread Ken Krugler (JIRA)
Ken Krugler created TIKA-1606: - Summary: Tika has a dependency on a very old version of Guava Key: TIKA-1606 URL: https://issues.apache.org/jira/browse/TIKA-1606 Project: Tika Issue Type: Improve

[jira] [Updated] (TIKA-1606) Tika has a dependency on a very old version of Guava

2015-04-15 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-1606: -- Description: I've run into one problem while testing Tika 1.8-rc2 with Bixo It involves a dependency iss

[jira] [Commented] (TIKA-1606) Tika has a dependency on a very old version of Guava

2015-04-15 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497184#comment-14497184 ] Ken Krugler commented on TIKA-1606: --- Hi Lewis - yes, CDM depends on Guava 17.0, as per my

[jira] [Assigned] (TIKA-1606) Tika has a dependency on a very old version of Guava

2015-04-19 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-1606: - Assignee: Ken Krugler > Tika has a dependency on a very old version of Guava > ---

[jira] [Commented] (TIKA-1606) Tika has a dependency on a very old version of Guava

2015-04-19 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502163#comment-14502163 ] Ken Krugler commented on TIKA-1606: --- Hi ~lewismc - thanks, I committed (rev 1674706). Th

[jira] [Comment Edited] (TIKA-1606) Tika has a dependency on a very old version of Guava

2015-04-19 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502163#comment-14502163 ] Ken Krugler edited comment on TIKA-1606 at 4/19/15 11:32 PM: - H

[jira] [Resolved] (TIKA-1606) Tika has a dependency on a very old version of Guava

2015-04-19 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler resolved TIKA-1606. --- Resolution: Fixed Fixed in revision 1674706 > Tika has a dependency on a very old version of Guava > -

[jira] [Commented] (TIKA-241) Rar archive support

2015-04-27 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514496#comment-14514496 ] Ken Krugler commented on TIKA-241: -- Hi Gil, Sorry, not sure what you mean by "as BR/R". R

[jira] [Assigned] (TIKA-1624) Syntax error in DOAP file release section

2015-05-08 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-1624: - Assignee: Ken Krugler > Syntax error in DOAP file release section > --

[jira] [Commented] (TIKA-1624) Syntax error in DOAP file release section

2015-05-08 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535097#comment-14535097 ] Ken Krugler commented on TIKA-1624: --- I've fixed the formatting, and added the missing ent

[jira] [Comment Edited] (TIKA-1624) Syntax error in DOAP file release section

2015-05-08 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535097#comment-14535097 ] Ken Krugler edited comment on TIKA-1624 at 5/8/15 5:55 PM: --- I've

[jira] [Commented] (TIKA-1624) Syntax error in DOAP file release section

2015-05-14 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544193#comment-14544193 ] Ken Krugler commented on TIKA-1624: --- As per Chris Mattmann's email, "You should only have

[jira] [Closed] (TIKA-1624) Syntax error in DOAP file release section

2015-05-14 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler closed TIKA-1624. - Resolution: Done With Tyler's change to the release procedure doc on the wiki (https://wiki.apache.org/tik

[jira] [Commented] (TIKA-1675) please avoid xmlbeans dependency

2015-07-08 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619365#comment-14619365 ] Ken Krugler commented on TIKA-1675: --- Not sure why the above discussion is being classifie

[jira] [Commented] (TIKA-1696) Language Identification with Text Processing Toolkit from MITLL

2015-07-23 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639257#comment-14639257 ] Ken Krugler commented on TIKA-1696: --- Hi Paul - see https://issues.apache.org/jira/browse/

[jira] [Created] (TIKA-1723) Integrate language-detector into Tika

2015-08-27 Thread Ken Krugler (JIRA)
Ken Krugler created TIKA-1723: - Summary: Integrate language-detector into Tika Key: TIKA-1723 URL: https://issues.apache.org/jira/browse/TIKA-1723 Project: Tika Issue Type: Improvement Affect

[jira] [Updated] (TIKA-1723) Integrate language-detector into Tika

2015-08-27 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-1723: -- Attachment: TIKA-1723.patch > Integrate language-detector into Tika > ---

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-08-27 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717768#comment-14717768 ] Ken Krugler commented on TIKA-1723: --- Part of this work is looking to make the API for lan

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-08-27 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717772#comment-14717772 ] Ken Krugler commented on TIKA-1723: --- The above work added the language-detector dependenc

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-08-27 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717775#comment-14717775 ] Ken Krugler commented on TIKA-1723: --- There are a number of TODO comments in the code, man

[jira] [Updated] (TIKA-1723) Integrate language-detector into Tika

2015-08-27 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-1723: -- Component/s: languageidentifier > Integrate language-detector into Tika > ---

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-08-28 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720675#comment-14720675 ] Ken Krugler commented on TIKA-1723: --- Hi Tim - thanks for the fast review. 1. Re confiden

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-08-28 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720700#comment-14720700 ] Ken Krugler commented on TIKA-1723: --- Hi Tim - re putting language detection into the hand

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-08-28 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720702#comment-14720702 ] Ken Krugler commented on TIKA-1723: --- I've also been thinking about how to use lang=xx and

[jira] [Commented] (TIKA-369) Improve accuracy of language detection

2015-08-29 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721159#comment-14721159 ] Ken Krugler commented on TIKA-369: -- Initial results from integrating language-detector (see

[jira] [Assigned] (TIKA-856) Support CJK (Chinese, Japanese and Korean) language detection

2015-08-29 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-856: Assignee: Ken Krugler > Support CJK (Chinese, Japanese and Korean) language detection > -

[jira] [Updated] (TIKA-1723) Integrate language-detector into Tika

2015-09-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-1723: -- Attachment: TIKA-1723-2.patch Version 2 of my patch (not to be confused with Tim's patch, which is about

[jira] [Comment Edited] (TIKA-1723) Integrate language-detector into Tika

2015-09-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726205#comment-14726205 ] Ken Krugler edited comment on TIKA-1723 at 9/1/15 9:46 PM: --- Versi

[jira] [Updated] (TIKA-1723) Integrate language-detector into Tika

2015-09-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler updated TIKA-1723: -- Attachment: TIKA-1723-3.patch New patch which uses Locale to handle language names (language tags). > In

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-09-01 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726266#comment-14726266 ] Ken Krugler commented on TIKA-1723: --- Hi Tim - I just attached a new version of my patch,

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-09-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729250#comment-14729250 ] Ken Krugler commented on TIKA-1723: --- Regarding the current detection code... I'm going t

[jira] [Assigned] (TIKA-568) Language Detection isReasonablyCertain() hides valuable information

2015-09-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-568: Assignee: Ken Krugler > Language Detection isReasonablyCertain() hides valuable information > ---

[jira] [Commented] (TIKA-568) Language Detection isReasonablyCertain() hides valuable information

2015-09-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729414#comment-14729414 ] Ken Krugler commented on TIKA-568: -- The new LanguageDetector API has a getRawScore() call o

[jira] [Commented] (TIKA-856) Support CJK (Chinese, Japanese and Korean) language detection

2015-09-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729416#comment-14729416 ] Ken Krugler commented on TIKA-856: -- The language-detector project has support for Japanese,

[jira] [Commented] (TIKA-492) Add language identification support for North Sami, Lule Sami and South Sami

2015-09-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729427#comment-14729427 ] Ken Krugler commented on TIKA-492: -- Currently the language-detector library I'm integrating

[jira] [Assigned] (TIKA-491) Add language identification support for Norwegian Bokmål and Norwegian Nynorsk

2015-09-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-491: Assignee: Ken Krugler > Add language identification support for Norwegian Bokmål and Norwegian Nynors

[jira] [Commented] (TIKA-491) Add language identification support for Norwegian Bokmål and Norwegian Nynorsk

2015-09-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729432#comment-14729432 ] Ken Krugler commented on TIKA-491: -- Currently the language-detector library I'm integrating

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-09-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729588#comment-14729588 ] Ken Krugler commented on TIKA-1723: --- Hi Tim, 1. Not sure about "Make language detection

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-09-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729595#comment-14729595 ] Ken Krugler commented on TIKA-1723: --- Biggest remaining issue before I commit is how to de

[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-21 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901434#comment-14901434 ] Ken Krugler commented on TIKA-1726: --- [~talli...@apache.org] had asked for input on this -

[jira] [Commented] (TIKA-1443) Add a junk text detector to Tika

2015-10-31 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984111#comment-14984111 ] Ken Krugler commented on TIKA-1443: --- Hi [~talli...@apache.org] - I did look at it, and re

[jira] [Commented] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006743#comment-15006743 ] Ken Krugler commented on TIKA-1794: --- The output of the Tika parse process is XHTML, and I

[jira] [Commented] (TIKA-1794) TXTParser removes form feed characters

2015-11-16 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006797#comment-15006797 ] Ken Krugler commented on TIKA-1794: --- Tika uses XHTML 1.0, which doesn't allow the form-fe

[jira] [Commented] (TIKA-1808) Head section closed too eager

2015-12-08 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047029#comment-15047029 ] Ken Krugler commented on TIKA-1808: --- Hi Markus - I don't think this is actually a bug. I

[jira] [Commented] (TIKA-1599) Switch from TagSoup to JSoup

2015-12-09 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048773#comment-15048773 ] Ken Krugler commented on TIKA-1599: --- I'm hoping we could use one or the other, as I don't

[jira] [Commented] (TIKA-1599) Switch from TagSoup to JSoup

2015-12-09 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048806#comment-15048806 ] Ken Krugler commented on TIKA-1599: --- Hi [~markus.jel...@openindex.io] - I was actually ta

[jira] [Commented] (TIKA-1599) Switch from TagSoup to JSoup

2015-12-09 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048819#comment-15048819 ] Ken Krugler commented on TIKA-1599: --- I think we'd be wanting to parse the raw crawl resul

[jira] [Commented] (TIKA-1836) Convertion DOC->TXT failed due to POI issue

2016-01-19 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106908#comment-15106908 ] Ken Krugler commented on TIKA-1836: --- This seems to be an issue for POI, as per the messag

[jira] [Commented] (TIKA-1838) Just a quick question regarding compatibility

2016-01-20 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109054#comment-15109054 ] Ken Krugler commented on TIKA-1838: --- Hi Raymond - this is a question that you should post

[jira] [Assigned] (TIKA-1835) LinkContentHandler skips iframe and rel tags

2016-01-21 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler reassigned TIKA-1835: - Assignee: Ken Krugler > LinkContentHandler skips iframe and rel tags > ---

[jira] [Resolved] (TIKA-1835) LinkContentHandler skips iframe and rel tags

2016-01-21 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ken Krugler resolved TIKA-1835. --- Resolution: Fixed Git commit 489ab93..fe841bc > LinkContentHandler skips iframe and rel tags > ---

[jira] [Comment Edited] (TIKA-1835) LinkContentHandler skips iframe and rel tags

2016-01-21 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1558#comment-1558 ] Ken Krugler edited comment on TIKA-1835 at 1/21/16 7:36 PM: Git

[jira] [Commented] (TIKA-1848) Address issues with Tika 1.12rc#1

2016-02-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130666#comment-15130666 ] Ken Krugler commented on TIKA-1848: --- Unless I'm not understanding the issues properly, I

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2016-02-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130676#comment-15130676 ] Ken Krugler commented on TIKA-1723: --- [~talli...@apache.org] I must admit, focusing on thi

[jira] [Commented] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules

2016-02-03 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131749#comment-15131749 ] Ken Krugler commented on TIKA-1824: --- As someone who regularly deals with 100s of jars in

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2016-02-04 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132961#comment-15132961 ] Ken Krugler commented on TIKA-1723: --- Good idea re gathering input - I just emailed the de

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-04 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133624#comment-15133624 ] Ken Krugler commented on TIKA-1851: --- Hi [~talli...@apache.org] - I'm also getting a local

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-04 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133629#comment-15133629 ] Ken Krugler commented on TIKA-1851: --- I'm also curious why we have Groovy code and shell s

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135336#comment-15135336 ] Ken Krugler commented on TIKA-1851: --- I did a top-level "mvn clean install", which failed

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-05 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135342#comment-15135342 ] Ken Krugler commented on TIKA-1851: --- Hmm, now the top-level build fails on the tika parse

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-06 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136003#comment-15136003 ] Ken Krugler commented on TIKA-1851: --- I got a clean build w/o any pre-installed modules, s

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2016-02-06 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136077#comment-15136077 ] Ken Krugler commented on TIKA-1723: --- OK, I've committed this code to a new tika-langdetec

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-06 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136079#comment-15136079 ] Ken Krugler commented on TIKA-1851: --- After poking around a bit, my vote would be to (a) m

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-10 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141632#comment-15141632 ] Ken Krugler commented on TIKA-1851: --- Hi [~talli...@apache.org] - thanks for generating th

[jira] [Commented] (TIKA-1851) Tika 2.0 - Move test resources from core to test-resources

2016-02-12 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145135#comment-15145135 ] Ken Krugler commented on TIKA-1851: --- +1 for the proposal. Let me know if you want me to t

[jira] [Commented] (TIKA-1858) Unable to extract content from chunked portion of large file

2016-02-17 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150618#comment-15150618 ] Ken Krugler commented on TIKA-1858: --- Hi Raghu, This is a great question for the user mai

[jira] [Commented] (TIKA-1855) Tika 2.0 - Move shared test-code back to tika-core and distribute test files to parser modules

2016-02-24 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15165642#comment-15165642 ] Ken Krugler commented on TIKA-1855: --- I'm ok with having some duplicated test files - thou
