[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914482#comment-16914482
]
Ken Krugler commented on TIKA-1599:
---
From TIKA-2928, an example of text that fails with TagSoup but
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-1599:
--
Priority: Major (was: Minor)
> Switch from TagSoup to JSoup
[
https://issues.apache.org/jira/browse/TIKA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914481#comment-16914481
]
Ken Krugler commented on TIKA-2928:
---
Hi [~Sargent_D] - thanks for trying this out! I'm going to bump the
[
https://issues.apache.org/jira/browse/TIKA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2928:
--
Issue Type: Improvement (was: Bug)
Priority: Minor (was: Major)
> Less than sign within tag
[
https://issues.apache.org/jira/browse/TIKA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913382#comment-16913382
]
Ken Krugler commented on TIKA-2928:
---
The issue isn't that this is "somewhat non-standard" HTML - it's
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869004#comment-16869004
]
Ken Krugler commented on TIKA-2790:
---
Hi [~talli...@apache.org] - I finally got around to looking at your
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856107#comment-16856107
]
Ken Krugler commented on TIKA-2790:
---
[~talli...@apache.org] - I'd have to look at the code used to
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856052#comment-16856052
]
Ken Krugler commented on TIKA-2790:
---
Yalder processes the entire string. I thought Optimaize's version
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836738#comment-16836738
]
Ken Krugler commented on TIKA-2790:
---
Hi [~talli...@apache.org] - thanks for running the comparisons.
[
https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812492#comment-16812492
]
Ken Krugler commented on TIKA-2849:
---
Hi [~boris-petrov] - two things here. First, do you have the call
[
https://issues.apache.org/jira/browse/TIKA-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710767#comment-16710767
]
Ken Krugler commented on TIKA-2794:
---
Hi [~phallett] - it's better if you first post something like this
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707521#comment-16707521
]
Ken Krugler commented on TIKA-2790:
---
Yalder is about 2-2.5x faster than language-detector, depending on
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707343#comment-16707343
]
Ken Krugler commented on TIKA-2790:
---
My concern with OpenNLP is that during a web crawl, even with the
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707292#comment-16707292
]
Ken Krugler commented on TIKA-2790:
---
Hi [~talli...@apache.org] - Is there an issue with the current
[
https://issues.apache.org/jira/browse/TIKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658028#comment-16658028
]
Ken Krugler commented on TIKA-2758:
---
[~markus17] - My comment above was about the previous change (from
[
https://issues.apache.org/jira/browse/TIKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657976#comment-16657976
]
Ken Krugler edited comment on TIKA-2758 at 10/20/18 7:51 PM:
-
At least for the
[
https://issues.apache.org/jira/browse/TIKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657976#comment-16657976
]
Ken Krugler commented on TIKA-2758:
---
At least for the "detroidnews.html" file, I believe the reason why
[
https://issues.apache.org/jira/browse/TIKA-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler resolved TIKA-2683.
---
Resolution: Fixed
Fixed via [PR
[
https://issues.apache.org/jira/browse/TIKA-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-2683:
-
Assignee: Ken Krugler
> Missing space and inappropriate new-line in Boilerpipe extracted text
[
https://issues.apache.org/jira/browse/TIKA-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536396#comment-16536396
]
Ken Krugler commented on TIKA-2648:
---
[~wastl-nagel] - you mentioned that you thought this solution was
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2671:
--
Description:
org.apache.tika.parser.html.HtmlEncodingDetector ignores the document's
metadata. So when
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2671:
--
Component/s: detector
> HtmlEncodingDetector doesnt take provided metadata into account
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516644#comment-16516644
]
Ken Krugler commented on TIKA-2671:
---
Hi [~gbouchar] - I'm curious how much testing you did, and with
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514355#comment-16514355
]
Ken Krugler commented on TIKA-2671:
---
Unfortunately there's no great solution here. Ideally we'd have a
[
https://issues.apache.org/jira/browse/TIKA-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493927#comment-16493927
]
Ken Krugler commented on TIKA-2654:
---
Hi Ankit - for problems encountered while building/using Tika, it's
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482586#comment-16482586
]
Ken Krugler commented on TIKA-2643:
---
When you've got conflicting jars on the classpath, you often run
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481791#comment-16481791
]
Ken Krugler commented on TIKA-2643:
---
Looking at the crash log, I see the following duplicate jars
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481786#comment-16481786
]
Ken Krugler commented on TIKA-2643:
---
Hi [~fyemaple] - how do you know that Tika 1.5 (or any of the jars
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479468#comment-16479468
]
Ken Krugler commented on TIKA-2643:
---
[~fyemaple] - yes, but note that {{kill -QUIT doesn't kill the
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477811#comment-16477811
]
Ken Krugler commented on TIKA-2643:
---
[~talli...@apache.org] - different versions of framework jars, I'd
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477513#comment-16477513
]
Ken Krugler commented on TIKA-2643:
---
If I was going to guess, it's that your Cloudera installation has
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384242#comment-16384242
]
Ken Krugler commented on TIKA-2592:
---
[~AndreasMeier] - I assume when you said:
{quote}I don't think we
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2592:
--
Attachment: IANA Charset names.txt
> HTML with charset unicode handled as utf-16 instead utf-8
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2592:
--
Priority: Minor (was: Major)
> HTML with charset unicode handled as utf-16 instead utf-8
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2592:
--
Issue Type: Improvement (was: Bug)
> HTML with charset unicode handled as utf-16 instead utf-8
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382330#comment-16382330
]
Ken Krugler commented on TIKA-2592:
---
Before making this kind of change (default "unicode" to UTF-8),
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380874#comment-16380874
]
Ken Krugler commented on TIKA-2592:
---
Hi [~AndreasMeier] - actually "unicode" is a supported charset name
[
https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379747#comment-16379747
]
Ken Krugler commented on TIKA-2576:
---
[~talli...@mitre.org] - After some grepping, I found the Jira issue
[
https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377744#comment-16377744
]
Ken Krugler commented on TIKA-2576:
---
Is this going to trigger more warnings in the logs? :)
{code:java}
[
https://issues.apache.org/jira/browse/TIKA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler resolved TIKA-2539.
---
Resolution: Duplicate
> TagSoup HTML parser is project EOL
[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215838#comment-16215838
]
Ken Krugler commented on TIKA-2478:
---
Hi [~talli...@apache.org] - I've attached two mixed examples I'd
[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2478:
--
Attachment: mixed-simple
mixed-with-pdf-inline
> MBOX import includes redundant copies
[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214491#comment-16214491
]
Ken Krugler commented on TIKA-2478:
---
I recently had to dig into extracting text from emails, and it isn't
[
https://issues.apache.org/jira/browse/TIKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213150#comment-16213150
]
Ken Krugler commented on TIKA-2471:
---
Hi [~talli...@apache.org] - I don't think using MBoxIterator is the
[
https://issues.apache.org/jira/browse/TIKA-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212870#comment-16212870
]
Ken Krugler commented on TIKA-2482:
---
Hi [~cermar] - in general it's best to first post this type of issue
[
https://issues.apache.org/jira/browse/TIKA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195386#comment-16195386
]
Ken Krugler commented on TIKA-2472:
---
I had to deal with this before in another project - FWIR, I
[
https://issues.apache.org/jira/browse/TIKA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423280#comment-15423280
]
Ken Krugler commented on TIKA-2056:
---
Hi [~chrismattmann] - I haven't actually dealt with the ForkParser
[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2038:
--
Description:
Currently, Tika uses icu4j for detecting charset encoding of HTML documents as
well as the
[
https://issues.apache.org/jira/browse/TIKA-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378434#comment-15378434
]
Ken Krugler commented on TIKA-2033:
---
Yes, of course...I was thinking of whether we'd want to extract it
[
https://issues.apache.org/jira/browse/TIKA-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378358#comment-15378358
]
Ken Krugler commented on TIKA-2033:
---
Do you have a suggestion for how the text should appear in the
[
https://issues.apache.org/jira/browse/TIKA-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331829#comment-15331829
]
Ken Krugler commented on TIKA-2010:
---
Would it be possible for you to try this broken HTML with JSoup?
[
https://issues.apache.org/jira/browse/TIKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler closed TIKA-1938.
-
Resolution: Fixed
Fix with commit da5bbbe..46d5775.
Thanks Joseph!
> HtmlParser drops elements found
[
https://issues.apache.org/jira/browse/TIKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-1938:
-
Assignee: Ken Krugler
> HtmlParser drops elements found inside
[
https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227078#comment-15227078
]
Ken Krugler commented on TIKA-1835:
---
I’d rolled in Markus’s patch directly to support these other link
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218412#comment-15218412
]
Ken Krugler commented on TIKA-1896:
---
Hi Tim - hmm, changing the type of the script tag from cdata to
[
https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167891#comment-15167891
]
Ken Krugler commented on TIKA-1855:
---
The things I don't like about this approach are that (a) core
[
https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15165642#comment-15165642
]
Ken Krugler commented on TIKA-1855:
---
I'm ok with having some duplicated test files - though for most of
[
https://issues.apache.org/jira/browse/TIKA-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150618#comment-15150618
]
Ken Krugler commented on TIKA-1858:
---
Hi Raghu,
This is a great question for the user mailing list (see
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145135#comment-15145135
]
Ken Krugler commented on TIKA-1851:
---
+1 for the proposal. Let me know if you want me to take a swing at
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141632#comment-15141632
]
Ken Krugler commented on TIKA-1851:
---
Hi [~talli...@apache.org] - thanks for generating this output.
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136003#comment-15136003
]
Ken Krugler commented on TIKA-1851:
---
I got a clean build w/o any pre-installed modules, so much better,
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136077#comment-15136077
]
Ken Krugler commented on TIKA-1723:
---
OK, I've committed this code to a new tika-langdetect module in the
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136079#comment-15136079
]
Ken Krugler commented on TIKA-1851:
---
After poking around a bit, my vote would be to (a) move the test
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135342#comment-15135342
]
Ken Krugler commented on TIKA-1851:
---
Hmm, now the top-level build fails on the tika parser text module,
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135336#comment-15135336
]
Ken Krugler commented on TIKA-1851:
---
I did a top-level "mvn clean install", which failed with:
[ERROR]
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133624#comment-15133624
]
Ken Krugler commented on TIKA-1851:
---
Hi [~talli...@apache.org] - I'm also getting a local build failure
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133629#comment-15133629
]
Ken Krugler commented on TIKA-1851:
---
I'm also curious why we have Groovy code and shell scripts inside of
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132961#comment-15132961
]
Ken Krugler commented on TIKA-1723:
---
Good idea re gathering input - I just emailed the dev list.
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130676#comment-15130676
]
Ken Krugler commented on TIKA-1723:
---
[~talli...@apache.org] I must admit, focusing on this change in 2.0,
[
https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131749#comment-15131749
]
Ken Krugler commented on TIKA-1824:
---
As someone who regularly deals with 100s of jars in the dependency
[
https://issues.apache.org/jira/browse/TIKA-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130666#comment-15130666
]
Ken Krugler commented on TIKA-1848:
---
Unless I'm not understanding the issues properly, I agree with the
[
https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler resolved TIKA-1835.
---
Resolution: Fixed
Git commit 489ab93..fe841bc
> LinkContentHandler skips iframe and rel tags
[
https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1558#comment-1558
]
Ken Krugler edited comment on TIKA-1835 at 1/21/16 7:36 PM:
Git commit
[
https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-1835:
-
Assignee: Ken Krugler
> LinkContentHandler skips iframe and rel tags
[
https://issues.apache.org/jira/browse/TIKA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109054#comment-15109054
]
Ken Krugler commented on TIKA-1838:
---
Hi Raymond - this is a question that you should post on the Tika
[
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106908#comment-15106908
]
Ken Krugler commented on TIKA-1836:
---
This seems to be an issue for POI, as per the message in the stack
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048773#comment-15048773
]
Ken Krugler commented on TIKA-1599:
---
I'm hoping we could use one or the other, as I don't know how a Tika
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048806#comment-15048806
]
Ken Krugler commented on TIKA-1599:
---
Hi [~markus.jel...@openindex.io] - I was actually talking about how
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048819#comment-15048819
]
Ken Krugler commented on TIKA-1599:
---
I think we'd be wanting to parse the raw crawl results (with both
[
https://issues.apache.org/jira/browse/TIKA-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047029#comment-15047029
]
Ken Krugler commented on TIKA-1808:
---
Hi Markus - I don't think this is actually a bug. I created a
[
https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006797#comment-15006797
]
Ken Krugler commented on TIKA-1794:
---
Tika uses XHTML 1.0, which doesn't allow the form-feed character.
[
https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006743#comment-15006743
]
Ken Krugler commented on TIKA-1794:
---
The output of the Tika parse process is XHTML, and I don't believe a
[
https://issues.apache.org/jira/browse/TIKA-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984111#comment-14984111
]
Ken Krugler commented on TIKA-1443:
---
Hi [~talli...@apache.org] - I did look at it, and realized I wanted
[
https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901434#comment-14901434
]
Ken Krugler commented on TIKA-1726:
---
[~talli...@apache.org] had asked for input on this - I don't have
[
https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729414#comment-14729414
]
Ken Krugler commented on TIKA-568:
--
The new LanguageDetector API has a getRawScore() call on the result,
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729588#comment-14729588
]
Ken Krugler commented on TIKA-1723:
---
Hi Tim,
1. Not sure about "Make language detection configurable via
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729595#comment-14729595
]
Ken Krugler commented on TIKA-1723:
---
Biggest remaining issue before I commit is how to deal with language
[
https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-568:
Assignee: Ken Krugler
> Language Detection isReasonablyCertain() hides valuable information
[
https://issues.apache.org/jira/browse/TIKA-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-856:
Assignee: Ken Krugler
Support CJK (Chinese, Japanese and Korean) language detection
[
https://issues.apache.org/jira/browse/TIKA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721159#comment-14721159
]
Ken Krugler commented on TIKA-369:
--
Initial results from integrating language-detector (see
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720702#comment-14720702
]
Ken Krugler commented on TIKA-1723:
---
I've also been thinking about how to use lang=xx and
Ken Krugler created TIKA-1723:
-
Summary: Integrate language-detector into Tika
Key: TIKA-1723
URL: https://issues.apache.org/jira/browse/TIKA-1723
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-1723:
--
Attachment: TIKA-1723.patch
Integrate language-detector into Tika
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717772#comment-14717772
]
Ken Krugler commented on TIKA-1723:
---
The above work added the language-detector
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-1723:
--
Component/s: languageidentifier
Integrate language-detector into Tika
[
https://issues.apache.org/jira/browse/TIKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639257#comment-14639257
]
Ken Krugler commented on TIKA-1696:
---
Hi Paul - see
[
https://issues.apache.org/jira/browse/TIKA-1675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619365#comment-14619365
]
Ken Krugler commented on TIKA-1675:
---
Not sure why the above discussion is being
[
https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler closed TIKA-1624.
-
Resolution: Done
With Tyler's change to the release procedure doc on the wiki
[
https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544193#comment-14544193
]
Ken Krugler commented on TIKA-1624:
---
As per Chris Mattmann's email, You should only have
[
https://issues.apache.org/jira/browse/TIKA-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-1624:
-
Assignee: Ken Krugler
Syntax error in DOAP file release section