[jira] [Created] (TIKA-3151) Update jaxb-runtime and remove activation dependencies & exclusions

2020-07-29 Thread Hans Brende (Jira)
Hans Brende created TIKA-3151: - Summary: Update jaxb-runtime and remove activation dependencies & exclusions Key: TIKA-3151 URL: https://issues.apache.org/jira/browse/TIKA-3151 Project: Tika

[jira] [Commented] (TIKA-2819) Update jaxb & activation

2019-02-02 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759255#comment-16759255 ] Hans Brende commented on TIKA-2819: --- [~talli...@apache.org] You're welcome! However, it looks like you

[jira] [Created] (TIKA-2819) Update jaxb & activation

2019-01-22 Thread Hans Brende (JIRA)
Hans Brende created TIKA-2819: - Summary: Update jaxb & activation Key: TIKA-2819 URL: https://issues.apache.org/jira/browse/TIKA-2819 Project: Tika Issue Type: Improvement Components:

[jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-12-16 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722543#comment-16722543 ] Hans Brende commented on TIKA-2038: --- [~faghani] Glad to hear that my hypothesis was correct, and that F8

[jira] [Comment Edited] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-26 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698539#comment-16698539 ] Hans Brende edited comment on TIKA-2038 at 11/26/18 2:38 PM: - The success of

[jira] [Comment Edited] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-25 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698513#comment-16698513 ] Hans Brende edited comment on TIKA-2038 at 11/26/18 7:10 AM: - Here's a more 

[jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-25 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698539#comment-16698539 ] Hans Brende commented on TIKA-2038: --- The success of this IUST implementation (even if based on

[jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-25 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698513#comment-16698513 ] Hans Brende commented on TIKA-2038: --- Here's a more rigorous demonstration of my claim (by

[jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-25 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698497#comment-16698497 ] Hans Brende commented on TIKA-2038: --- As sort of a sanity check on my part, I wanted to make sure that

[jira] [Comment Edited] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-25 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698305#comment-16698305 ] Hans Brende edited comment on TIKA-2038 at 11/26/18 4:54 AM: - [~faghani]

[jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-25 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698305#comment-16698305 ] Hans Brende commented on TIKA-2038: --- [~faghani] Thanks for the response! If my understanding of the

[jira] [Comment Edited] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-21 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694940#comment-16694940 ] Hans Brende edited comment on TIKA-2038 at 11/21/18 5:05 PM: - Alternatively,

[jira] [Comment Edited] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-21 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694940#comment-16694940 ] Hans Brende edited comment on TIKA-2038 at 11/21/18 5:03 PM: - Alternatively,

[jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-21 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694940#comment-16694940 ] Hans Brende commented on TIKA-2038: --- Alternatively, you could use guava's

[jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-19 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692478#comment-16692478 ] Hans Brende commented on TIKA-2038: --- Oh, and one small detail I forgot to mention: jchardet also counted

[jira] [Comment Edited] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-19 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692323#comment-16692323 ] Hans Brende edited comment on TIKA-2038 at 11/19/18 10:28 PM: -- [~faghani]

[jira] [Commented] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-19 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692323#comment-16692323 ] Hans Brende commented on TIKA-2038: --- [~faghani] [~talli...@apache.org] This issue inspired me to look

[jira] [Commented] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-13 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686068#comment-16686068 ] Hans Brende commented on TIKA-2778: --- [~ffang] No, that dependency is from

[jira] [Commented] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-13 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686061#comment-16686061 ] Hans Brende commented on TIKA-2778: --- [~talli...@apache.org] +1, I think manually excluding

[jira] [Commented] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-13 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685996#comment-16685996 ] Hans Brende commented on TIKA-2778: --- [~talli...@apache.org] would you mind posting the full stack trace

[jira] [Commented] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-13 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685969#comment-16685969 ] Hans Brende commented on TIKA-2778: --- [~talli...@apache.org] Well, that's a bummer. I'm not 100% sure

[jira] [Commented] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-13 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685865#comment-16685865 ] Hans Brende commented on TIKA-2778: --- Yay! deleting two dependencies from pom == successful day >

[jira] [Commented] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-13 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685854#comment-16685854 ] Hans Brende commented on TIKA-2778: --- [~talli...@apache.org] Did upgrading to jaxb-runtime 2.3.1 do the

[jira] [Commented] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-13 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685764#comment-16685764 ] Hans Brende commented on TIKA-2778: --- [~talli...@apache.org] Sorry, commented before seeing your comment.

[jira] [Commented] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-13 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685763#comment-16685763 ] Hans Brende commented on TIKA-2778: --- [~talli...@apache.org] no worries! As regards this issue, do you

[jira] [Updated] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-09 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hans Brende updated TIKA-2778: -- Description: The latest version of org.glassfish.jaxb:jaxb-runtime is 2.3.1, which fixes a few issues

[jira] [Commented] (TIKA-2743) Replace com.sun.xml.bind:jaxb-impl and jaxb-core by org.glassfish.jaxb:jaxb-runtime and jaxb-core

2018-11-09 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682066#comment-16682066 ] Hans Brende commented on TIKA-2743: --- I created a new issue for that: TIKA-2778 > Replace

[jira] [Created] (TIKA-2778) Upgrade jaxb-runtime and javax.activation

2018-11-09 Thread Hans Brende (JIRA)
Hans Brende created TIKA-2778: - Summary: Upgrade jaxb-runtime and javax.activation Key: TIKA-2778 URL: https://issues.apache.org/jira/browse/TIKA-2778 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-2743) Replace com.sun.xml.bind:jaxb-impl and jaxb-core by org.glassfish.jaxb:jaxb-runtime and jaxb-core

2018-11-09 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681936#comment-16681936 ] Hans Brende commented on TIKA-2743: --- [~talli...@apache.org] [~thetaphi] Oh, also! It looks like there

[jira] [Commented] (TIKA-2743) Replace com.sun.xml.bind:jaxb-impl and jaxb-core by org.glassfish.jaxb:jaxb-runtime and jaxb-core

2018-11-09 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681902#comment-16681902 ] Hans Brende commented on TIKA-2743: --- [~talli...@apache.org] Runtime scope should theoretically still

[jira] [Comment Edited] (TIKA-2743) Replace com.sun.xml.bind:jaxb-impl and jaxb-core by org.glassfish.jaxb:jaxb-runtime and jaxb-core

2018-11-09 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681902#comment-16681902 ] Hans Brende edited comment on TIKA-2743 at 11/9/18 8:07 PM:

[jira] [Commented] (TIKA-2743) Replace com.sun.xml.bind:jaxb-impl and jaxb-core by org.glassfish.jaxb:jaxb-runtime and jaxb-core

2018-11-08 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680749#comment-16680749 ] Hans Brende commented on TIKA-2743: --- [~talli...@apache.org] shouldn't jaxb-runtime have {{runtime}},

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-08 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680557#comment-16680557 ] Hans Brende commented on TIKA-2771: --- [~talli...@apache.org] Great! I will definitely check that out.

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-08 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680378#comment-16680378 ] Hans Brende edited comment on TIKA-2771 at 11/8/18 9:26 PM:

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-08 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680378#comment-16680378 ] Hans Brende commented on TIKA-2771: --- [~talli...@apache.org] Does Tika have a corpus of documents paired

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-06 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677340#comment-16677340 ] Hans Brende commented on TIKA-2771: --- [~talli...@apache.org] I've implemented my ideas for charset

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-06 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676887#comment-16676887 ] Hans Brende edited comment on TIKA-2771 at 11/6/18 9:00 PM: Compare to the

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-06 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676926#comment-16676926 ] Hans Brende commented on TIKA-2771: --- One thing I am sure of, however, is that if your chances of getting

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-06 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676887#comment-16676887 ] Hans Brende edited comment on TIKA-2771 at 11/6/18 3:33 PM: Compare to the

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-06 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676887#comment-16676887 ] Hans Brende edited comment on TIKA-2771 at 11/6/18 3:31 PM: Compare to the

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-06 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676887#comment-16676887 ] Hans Brende edited comment on TIKA-2771 at 11/6/18 3:23 PM: Compare to the

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-06 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676887#comment-16676887 ] Hans Brende commented on TIKA-2771: --- Compare to the following analogous test for ISO-8859-1 variants:

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-05 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676109#comment-16676109 ] Hans Brende commented on TIKA-2771: --- [~talli...@apache.org] I did a little experimentation with each of

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-05 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675880#comment-16675880 ] Hans Brende commented on TIKA-2771: --- [~wave] Yep, just ran the following {code:java} IntStream.range(0,

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-05 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675828#comment-16675828 ] Hans Brende edited comment on TIKA-2771 at 11/5/18 10:44 PM: -

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-05 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675828#comment-16675828 ] Hans Brende commented on TIKA-2771: --- [~talli...@apache.org] Ah, you're correct as regards the byteMap.

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-05 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675708#comment-16675708 ] Hans Brende commented on TIKA-2771: --- [~talli...@apache.org] I'm not sure which all of the charsets are

[jira] [Updated] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-02 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hans Brende updated TIKA-2771: -- Description: When I try to run the CharsetDetector on

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-02 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673481#comment-16673481 ] Hans Brende commented on TIKA-2771: --- (Also relating to my last thought, on the subject of "waiting for

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-02 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673429#comment-16673429 ] Hans Brende edited comment on TIKA-2771 at 11/2/18 5:09 PM: (For my last

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-02 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673429#comment-16673429 ] Hans Brende commented on TIKA-2771: --- (For my last thought, I'd recommend taking a look at this:

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-02 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673400#comment-16673400 ] Hans Brende commented on TIKA-2771: --- [~talli...@apache.org] I totally understand not wanting to modify

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-02 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673215#comment-16673215 ] Hans Brende commented on TIKA-2771: --- [~talli...@apache.org] IBM500 (a.k.a. EBCDIC 500) is an EBCDIC

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672520#comment-16672520 ] Hans Brende edited comment on TIKA-2771 at 11/2/18 3:19 AM: Just had another

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672520#comment-16672520 ] Hans Brende commented on TIKA-2771: --- Just had another thought: when the input filter is enabled, it

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672134#comment-16672134 ] Hans Brende edited comment on TIKA-2771 at 11/1/18 9:47 PM: I mean, because

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672203#comment-16672203 ] Hans Brende commented on TIKA-2771: --- (Source:

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672196#comment-16672196 ] Hans Brende commented on TIKA-2771: --- Oh... and probably the best hint of all that this is not IBM500 is

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672134#comment-16672134 ] Hans Brende edited comment on TIKA-2771 at 11/1/18 8:52 PM: I mean, because

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672178#comment-16672178 ] Hans Brende commented on TIKA-2771: --- One good hint that this is not IBM500 is that *all* of the

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672134#comment-16672134 ] Hans Brende commented on TIKA-2771: --- I mean, because otherwise, if you're doing n-gram detection for

[jira] [Comment Edited] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672116#comment-16672116 ] Hans Brende edited comment on TIKA-2771 at 11/1/18 8:12 PM: Not sure if this

[jira] [Commented] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672116#comment-16672116 ] Hans Brende commented on TIKA-2771: --- Not sure if this is a contributing factor, but peering into the

[jira] [Created] (TIKA-2771) enableInputFilter() wrecks charset detection for some short html documents

2018-11-01 Thread Hans Brende (JIRA)
Hans Brende created TIKA-2771: - Summary: enableInputFilter() wrecks charset detection for some short html documents Key: TIKA-2771 URL: https://issues.apache.org/jira/browse/TIKA-2771 Project: Tika

[jira] [Updated] (TIKA-2690) Exclude commons-logging & commons-logging-api from uimafit-core

2018-07-18 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hans Brende updated TIKA-2690: -- Description: Exclude commons-logging and commons-logging-api from {{uimafit-core}} dependencies. As

[jira] [Created] (TIKA-2690) Exclude commons-logging & commons-logging-api from uimafit-core

2018-07-18 Thread Hans Brende (JIRA)
Hans Brende created TIKA-2690: - Summary: Exclude commons-logging & commons-logging-api from uimafit-core Key: TIKA-2690 URL: https://issues.apache.org/jira/browse/TIKA-2690 Project: Tika Issue