[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672520#comment-16672520
]
Hans Brende edited comment on TIKA-2771 at 11/2/18 3:19 AM:
Just had another
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672520#comment-16672520
]
Hans Brende commented on TIKA-2771:
---
Just had another thought: when the input filter is enabled, it
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672134#comment-16672134
]
Hans Brende edited comment on TIKA-2771 at 11/1/18 9:47 PM:
I mean, because
[
https://issues.apache.org/jira/browse/TIKA-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672233#comment-16672233
]
Tim Allison commented on TIKA-2769:
---
Until we can support glossary documents in POI, I added a check+log
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672203#comment-16672203
]
Hans Brende commented on TIKA-2771:
---
(Source:
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672196#comment-16672196
]
Hans Brende commented on TIKA-2771:
---
Oh... and probably the best hint of all that this is not IBM500 is
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672134#comment-16672134
]
Hans Brende edited comment on TIKA-2771 at 11/1/18 8:52 PM:
I mean, because
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672178#comment-16672178
]
Hans Brende commented on TIKA-2771:
---
One good hint that this is not IBM500 is that *all* of the
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672134#comment-16672134
]
Hans Brende commented on TIKA-2771:
---
I mean, because otherwise, if you're doing n-gram detection for
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672116#comment-16672116
]
Hans Brende edited comment on TIKA-2771 at 11/1/18 8:12 PM:
Not sure if this
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672116#comment-16672116
]
Hans Brende commented on TIKA-2771:
---
Not sure if this is a contributing factor, but peering into the
[
https://issues.apache.org/jira/browse/TIKA-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671965#comment-16671965
]
Tim Allison edited comment on TIKA-2750 at 11/1/18 6:22 PM:
I just attached
[
https://issues.apache.org/jira/browse/TIKA-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671965#comment-16671965
]
Tim Allison commented on TIKA-2750:
---
I just attached the output of counting pairs of "mime" and
[
https://issues.apache.org/jira/browse/TIKA-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2750:
--
Attachment: CC-MAIN-2018-39-mimes-v-detected.zip
> Update regression corpus
>
Hans Brende created TIKA-2771:
-
Summary: enableInputFilter() wrecks charset detection for some
short html documents
Key: TIKA-2771
URL: https://issues.apache.org/jira/browse/TIKA-2771
Project: Tika
Rohan and Tobias,
This isn't quite a question about fuzzing, but I suspect you might
be able to help with this:
https://issues.apache.org/jira/browse/TIKA-2750?focusedCommentId=16671472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16671472
Cheers,
Tim
[
https://issues.apache.org/jira/browse/TIKA-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671472#comment-16671472
]
Tim Allison commented on TIKA-2750:
---
I'd like to remove "boring" and/or basically duplicative documents
[
https://issues.apache.org/jira/browse/TIKA-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671452#comment-16671452
]
Markus Jelsma commented on TIKA-2760:
-
Hello [~davemeikle],
Of course! I cannot understand why I did
[
https://issues.apache.org/jira/browse/TIKA-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma closed TIKA-2760.
---
> LinkContentHandler does not report hyperlinks
> -
>
>
[
https://issues.apache.org/jira/browse/TIKA-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma resolved TIKA-2760.
-
Resolution: Not A Problem
> LinkContentHandler does not report hyperlinks
>
[
https://issues.apache.org/jira/browse/TIKA-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kristen Cheung closed TIKA-2770.
Resolution: Fixed
> Convert EnviHeader "map info" from UTM to LatLon
>
Kristen Cheung created TIKA-2770:
Summary: Convert EnviHeader "map info" from UTM to LatLon
Key: TIKA-2770
URL: https://issues.apache.org/jira/browse/TIKA-2770
Project: Tika
Issue Type:
[
https://issues.apache.org/jira/browse/TIKA-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671189#comment-16671189
]
Dave Meikle commented on TIKA-2760:
---
Hi [~markus17],
Looking at the Nutch code I can see that