[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676109#comment-16676109
]
Hans Brende commented on TIKA-2771:
---
[~talli...@apache.org] I did a little experimentation with each of
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675880#comment-16675880
]
Hans Brende commented on TIKA-2771:
---
[~wave] Yep, just ran the following
{code:java}
IntStream.range(0,
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675839#comment-16675839
]
Tim Allison commented on TIKA-2771:
---
I was thinking something similar...
> enableInputFilter() wrecks
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675828#comment-16675828
]
Hans Brende edited comment on TIKA-2771 at 11/5/18 10:44 PM:
-
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675828#comment-16675828
]
Hans Brende commented on TIKA-2771:
---
[~talli...@apache.org] Ah, you're correct as regards the byteMap.
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675740#comment-16675740
]
Tim Allison commented on TIKA-2771:
---
Got it. Thank you.
bq. which calls: match(det, ngrams, byteMap,
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675708#comment-16675708
]
Hans Brende commented on TIKA-2771:
---
[~talli...@apache.org] I'm not sure which all of the charsets are
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675527#comment-16675527
]
Tim Allison commented on TIKA-2771:
---
I'm happy enough adding this check into EBCDIC500. Are there any
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675520#comment-16675520
]
Tim Allison commented on TIKA-2771:
---
When I add a {{tagsWereStripped}}, and have the EBCDIC500 charsets
[
https://issues.apache.org/jira/browse/TIKA-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675511#comment-16675511
]
Tim Allison commented on TIKA-2771:
---
Let me try again. I _think_ I've re-engaged my brain before I
ionut hodor created TIKA-2772:
-
Summary: Problem if cell contains quotation marks (")
Key: TIKA-2772
URL: https://issues.apache.org/jira/browse/TIKA-2772
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675342#comment-16675342
]
Tim Allison commented on TIKA-2750:
---
To my query above about jacoco, see the responses by Tobias Ospelt
[
https://issues.apache.org/jira/browse/TIKA-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675322#comment-16675322
]
Tim Allison commented on TIKA-2750:
---
I just added charset and lang by tld in last month's CommonCrawl
[
https://issues.apache.org/jira/browse/TIKA-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2750:
--
Attachment: CC-MAIN-2018-39-charset_lang_by_tld.zip
> Update regression corpus
>
[
https://issues.apache.org/jira/browse/TIKA-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675281#comment-16675281
]
Luis Filipe Nassif commented on TIKA-2765:
--
POI-62886 created. Thanks [~talli...@apache.org] and
[
https://issues.apache.org/jira/browse/TIKA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674833#comment-16674833
]
ionut hodor commented on TIKA-2767:
---
Hi [~davemeikle],
thank you for answering me. I attached 2 files,
[
https://issues.apache.org/jira/browse/TIKA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ionut hodor updated TIKA-2767:
--
Attachment: exampleXLS.xls
exampleXLSX.xlsx
> Problem with import xlsx with null cells
[
https://issues.apache.org/jira/browse/TIKA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674826#comment-16674826
]
ionut hodor commented on TIKA-2767:
---
Hi [~davemeikle]
I have 2 examples for you
> Problem with import
[
https://issues.apache.org/jira/browse/TIKA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ionut hodor updated TIKA-2767:
--
Comment: was deleted
(was: Hi [~davemeikle]
I have 2 examples for you)
> Problem with import xlsx with