[ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396332#comment-17396332 ]
Abha edited comment on TIKA-3518 at 8/10/21, 1:09 AM:
--
Update –
So I tried 1.27 and

[ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396339#comment-17396339 ]
Abha commented on TIKA-3518:
So if I copy the eng.traineddata from the tessdata folder and move it to the

[ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396332#comment-17396332 ]
Abha edited comment on TIKA-3518 at 8/10/21, 1:00 AM:
--
Update --
So I tried 1.27 and

[ https://issues.apache.org/jira/browse/TIKA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396336#comment-17396336 ]
Xiaohong Yang commented on TIKA-3519:
-
No. We have not. I will try it and let you know.
Thank you

[ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396332#comment-17396332 ]
Abha commented on TIKA-3518:
Please find my response inline -
When you say the ProcessBuilder

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396302#comment-17396302 ]
Luís Filipe Nassif commented on TIKA-3515:
--
Yes, it seems to be some Java Font issue; if I copy the

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396282#comment-17396282 ]
Tim Allison commented on TIKA-3515:
---
The UI is not very familiar to me. It looks like regular Java

[ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396277#comment-17396277 ]
Tim Allison commented on TIKA-3518:
---
There shouldn't be any new config changes. Hmmm...
When you say

[ https://issues.apache.org/jira/browse/TIKA-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396260#comment-17396260 ]
Tim Allison edited comment on TIKA-3517 at 8/9/21, 8:27 PM:
For posterity, I'm

[ https://issues.apache.org/jira/browse/TIKA-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396260#comment-17396260 ]
Tim Allison edited comment on TIKA-3517 at 8/9/21, 8:26 PM:
For posterity, I'm

[ https://issues.apache.org/jira/browse/TIKA-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396260#comment-17396260 ]
Tim Allison commented on TIKA-3517:
---
For posterity, I'm attaching the Document.iwa file from inside the

[ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396258#comment-17396258 ]
Abha edited comment on TIKA-3518 at 8/9/21, 8:18 PM:
-
I tested it with JDK 11 and

[ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396258#comment-17396258 ]
Abha commented on TIKA-3518:
I tested it with JDK 11 and still the same issue.
The ProcessBuilder class in

[ https://issues.apache.org/jira/browse/TIKA-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-3517:
--
Attachment: Document
Document.iwa
> Text extraction doesn't work for Pages and Numbers

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luís Filipe Nassif updated TIKA-3515:
-
Issue Type: Improvement (was: Bug)
Priority: Minor (was: Major)
> Korean chars

[ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396218#comment-17396218 ]
Abha commented on TIKA-3518:
On debugging, it seems that the ProcessBuilder (Java) is not creating the tmp

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396217#comment-17396217 ]
Luís Filipe Nassif commented on TIKA-3515:
--
Programmatically it worked fine.
Is

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396206#comment-17396206 ]
Luís Filipe Nassif commented on TIKA-3515:
--
Hmm, both commands above worked (also without -J

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396203#comment-17396203 ]
Tim Allison commented on TIKA-3515:
---
There is a subtle difference in tika-cli in handling xhtml/html and

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396201#comment-17396201 ]
Tim Allison commented on TIKA-3515:
---
Or, what happens if you try the {{-e}} option with {{-t}}?

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396199#comment-17396199 ]
Tim Allison commented on TIKA-3515:
---
I'm perplexed... :( Y, those question marks are literally byte

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luís Filipe Nassif updated TIKA-3515:
-
Environment: Windows 10, Liberica OpenJDK FULL x64 1.8.0_302
> Korean chars not extracted

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luís Filipe Nassif updated TIKA-3515:
-
Attachment: LIVE-Seoul-ntfs-utf-8.txt_-x_output.xml

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396188#comment-17396188 ]
Luís Filipe Nassif edited comment on TIKA-3515 at 8/9/21, 5:43 PM:
---
Hi,

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396188#comment-17396188 ]
Luís Filipe Nassif commented on TIKA-3515:
--
Hi, Tim. This is what I'm getting from tika-app from

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luís Filipe Nassif updated TIKA-3515:
-
Attachment: image-2021-08-09-14-38-26-763.png
> Korean chars not extracted correctly
>

[ https://issues.apache.org/jira/browse/TIKA-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luís Filipe Nassif updated TIKA-3515:
-
Attachment: image-2021-08-09-14-37-30-552.png
> Korean chars not extracted correctly
>

[ https://issues.apache.org/jira/browse/TIKA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396134#comment-17396134 ]
Tim Allison commented on TIKA-3519:
---
Have you tried setting a write limit on your handler?
e.g.

[ https://issues.apache.org/jira/browse/TIKA-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396067#comment-17396067 ]
Xiaohong Yang commented on TIKA-3519:
-
We call Tika programmatically.
> Wonder if you can add a

[ https://issues.apache.org/jira/browse/TIKA-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396043#comment-17396043 ]
Sebastian Nagel commented on TIKA-3489:
---
+1 to leave it as is. A backport definitely makes sense, in

[ https://issues.apache.org/jira/browse/TIKA-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396040#comment-17396040 ]
Chris Bryant commented on TIKA-3517:
Thanks, [~tallison]. I see TIKA-1358 now. With regular

[ https://issues.apache.org/jira/browse/TIKA-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396006#comment-17396006 ]
Hudson commented on TIKA-3517:
--
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #304 (See

[ https://issues.apache.org/jira/browse/TIKA-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395967#comment-17395967 ]
Tim Allison commented on TIKA-3517:
---
I fixed this in main/2.0.1-SNAPSHOT. However, regrettably, we

[ https://issues.apache.org/jira/browse/TIKA-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395964#comment-17395964 ]
Tim Allison commented on TIKA-3518:
---
I'm not seeing problems with 4.1.1 locally.
I think we added more

Porting our RTFParser might be a good place to start.
I’ve heard good things about Kaitai… thank you for sharing!
On Mon, Aug 9, 2021 at 5:40 AM Nick Burch wrote:
> Hi All
>
> I came across Kaitai - http://kaitai.io/ - yesterday. Based on the
> experiences documented in this twitter thread on
Hi All
I came across Kaitai - http://kaitai.io/ - yesterday. Based on the
experiences documented in this Twitter thread on understanding + parsing
an embedded filesystem:
https://twitter.com/wrongbaud/status/1424380510671880198
Looks like it might be worth a look if we need to write our