[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364710#comment-14364710
]
Tilman Hausherr commented on TIKA-1575:
---
Could you attach the TIKA output you get
[
https://issues.apache.org/jira/browse/TIKA-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ann Burgess updated TIKA-1577:
--
Description:
A netCDF classic or 64-bit offset dataset is stored as a single file comprising
two parts:
GitHub user mkr opened a pull request:
https://github.com/apache/tika/pull/35
TIKA-1365: Lower priority for XML starting with comment
TIKA-1365: Lower priority for XML starting with comment, allow HTML
starting with comment to be detected as text/html
You can merge this pull
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365641#comment-14365641
]
Tim Allison edited comment on TIKA-1575 at 3/17/15 5:51 PM:
We
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365641#comment-14365641
]
Tim Allison edited comment on TIKA-1575 at 3/17/15 5:50 PM:
We
Ann Burgess created TIKA-1577:
-
Summary: NetCDF Data Extraction
Key: TIKA-1577
URL: https://issues.apache.org/jira/browse/TIKA-1577
Project: Tika
Issue Type: Improvement
Components:
[
https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366176#comment-14366176
]
ASF GitHub Bot commented on TIKA-1365:
--
GitHub user mkr opened a pull request:
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1575:
--
Attachment: 005937.pdf.json
Yes, I can't find it in Acrobat Reader with search either, but it was extracted
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364885#comment-14364885
]
Tim Allison edited comment on TIKA-1575 at 3/17/15 10:27 AM:
-
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365641#comment-14365641
]
Tim Allison commented on TIKA-1575:
---
We haven't yet integrated OCR with PDFParsing...it
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365733#comment-14365733
]
Tim Allison commented on TIKA-1575:
---
If the multithreading hypothesis is correct, we had
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365799#comment-14365799
]
Tim Allison commented on TIKA-1575:
---
I've kicked off a single-threaded batch run of 1.8.9
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365807#comment-14365807
]
Tilman Hausherr commented on TIKA-1575:
---
Can't tell, I don't know much about the
[
https://issues.apache.org/jira/browse/TIKA-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366201#comment-14366201
]
Matthias Krueger commented on TIKA-1365:
Quick wrap-up:
* HTML starting with comment
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365829#comment-14365829
]
Tilman Hausherr commented on TIKA-1575:
---
Thanks. Re: OCR, you should know that there
GitHub user mkr opened a pull request:
https://github.com/apache/tika/pull/34
TIKA-1554: Adding EMF magic as per Microsoft's EMF specification, thanks to
Luis Filipe Nassif
You can
[
https://issues.apache.org/jira/browse/TIKA-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366038#comment-14366038
]
ASF GitHub Bot commented on TIKA-1554:
--
GitHub user mkr opened a pull request:
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1575:
--
Attachment: 005937_1_8_9-SNAPSHOT.pdf.json
Corrupted characters where "monitoring" should be. Given that
[
https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365524#comment-14365524
]
Tilman Hausherr commented on TIKA-1575:
---
I can't understand how you get the extracted