[
https://issues.apache.org/jira/browse/TIKA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670462#comment-16670462
]
feng ye commented on TIKA-2735:
---
Appreciate your efforts and the update, Tim!
> notes and footer contents
[
https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
feng ye closed TIKA-2734.
-
Resolution: Fixed
Thanks for Tim's reply. We can close this issue now.
> Tika addes extra characters at the end
[
https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654386#comment-16654386
]
feng ye commented on TIKA-2734:
---
Thanks Tim for your detailed tips.
I am using Tika to extract all kinds
[
https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
feng ye reopened TIKA-2734:
---
Hi Tim,
I applied the config setting you suggested, but the results still contain the
footer contents somehow.
[
https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
feng ye updated TIKA-2734:
--
Attachment: extra_A_Page_P.png
> Tika addes extra characters at the end of text in extracting from excel file
>
[
https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639822#comment-16639822
]
feng ye commented on TIKA-2734:
---
Another question: it seems the footer contents are appended right after the
[
https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639786#comment-16639786
]
feng ye commented on TIKA-2734:
---
Thanks Tim for your info! Would you please show me an example of
[
https://issues.apache.org/jira/browse/TIKA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638801#comment-16638801
]
feng ye commented on TIKA-2735:
---
Really appreciate it Tim for addressing this issue. Yes, having these
[
https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629020#comment-16629020
]
feng ye commented on TIKA-2734:
---
and I believe it corresponds to something other than the page footer.
>
[
https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628999#comment-16628999
]
feng ye commented on TIKA-2734:
---
Thanks Nick for looking into this.
I did see a footer of "Page 1" for
feng ye created TIKA-2735:
-
Summary: notes and footer contents are duplicated in extracting
text from power point slides
Key: TIKA-2735
URL: https://issues.apache.org/jira/browse/TIKA-2735
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482594#comment-16482594
]
feng ye commented on TIKA-2643:
---
In fact, it always (so far) runs on CDH 5.11 but not on CDH 5.8.
> Tika call
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482540#comment-16482540
]
feng ye commented on TIKA-2643:
---
Another interesting observation: on CDH 5.11 with Java 8, I can process this
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482184#comment-16482184
]
feng ye commented on TIKA-2643:
---
Hi Ken,
Really appreciate your efforts looking into this. The MR classpath
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
feng ye updated TIKA-2643:
--
Attachment: hs_err_pid32104.log
> Tika call hangs when processes a pdf on Cloudera Hadoop
>
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481255#comment-16481255
]
feng ye commented on TIKA-2643:
---
It turned out that CDH 5.8 has Tika 1.5 bundled in. Although it is not in MapReduce
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479204#comment-16479204
]
feng ye commented on TIKA-2643:
---
Ken, did you suggest that I kill the Hadoop process to get the stack trace for
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479060#comment-16479060
]
feng ye commented on TIKA-2643:
---
JDK 1.7 is used for Cloudera 5.8 on the hanging/crashing machine
> Tika
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478428#comment-16478428
]
feng ye commented on TIKA-2643:
---
I extended the timeout from 10 mins to 30 mins. Instead of hanging all the 30
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477613#comment-16477613
]
feng ye commented on TIKA-2643:
---
Yes, it hangs every time at the same log message.
> Tika call hangs when
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477524#comment-16477524
]
feng ye commented on TIKA-2643:
---
Thanks Ken. In my code I did set mapreduce.job.user.classpath.first. Of
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477516#comment-16477516
]
feng ye commented on TIKA-2643:
---
Copying below the last lines in the log corresponding to the working
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477504#comment-16477504
]
feng ye commented on TIKA-2643:
---
logger.debug("in helper funciton... ");
String content =
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477480#comment-16477480
]
feng ye commented on TIKA-2643:
---
For information, the whole suite of 257 files, including this file, got
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477458#comment-16477458
]
feng ye commented on TIKA-2643:
---
Thanks Tim for looking into this!
I have no problem processing this file
feng ye created TIKA-2643:
-
Summary: Tika call hangs when processes a pdf on Cloudera Hadoop
Key: TIKA-2643
URL: https://issues.apache.org/jira/browse/TIKA-2643
Project: Tika
Issue Type: Bug