[jira] [Commented] (TIKA-2735) notes and footer contents are duplicated in extracting text from power point slides

2018-10-31 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670462#comment-16670462 ] feng ye commented on TIKA-2735: --- appreciate your efforts and the update Tim! > notes and footer contents

[jira] [Closed] (TIKA-2734) Tika addes extra characters at the end of text in extracting from excel file

2018-10-21 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feng ye closed TIKA-2734. - Resolution: Fixed thanks for Tim's reply. We can close this issue now. > Tika addes extra characters at the end

[jira] [Commented] (TIKA-2734) Tika addes extra characters at the end of text in extracting from excel file

2018-10-17 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654386#comment-16654386 ] feng ye commented on TIKA-2734: --- Thanks Tim for your detailed tips.  I am using Tika to extract all kinds

[jira] [Reopened] (TIKA-2734) Tika addes extra characters at the end of text in extracting from excel file

2018-10-16 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feng ye reopened TIKA-2734: --- Hi Tim, applied the config setting you suggested but the results still contain the footer contents somehow.

[jira] [Updated] (TIKA-2734) Tika addes extra characters at the end of text in extracting from excel file

2018-10-05 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feng ye updated TIKA-2734: -- Attachment: extra_A_Page_P.png > Tika addes extra characters at the end of text in extracting from excel file >

[jira] [Commented] (TIKA-2734) Tika addes extra characters at the end of text in extracting from excel file

2018-10-05 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639822#comment-16639822 ] feng ye commented on TIKA-2734: --- Another question: it seems the footer contents are appended right after the

[jira] [Commented] (TIKA-2734) Tika addes extra characters at the end of text in extracting from excel file

2018-10-05 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639786#comment-16639786 ] feng ye commented on TIKA-2734: --- Thanks Tim for your info! Would you please show me an example of

[jira] [Commented] (TIKA-2735) notes and footer contents are duplicated in extracting text from power point slides

2018-10-04 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638801#comment-16638801 ] feng ye commented on TIKA-2735: --- Really appreciate it Tim for addressing this issue. Yes, having these

[jira] [Commented] (TIKA-2734) Tika addes extra characters at the end of text in extracting from excel file

2018-09-26 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629020#comment-16629020 ] feng ye commented on TIKA-2734: --- and I believe is corresponding to something other than page footer.  >

[jira] [Commented] (TIKA-2734) Tika addes extra characters at the end of text in extracting from excel file

2018-09-26 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628999#comment-16628999 ] feng ye commented on TIKA-2734: --- Thanks Nick for looking into this.  I did see a footer of "Page 1" for

[jira] [Created] (TIKA-2735) notes and footer contents are duplicated in extracting text from power point slides

2018-09-24 Thread feng ye (JIRA)
feng ye created TIKA-2735: - Summary: notes and footer contents are duplicated in extracting text from power point slides Key: TIKA-2735 URL: https://issues.apache.org/jira/browse/TIKA-2735 Project: Tika

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-21 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482594#comment-16482594 ] feng ye commented on TIKA-2643: --- In fact it always (so far) runs on CDH 5.11 but not on CDH 5.8. > Tika call

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-21 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482540#comment-16482540 ] feng ye commented on TIKA-2643: --- Another interesting observation: on CDH 5.11 with Java 8, I can process this

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-20 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482184#comment-16482184 ] feng ye commented on TIKA-2643: --- Hi Ken,  Really appreciate your efforts looking into this. The MR classpath

[jira] [Updated] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-18 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feng ye updated TIKA-2643: -- Attachment: hs_err_pid32104.log > Tika call hangs when processes a pdf on Cloudera Hadoop >

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-18 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481255#comment-16481255 ] feng ye commented on TIKA-2643: --- It turned out that CDH 5.8 has Tika 1.5 bundled in. Although it is not in MapReduce

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-17 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479204#comment-16479204 ] feng ye commented on TIKA-2643: --- Ken, did you suggest that I kill the Hadoop process to get the stack trace for

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-17 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479060#comment-16479060 ] feng ye commented on TIKA-2643: --- JDK 1.7 is used for Cloudera 5.8 on the hanging/crashing machine > Tika

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478428#comment-16478428 ] feng ye commented on TIKA-2643: --- I extended timeout to 30 mins from 10 mins. Instead of hanging all the 30

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477613#comment-16477613 ] feng ye commented on TIKA-2643: --- Yes, it hangs every time at the same log message.  > Tika call hangs when

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477524#comment-16477524 ] feng ye commented on TIKA-2643: --- Thanks Ken. In my code I did set mapreduce.job.user.classpath.first. Of

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477516#comment-16477516 ] feng ye commented on TIKA-2643: --- Copying below the last lines in the log corresponding to the working

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477504#comment-16477504 ] feng ye commented on TIKA-2643: --- logger.debug("in helper funciton... "); String content =

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477480#comment-16477480 ] feng ye commented on TIKA-2643: --- For information the whole suite of 257 files including this file got

[jira] [Commented] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread feng ye (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477458#comment-16477458 ] feng ye commented on TIKA-2643: --- Thanks Tim for looking into this! I have no problem processing this file

[jira] [Created] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop

2018-05-16 Thread feng ye (JIRA)
feng ye created TIKA-2643: - Summary: Tika call hangs when processes a pdf on Cloudera Hadoop Key: TIKA-2643 URL: https://issues.apache.org/jira/browse/TIKA-2643 Project: Tika Issue Type: Bug