[ https://issues.apache.org/jira/browse/TIKA-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560803#action_12560803 ]
Jukka Zitting commented on TIKA-114:
------------------------------------

PDFBox doesn't seem to call the processLineSeparator method in our PDF2XHTML class. I'll investigate...

> PDFParser : Getting content of the document using "writer.ToString ()" , some
> words are stuck together
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-114
>                 URL: https://issues.apache.org/jira/browse/TIKA-114
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.2-incubating
>            Reporter: Rida Benjelloun
>             Fix For: 0.2-incubating
>
>
> PDFParser : Getting the content of the document using "writer.ToString ()" ,
> some words are stuck together
> Result of PDF extraction :
> "Apache Tika - Apache Tikahttp://incubator.apache.org/tika/1 of 115.9.2007
> 11:02Tika - Content Analysis ToolkitApache Tika is a toolkit for detecting
> and extracting metadata and structured text content from various documents
> using existing parser libraries. Apache Tika is an effort undergoing
> incubation at The Apache Software Foundation (ASF), sponsored by the Apache
> Lucene PMC. Incubation is required of all newly accepted projects until a
> further review indicates that the infrastructure, communications, and
> decision making process have stabilized in a manner consistent with other
> successful ASF projects. While incubation status is not necessarily a
> reflection of the completeness or stability of the code, it does indicate
> that the project has yet to be fully endorsed by the ASF.See the Apache Tika
> Incubation Status page for the current incubation status.Latest NewsMarch
> 22nd, 2007: Apache Tika project startedThe Apache Tika project was formally
> started when the Tika proposal was accepted by the Apache Incubator PMC."
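
To illustrate the suspected failure mode, here is a minimal sketch of what happens when a subclass relies on a line-separator hook that the base class never invokes. The classes LineSeparatorSketch, BaseStripper and SeparatingStripper are hypothetical stand-ins, not the actual PDFBox or Tika code, and the split of the sample text into two lines is an assumption based on the reported output.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the PDFBox or Tika classes.
final class LineSeparatorSketch {

    /** Base extractor that may or may not invoke the line-separator hook. */
    static class BaseStripper {
        private final boolean callsHook;
        protected final StringBuilder out = new StringBuilder();

        BaseStripper(boolean callsHook) {
            this.callsHook = callsHook;
        }

        /** Writes each line; calls the hook between lines only if callsHook is true. */
        String extract(List<String> lines) {
            out.setLength(0);
            for (int i = 0; i < lines.size(); i++) {
                out.append(lines.get(i));
                if (callsHook && i < lines.size() - 1) {
                    processLineSeparator();
                }
            }
            return out.toString();
        }

        /** Hook for subclasses; the base implementation emits nothing. */
        protected void processLineSeparator() {
        }
    }

    /** Subclass that relies entirely on the hook to separate lines. */
    static class SeparatingStripper extends BaseStripper {
        SeparatingStripper(boolean callsHook) {
            super(callsHook);
        }

        @Override
        protected void processLineSeparator() {
            out.append('\n');
        }
    }

    public static void main(String[] args) {
        // Assumed line split of the first words of the reported output.
        List<String> lines = Arrays.asList(
                "Apache Tika - Apache Tika",
                "http://incubator.apache.org/tika/");

        // Hook never invoked: the two lines run together.
        System.out.println(new SeparatingStripper(false).extract(lines));
        // Hook invoked: the override inserts a newline between them.
        System.out.println(new SeparatingStripper(true).extract(lines));
    }
}
```

If the real cause matches this sketch, the override itself would be fine and the fix would lie in making sure the hook is actually reached, or in emitting the separator from a method that is known to be called.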