[ 
https://issues.apache.org/jira/browse/TIKA-114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560803#action_12560803
 ] 

Jukka Zitting commented on TIKA-114:
------------------------------------

PDFBox doesn't seem to call the processLineSeparator method in our PDF2XHTML 
class. I'll investigate...
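
To illustrate the suspected failure mode, here is a minimal Java sketch. It 
does not use PDFBox or Tika: SeparatorSketch, TextCollector and extract are 
hypothetical names, and the handler only stands in for whatever collects the 
text that the writer later returns. It assumes each extracted line arrives as 
a separate characters() event and that no whitespace is added unless a 
line-separator hook fires.

```java
import java.util.Arrays;
import java.util.List;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SeparatorSketch {

    /** Hypothetical stand-in for a handler that buffers extracted text. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // Append exactly what was emitted; no whitespace is inserted here.
            buffer.append(ch, start, length);
        }

        @Override
        public String toString() {
            return buffer.toString();
        }
    }

    /**
     * Simulates an extractor that emits one characters() event per line.
     * The emitLineSeparator flag models whether the line-separator hook
     * is actually invoked (an assumption made for illustration only).
     */
    static String extract(List<String> lines, boolean emitLineSeparator)
            throws SAXException {
        TextCollector handler = new TextCollector();
        handler.startDocument();
        for (String line : lines) {
            char[] chars = line.toCharArray();
            handler.characters(chars, 0, chars.length);
            if (emitLineSeparator) {
                // Emit a newline after each line, as a separator hook would.
                handler.characters(new char[] {'\n'}, 0, 1);
            }
        }
        handler.endDocument();
        return handler.toString();
    }

    public static void main(String[] args) throws SAXException {
        List<String> lines =
                Arrays.asList("Apache Tika", "http://incubator.apache.org/tika/");
        System.out.println(extract(lines, false)); // separator hook never called
        System.out.println(extract(lines, true));  // separator hook called per line
    }
}
```

With emitLineSeparator set to false, the two sample lines come back as 
"Apache Tikahttp://incubator.apache.org/tika/", which matches the start of the 
output quoted in the report below.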

> PDFParser : Getting content of the document using "writer.ToString ()" , some 
> words are stuck together
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-114
>                 URL: https://issues.apache.org/jira/browse/TIKA-114
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.2-incubating
>            Reporter: Rida Benjelloun
>             Fix For: 0.2-incubating
>
>
> PDFParser: Getting the content of the document using "writer.ToString()", 
> some words are stuck together.
> Result of PDF extraction: 
> "Apache Tika - Apache Tikahttp://incubator.apache.org/tika/1 of 115.9.2007 
> 11:02Tika - Content Analysis ToolkitApache Tika is a toolkit for detecting 
> and extracting metadata and structured text content from various documents 
> using existing parser libraries. Apache Tika is an effort undergoing 
> incubation at The Apache Software Foundation (ASF), sponsored by the Apache 
> Lucene PMC. Incubation is required of all newly accepted projects until a 
> further review indicates that the infrastructure, communications, and 
> decision making process have stabilized in a manner consistent with other 
> successful ASF projects. While incubation status is not necessarily a 
> reflection of the completeness or stability of the code, it does indicate 
> that the project has yet to be fully endorsed by the ASF.See the Apache Tika 
> Incubation Status page for the current incubation status.Latest NewsMarch 
> 22nd, 2007: Apache Tika project startedThe Apache Tika project was formally 
> started when the Tika proposal was accepted by the Apache Incubator PMC."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
