PDFParser: Getting content of the document using "writer.toString()", some
words are stuck together
------------------------------------------------------------------------------------------------------

                 Key: TIKA-114
                 URL: https://issues.apache.org/jira/browse/TIKA-114
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.2-incubating
            Reporter: Rida Benjelloun
             Fix For: 0.2-incubating


PDFParser: When getting the content of the document using "writer.toString()",
some words are stuck together.
Result of PDF extraction:
"Apache Tika - Apache Tikahttp://incubator.apache.org/tika/1 of 115.9.2007 
11:02Tika - Content Analysis ToolkitApache Tika is a toolkit for detecting and 
extracting metadata and structured text content from various documents using 
existing parser libraries. Apache Tika is an effort undergoing incubation at 
The Apache Software Foundation (ASF), sponsored by the Apache Lucene PMC. 
Incubation is required of all newly accepted projects until a further review 
indicates that the infrastructure, communications, and decision making process 
have stabilized in a manner consistent with other successful ASF projects. 
While incubation status is not necessarily a reflection of the completeness or 
stability of the code, it does indicate that the project has yet to be fully 
endorsed by the ASF.See the Apache Tika Incubation Status page for the current 
incubation status.Latest NewsMarch 22nd, 2007: Apache Tika project startedThe 
Apache Tika project was formally started when the Tika proposal was accepted by 
the Apache Incubator PMC."
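
The report does not include the calling code or name a cause. As a hedged
illustration only, the sketch below uses the plain JDK SAX API (not Tika's own
classes) to show one way output like the above can arise: if character events
are written to a writer with no separator emitted at block boundaries, text
from adjacent paragraphs is concatenated. The class name StuckWordsSketch, the
extract method, the separateBlocks flag, and the sample XHTML are all
hypothetical, and whether PDFParser actually behaves this way is an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: plain JDK SAX, not Tika's API.
public class StuckWordsSketch {

    // Collects SAX character events into a StringWriter. When separateBlocks
    // is true, a newline is written after each closing <p> element.
    static String extract(String xhtml, final boolean separateBlocks) {
        final StringWriter writer = new StringWriter();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                writer.write(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (separateBlocks && "p".equals(qName)) {
                    writer.write('\n');
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample input", e);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        // Assumed sample input: two adjacent paragraphs.
        String xhtml = "<html><body><p>Apache Tika - Apache Tika</p>"
                + "<p>http://incubator.apache.org/tika/</p></body></html>";
        System.out.println(extract(xhtml, false)); // paragraphs run together
        System.out.println(extract(xhtml, true));  // one paragraph per line
    }
}
```

With separateBlocks set to false, the two paragraphs come out joined, in the
same way as "Apache Tikahttp://incubator.apache.org/tika/" in the result
quoted above. With it set to true, each paragraph ends in a newline.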

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
