[ 
https://issues.apache.org/jira/browse/TIKA-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keith R. Bennett updated TIKA-38:
---------------------------------

    Attachment: tika38.patch

In making these code changes, I made some assumptions.  If they are not valid, 
then the patch will need to be revised.  Here they are:

1) We want the text file being parsed to come through the parser exactly as it 
is stored, character by character, except that any line termination sequences 
should be translated to a newline ('\n') (as they would normally be represented 
in a Java string).

2) The BufferedReader does the line ending translation for us, so we will only 
see '\n' as a line terminator.

3) Using StringBuilder is better than using StringBuffer, since it avoids 
synchronization we do not need (now that we know we are using Java 1.5 we 
have the option).

4) Calling BufferedReader.read() is better than calling 
BufferedReader.readLine() because with readLine() we have no way of knowing 
whether or not a newline terminated the last line.  Also, we don't have to 
store a possibly arbitrarily long line in memory.

5) Calling BufferedReader.read() is slightly simpler than calling 
BufferedReader.read(char[],int,int) and may not be significantly slower (?).
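
To make assumptions 4 and 5 concrete, here is a minimal sketch of the two 
approaches side by side.  It is not the code in tika38.patch: the class name 
ReadSketch and the methods readAllChars and readAllLines are made up for 
illustration, and only standard java.io calls are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration only; not the actual TXTParser code.
public class ReadSketch {

    // Character-by-character approach (assumptions 4 and 5):
    // every char returned by read() is appended unchanged.
    public static String readAllChars(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {   // -1 signals end of stream
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Line-by-line approach: readLine() strips the terminator, so the
    // caller has to guess whether the last line ended with one.
    public static String readAllLines(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append('\n');     // always appends a separator
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // read() preserves the input exactly.
        System.out.println(readAllChars(new StringReader("1")).equals("1"));
        // readLine() cannot tell "1" from "1\n": both yield "1\n" here.
        System.out.println(readAllLines(new StringReader("1")).equals("1\n"));
        System.out.println(readAllLines(new StringReader("1\n")).equals("1\n"));
        // read() does not translate line terminators.
        System.out.println(readAllChars(new StringReader("a\r\nb")).equals("a\r\nb"));
    }
}
```

Each line printed by main is "true".  The last check is relevant to 
assumption 2: in this sketch read() hands back '\r' unchanged, and as far as 
the JDK documentation goes only readLine() recognizes line terminators, so 
that assumption may need a second look.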



> TXTParser appends a space to the text found in the file.
> --------------------------------------------------------
>
>                 Key: TIKA-38
>                 URL: https://issues.apache.org/jira/browse/TIKA-38
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>            Priority: Minor
>             Fix For: 0.1-incubator
>
>         Attachments: tika38.patch
>
>
> TXTParser adds a space to the content it reads from a file.  When it parsed a 
> file containing "1", it returned as the full text "1 " (space appended).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
