[ https://issues.apache.org/jira/browse/TIKA-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keith R. Bennett updated TIKA-38:
---------------------------------
Attachment: tika38.patch
In making these code changes, I made some assumptions. If they are not valid,
then the patch needs to be revised. Here they are:
1) We want the text file being parsed to come through the parser exactly as it
is stored, character by character, except that any line termination sequences
should be translated to a newline ('\n') (as they would normally be represented
in a Java string).
2) The BufferedReader does the line ending translation for us, so we will only
see '\n' as a line terminator.
3) Using StringBuilder is better than using StringBuffer, since it avoids
synchronization we do not need (now that we know we are using Java 1.5 we have
the option).
4) Calling BufferedReader.read() is better than calling
BufferedReader.readLine() because with readLine() we have no way of knowing
whether or not a newline terminated the last line. Also, we don't have to store
a possibly arbitrarily long line in memory.
5) Calling BufferedReader.read() is slightly simpler than calling
BufferedReader.read(char[],int,int) and may not be significantly slower (?).
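To make the assumptions above concrete, here is a minimal sketch of the
character-by-character approach. It is not the code in tika38.patch; the class
name TextReadSketch and the method readAll are hypothetical names used only for
illustration. One caveat on assumption 2: as far as I can tell,
BufferedReader.read() returns '\r' unchanged (only readLine() recognizes line
terminators), so the sketch normalizes line endings explicitly rather than
relying on the reader to do it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper for illustration only; not the code in tika38.patch.
final class TextReadSketch {

    // Reads every character from the reader, mapping "\r\n" and a lone "\r"
    // to "\n" and leaving all other characters untouched.
    static String readAll(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder text = new StringBuilder();
        boolean lastWasCr = false;
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\r') {
                text.append('\n');
                lastWasCr = true;
            } else if (c == '\n') {
                // Skip the '\n' of a "\r\n" pair; it was already emitted.
                if (!lastWasCr) {
                    text.append('\n');
                }
                lastWasCr = false;
            } else {
                text.append((char) c);
                lastWasCr = false;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // A file containing just "1" should yield exactly one character.
        System.out.println(readAll(new StringReader("1")).length());
    }
}
```

With this approach a file containing "1" yields "1" and a file containing
"1\n" yields "1\n", so nothing is appended and a trailing newline is preserved.
By contrast, readLine() returns "1" for both inputs, which is the ambiguity
described in assumption 4.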
> TXTParser appends a space to the text found in the file.
> --------------------------------------------------------
>
> Key: TIKA-38
> URL: https://issues.apache.org/jira/browse/TIKA-38
> Project: Tika
> Issue Type: Bug
> Components: general
> Affects Versions: 0.1-incubator
> Reporter: Keith R. Bennett
> Priority: Minor
> Fix For: 0.1-incubator
>
> Attachments: tika38.patch
>
>
> TXTParser adds a space to the content it reads from a file. When it parsed a
> file containing "1", it returned as the full text "1 " (space appended).