On Wed, Apr 20, 2011 at 10:58 AM, Jens Grivolla <[email protected]> wrote:
> As it turns out, the other system considers CR+LF (Windows style line
> endings) to be two characters, while UIMA sees it as one.

As Jörn suggested, this is probably a bug in the code somewhere where you read in the text. Perhaps you're using org.apache.uima.pear.util.FileUtil.loadTextFile? That's definitely broken in terms of line endings and I know that gave us trouble before. We found that org.apache.uima.util.FileUtils.file2String actually does the right thing, so you could use that instead.

Having been bitten by this though, I tend to avoid the UIMA classes for handling files, and use com.google.common.io.Files.toString from the guava libraries instead, which I trust more.

Steve

P.S. Yes, I know I should have filed a bug report. Sorry for not getting around to it...

--
Where did you get that preposterous hypothesis?
Did Steve tell you that?
        --- The Hiphopopotamus
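
For anyone wanting to check which behaviour they are getting, here is a minimal sketch using only the JDK. It is not the UIMA or Guava code mentioned above, and it makes no claim about how loadTextFile is implemented. The class name LineEndingCheck and both method names are made up for illustration. It contrasts decoding the raw bytes in one step, which keeps CR+LF as two characters, with one plausible way a line-based reader can collapse CR+LF into a single character and shift every later offset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
final class LineEndingCheck {

    // Decodes the file's bytes in one step, so "\r\n" stays two chars
    // and character offsets match what other tools report.
    static String readPreservingLineEndings(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Line-based reading: readLine() strips the terminator, and re-joining
    // with '\n' turns every CR+LF into a single char.
    static String readLineByLine(Path path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("crlf", ".txt");
        try {
            Files.write(tmp, "one\r\ntwo\r\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(readPreservingLineEndings(tmp).length()); // 10
            System.out.println(readLineByLine(tmp).length());            // 8
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If the length of the text you hand to UIMA differs from the file's decoded length in this way, the reading code is the likely culprit rather than UIMA's offset handling.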
