On 4/20/2011 14:31, Steven Bethard wrote:
> On Wed, Apr 20, 2011 at 10:58 AM, Jens Grivolla <[email protected]> wrote:
>> As it turns out, the other system considers CR+LF (Windows style line
>> endings) to be two characters, while UIMA sees it as one.
>
> As Jörn suggested, this is probably a bug in the code somewhere where
> you read in the text. Perhaps you're using
> org.apache.uima.pear.util.FileUtil.loadTextFile? That's definitely
> broken in terms of line endings and I know that gave us trouble
> before. We found that org.apache.uima.util.FileUtils.file2String
> actually does the right thing, so you could use that instead. Having
> been bitten by this though, I tend to avoid the UIMA classes for
> handling files, and use com.google.common.io.Files.toString from the
> guava libraries instead, which I trust more.

This is getting slightly off-topic, but you can also use Apache Commons IO
for this sort of thing. Although I resent having the UIMA core file utils
lumped in with the pear stuff, I can't blame you for your conclusion ;-)

--Thilo

>
> Steve
>
> P.S. Yes, I know I should have filed a bug report. Sorry for not
> getting around to it...
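The mismatch Jens describes can be reproduced with a small sketch, assuming (not confirmed in the thread) that the broken loader reads text line by line and rejoins the lines with "\n". BufferedReader.readLine() drops the "\r\n" terminator, so each CR+LF collapses to one character and every later offset shifts by one per line. The class LineEndingDemo and the methods readLineByLine and readVerbatim are hypothetical names used only for illustration; readVerbatim plays the role of a whole-file read such as Guava's Files.toString, but uses only the JDK so the example stays self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineEndingDemo {

    // Hypothetical line-based loader (an assumption, not UIMA's actual code):
    // readLine() strips "\r\n", "\r" or "\n", and rejoining with '\n'
    // turns every CR+LF into a single character.
    static String readLineByLine(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Character-for-character read: keeps every '\r', so offsets match a
    // tool that counts CR+LF as two characters.
    static String readVerbatim(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try (Reader in = reader) {
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("crlf", ".txt");
        Files.write(tmp, "first line\r\nsecond line\r\n".getBytes(StandardCharsets.UTF_8));

        String verbatim = readVerbatim(Files.newBufferedReader(tmp, StandardCharsets.UTF_8));
        String joined = readLineByLine(Files.newBufferedReader(tmp, StandardCharsets.UTF_8));

        // Each preceding CR+LF shifts later offsets by one in the joined text.
        System.out.println(verbatim.indexOf("second")); // 12
        System.out.println(joined.indexOf("second"));   // 11
        Files.delete(tmp);
    }
}
```

Annotation offsets computed against the verbatim string line up with a system that counts CR+LF as two characters; offsets computed against the rejoined string drift by one for every line break before them.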
