Interesting. I kind of got to the same place right as your message came in: I replaced my (temporary debugging) usage of System.out.println (I know, horror, horror) with a PrintWriter that was created like this:
new PrintWriter(new OutputStreamWriter(System.out, "UTF8"));

...and that let me see that the characters were indeed not getting stomped on.

Thanks,
Laird

On Thu, Sep 9, 2010 at 5:53 PM, David Fisher <[email protected]> wrote:

> JRuby had a similar issue - perhaps this thread will help you sort things
> out.
>
> http://jira.codehaus.org/browse/JRUBY-3576
>
> Or this report: http://openradar.appspot.com/8307054
>
> But it is "not a bug, it is a feature."
>
> http://bugs.sun.com/view_bug.do?bug_id=4163515
>
> I think you will need to explicitly set your JVM's default encoding to
> UTF-8, but it seems like the Mac JVM is broken.
>
> Dave
>
> On Sep 9, 2010, at 2:36 PM, Laird Nelson wrote:
>
> > Also, this is making me a little nervous. How does POI figure it out?
> > Presumably if it's going to give me a "real Java unicode string", then it
> > has to know how to convert from whatever encoding the spreadsheet is in to
> > Unicode. So how does it figure out what encoding the spreadsheet is in?
> > What if it guesses wrongly?
> >
> > Best,
> > Laird
> >
> > On Thu, Sep 9, 2010 at 4:58 PM, Laird Nelson <[email protected]> wrote:
> >
> >> OK, so given that, I'm trying to figure out why, when I take a String
> >> from cell.getStringCellValue() and write it to a file whose
> >> FileOutputStream has been explicitly wrapped by an OutputStreamWriter
> >> using the UTF8 encoding, the contents of the file appear to be in
> >> MacRoman encoding (my platform's default).
> >>
> >> I'm creating the XMLEventWriter that ultimately does the writing like
> >> this:
> >>
> >> final FileOutputStream fileOutputStream = new FileOutputStream(file);
> >> final OutputStreamWriter outputStreamWriter = new OutputStreamWriter(fileOutputStream, "UTF8");
> >> final BufferedWriter bufferedWriter = new BufferedWriter(outputStreamWriter);
> >>
> >> XMLEventWriter writer = outputFactory.createXMLEventWriter(bufferedWriter);
> >>
> >> ...and then at various points I'm using the String value from POI to stick
> >> in there as #PCDATA. It seems like this should not involve ANY character
> >> set conversion; is that what you're telling me?
> >>
> >> L
> >>
> >>
> >> On Thu, Sep 9, 2010 at 4:30 PM, Nick Burch <[email protected]> wrote:
> >>
> >>> On Thu, 9 Sep 2010, Laird Nelson wrote:
> >>>
> >>>> I am using POI to read an Excel spreadsheet. I have no idea what
> >>>> character encoding it's in. I can tell you, however, it's not in UTF8. :-)
> >>>>
> >>>
> >>> You don't have to worry about the encoding, POI sorts that out for you.
> >>> Every String you get back is a real Java unicode string already.
> >>>
> >>> Nick
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>>
> >>
> >
>
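For reference, a minimal sketch of the console workaround described at the top of this message: wrap System.out in a PrintWriter whose OutputStreamWriter has an explicit charset, so the platform default (MacRoman here) never gets involved. It assumes Java 7 or later for StandardCharsets.UTF_8, which also avoids the checked UnsupportedEncodingException that the "UTF8" string form can throw. The class and method names are hypothetical, and the terminal itself still has to decode the bytes as UTF-8 for the characters to display correctly.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Hypothetical helper: wraps any byte stream (System.out, a file, a buffer)
    // in a PrintWriter that always encodes as UTF-8, ignoring the platform default.
    static PrintWriter utf8Writer(OutputStream stream) {
        return new PrintWriter(new OutputStreamWriter(stream, StandardCharsets.UTF_8));
    }

    // Hypothetical helper for checking the encoded bytes without a terminal.
    static byte[] encodeViaWriter(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter out = utf8Writer(buffer);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical sample string standing in for a value read from a cell.
        String sample = "caf\u00e9 \u20ac";
        PrintWriter out = utf8Writer(System.out);
        out.println(sample);
        out.flush(); // this PrintWriter was not created with auto-flush
    }
}
```

Checking the bytes rather than the terminal output separates "Java encoded it wrongly" from "the terminal displayed it wrongly", which is the distinction the thread was trying to make.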

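A sketch of the XML export with the encoding stated in both places that matter: the bytes written to disk and the XML declaration. Instead of wrapping the stream in an OutputStreamWriter as in the quoted code, this variant passes the OutputStream and the encoding name to XMLOutputFactory.createXMLEventWriter(OutputStream, String) and lets StAX do the encoding, which keeps the declared and actual encodings in sync. That substitution, the element name "cell", and the class and method names are assumptions for illustration, not what the thread used.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

public class Utf8XmlExport {
    private static final String ENCODING = "UTF-8";

    // Hypothetical helper: writes <cell>cellText</cell> as a UTF-8 XML document.
    // The stream is flushed but not closed; the caller owns it.
    static void writeCellXml(OutputStream out, String cellText) throws XMLStreamException {
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        // StAX converts the Java String to bytes itself, using this encoding name.
        XMLEventWriter xml = outputFactory.createXMLEventWriter(out, ENCODING);
        // Declare the same encoding in the XML prolog.
        xml.add(events.createStartDocument(ENCODING, "1.0"));
        xml.add(events.createStartElement("", "", "cell"));
        xml.add(events.createCharacters(cellText)); // #PCDATA from the cell value
        xml.add(events.createEndElement("", "", "cell"));
        xml.add(events.createEndDocument());
        xml.flush();
        xml.close(); // releases the StAX writer; the caller closes the stream
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("cells", ".xml");
        try (OutputStream out = new FileOutputStream(file)) {
            writeCellXml(out, "caf\u00e9"); // hypothetical cell value
        }
        System.out.println("Wrote " + file);
    }
}
```

If the resulting file looks like MacRoman in an editor, it is worth checking which encoding the editor assumes before suspecting POI, since the String itself carries no encoding until it is written out.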