Interesting.  I kind of got to the same place right as your message came in:
I replaced my (temporary debugging) usage of System.out.println (I know,
horror, horror) with a PrintWriter that was created like this:

new PrintWriter(new OutputStreamWriter(System.out, "UTF8"));

...and that let me see that indeed characters were not getting stomped on.
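
For anyone who wants to repeat the check, here is a minimal sketch of it.
The class name Utf8ConsoleCheck, the helper utf8Hex and the sample string
are made up for illustration, and StandardCharsets assumes Java 7 or later
(the one-liner above used the "UTF8" charset name instead). It prints a
non-ASCII string through a UTF-8 writer wrapped around System.out, plus the
hex of its UTF-8 bytes, so you can tell whether the String itself or only
the console's default encoding is at fault:

```java
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8ConsoleCheck {

    // Encodes the string as UTF-8 and renders each byte as two hex digits.
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample value; in the real program this would be
        // the String returned by cell.getStringCellValue().
        String sample = "caf\u00e9";

        // Wrap System.out so output is encoded as UTF-8 instead of the
        // platform default (MacRoman on the machine discussed here).
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);

        out.println("default charset: " + Charset.defaultCharset());
        out.println(sample);
        out.println(utf8Hex(sample));
        out.flush();
    }
}
```

If the hex line shows the expected UTF-8 bytes but the text line still
looks wrong, the String is fine and the terminal is decoding the output
with a different encoding.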

Thanks,
Laird

On Thu, Sep 9, 2010 at 5:53 PM, David Fisher <[email protected]> wrote:

> JRuby had a similar issue - perhaps this thread will help you sort things
> out.
>
> http://jira.codehaus.org/browse/JRUBY-3576
>
> Or this report. http://openradar.appspot.com/8307054
>
> But it is "not a bug, it is a feature."
>
> http://bugs.sun.com/view_bug.do?bug_id=4163515
>
> I think you will need to explicitly set your JVM's default encoding to
> UTF-8, but it seems like the Mac JVM is broken.
>
> Dave
>
> On Sep 9, 2010, at 2:36 PM, Laird Nelson wrote:
>
> > Also, this is making me a little nervous.  How does POI figure it out?
> > Presumably if it's going to give me a "real Java unicode string", then it
> > has to know how to convert from whatever encoding the spreadsheet is in to
> > Unicode.  So how does it figure out what encoding the spreadsheet is in?
> > What if it guesses wrongly?
> >
> > Best,
> > Laird
> >
> > On Thu, Sep 9, 2010 at 4:58 PM, Laird Nelson <[email protected]> wrote:
> >
> >> OK, so given that, I'm trying to figure out why, when I take a String from
> >> cell.getStringCellValue() and write it to a file whose FileOutputStream
> >> has been explicitly wrapped by an OutputStreamWriter using the UTF8
> >> encoding, the contents of the file appear to be in MacRoman encoding
> >> (my platform's default).
> >>
> >> I'm creating my XMLEventWriter that's ultimately doing the writing like
> >> this:
> >>
> >> final FileOutputStream fileOutputStream = new FileOutputStream(file);
> >> final OutputStreamWriter outputStreamWriter =
> >>     new OutputStreamWriter(fileOutputStream, "UTF8");
> >> final BufferedWriter bufferedWriter =
> >>     new BufferedWriter(outputStreamWriter);
> >>
> >> XMLEventWriter writer = outputFactory.createXMLEventWriter(bufferedWriter);
> >>
> >> ...and then at various points I'm using the String value from POI to stick
> >> in there as #PCDATA.  Seems like this should not involve ANY character set
> >> conversion, is what you're telling me?
> >>
> >> L
> >>
> >>
> >> On Thu, Sep 9, 2010 at 4:30 PM, Nick Burch <[email protected]> wrote:
> >>
> >>> On Thu, 9 Sep 2010, Laird Nelson wrote:
> >>>
> >>>> I am using POI to read an Excel spreadsheet.  I have no idea what
> >>>> character encoding it's in.  I can tell you, however, it's not in
> >>>> UTF8. :-)
> >>>>
> >>>
> >>> You don't have to worry about the encoding, POI sorts that out for you.
> >>> Every String you get back is a real Java Unicode string already.
> >>>
> >>> Nick
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>>
> >>>
> >>
>
>
