Hello Esteban,

On 11 Mar 2013, at 03:17, Esteban A. Maringolo <[email protected]> wrote:

> Hi all, Sven,
> 
> I would like to know the proper way (steps) to parse a UTF-8 encoded
> CSV file, storing most of the strings into domain objects' instVars,
> which then get mapped back to JSON and sent over the wire by means of a
> Seaside RESTful Filter.
> 
> I haven't specified any encoding during input or output, and I'm not
> seeing the right characters in the inspectors (I expected that), nor in
> the JSON output or the Seaside HTML output.
> 
> The Zinc server adaptor uses its default codec, which is UTF-8.

Both NeoCSV and NeoJSON were written to be encoding agnostic, i.e. they work on 
character streams that you provide. The encoding/decoding is up to you, or up to 
whatever you use to instantiate the character streams.

Here is a quick example (Pharo VM on Mac OS X, #20587, standard NeoCSV release).

'foo.csv' asFileReference writeStreamDo: [ :out |
        (NeoCSVWriter on: out)
                nextPut: #( 1 'élève en Français' ) ].

'foo.csv' asFileReference readStreamDo: [ :in |
        (NeoCSVReader on: in)
                next ]. 

"=> #('1' 'élève en Français')"

$ cat foo.csv 
"1","élève en Français"

$ file foo.csv
foo.csv: UTF-8 Unicode text, with CRLF line terminators

The above code uses whatever FileReference offers, namely UTF-8 encoded 
character streams.
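
If you want to control the encoding explicitly instead of relying on those 
defaults, you can decorate a binary stream yourself. A minimal sketch, assuming 
the Zn character stream classes and FileReference's #binaryWriteStreamDo: / 
#binaryReadStreamDo: are available in your image:

'foo.csv' asFileReference binaryWriteStreamDo: [ :out |
        "wrap the raw byte stream in an explicit UTF-8 encoder"
        (NeoCSVWriter on: (ZnCharacterWriteStream on: out encoding: 'utf8'))
                nextPut: #( 1 'élève en Français' ) ].

'foo.csv' asFileReference binaryReadStreamDo: [ :in |
        (NeoCSVReader on: (ZnCharacterReadStream on: in encoding: 'utf8'))
                next ].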

I would suggest that you inspect the contents of the character streams before 
feeding them to NeoCSV or NeoJSON; a wrong encoding will probably be visible 
there (with mis-decoded UTF-8, 'é' typically shows up as the two characters 'Ã©').

'foo.csv' asFileReference readStreamDo: [ :in | in upToEnd ].
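
To compare with the raw, undecoded bytes (again a sketch, assuming your image 
offers #binaryReadStreamDo: on FileReference):

'foo.csv' asFileReference binaryReadStreamDo: [ :in | in upToEnd ].

This answers a ByteArray; in UTF-8, 'é' should appear as the two bytes 
16rC3 16rA9.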

Zinc, both the client and the server, should normally always do the right thing 
(™): based on the Content-Type, bytes will be converted using the proper 
encoding.
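
For example, on the client side (the URL below is hypothetical):

ZnClient new
        url: 'http://localhost:8080/data.json';
        get;
        contents.

The #contents are decoded from the entity bytes using the charset of the 
Content-Type response header.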

Regards,

Sven

--
Sven Van Caekenberghe
http://stfx.eu
Smalltalk is the Red Pill

