> On 26 Sep 2017, at 18:09, Stephane Ducasse <[email protected]> wrote:
> 
> Here is a script that should work to convert from latin1 to utf-8.
> Thanks to your book and trial and error.
> 
> | str wstr |
> str := ('listeDeMotsFrancaisFrGut.txt' asFileReference readStreamDo: [ :in |
>   (ZnCharacterReadStream on: in binary encoding: #latin1)
>      upToEnd ]) lines.
> 
> 'listeDeMotsFrancaisFrGutUTF8.txt' asFileReference writeStreamDo: [ :out |
>   wstr := ZnCharacterWriteStream on: out binary encoding: #utf8.
>   str do: [ :each | wstr nextPutAll: each. wstr crlf ] ].

Yes, that is correct (and it uses the newer encoders in both directions).
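
To see what the conversion does at the byte level, here is a minimal in-memory sketch, not a definitive recipe. It assumes a Pharo image with Zinc loaded; the sample word and the variable names are only for illustration.

```smalltalk
| latin1Bytes text utf8Bytes |
"The word 'été' in Latin1: each é is the single byte 233."
latin1Bytes := #[233 116 233].
"Decode the raw bytes into a String with the Latin1 encoder."
text := ZnCharacterEncoder latin1 decodeBytes: latin1Bytes.
"Re-encode the same String as UTF-8: each é becomes the two bytes 195 169."
utf8Bytes := ZnCharacterEncoder utf8 encodeString: text.
utf8Bytes
```

The script quoted above does the same thing, only streaming from one file to another instead of working on a byte array.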

> On Tue, Sep 26, 2017 at 6:01 PM, Stephane Ducasse
> <[email protected]> wrote:
>> Now inspecting the file contents opens GTInspector and freezes Pharo :(
>> I think that I will remove this part of my book.
>> It is simpler.
>> 
>> On Tue, Sep 26, 2017 at 5:53 PM, Stephane Ducasse
>> <[email protected]> wrote:
>>> I'm reading your chapter :)
>>> Now I understand the file I found is totally bogus :)
>>> But the first one I found is indeed encoded in latin1.
>>> So I'm trying to convert it.
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Sep 26, 2017 at 5:40 PM, Stephane Ducasse
>>> <[email protected]> wrote:
>>>>> Any chance you can point me to the original file ?
>>>> 
>>>> No they removed it
>>>> Maybe I could try to convert it to UTF-8 (I do not know how to do it).
>>>> 
>>>>> The file is indeed Latin1 encoded, but GitHub serves it as UTF-8 (it 
>>>>> did not change the contents, only the metadata).
>>>> 
>>>> Ok I see the problem
>>>>> 
>>>>> The default encoder option only works when the server says nothing; it 
>>>>> does not override what the server says.
>>>> 
>>>> Ah ok.
>>>> 
>>>>> The only way to read it is to read it as binary (which basically ignores 
>>>>> the metadata) and then convert it manually:
>>>>> 
>>>>> (ZnCharacterEncoder latin1 decodeBytes:
>>>>>  (ZnClient new
>>>>>        beBinary;
>>>>>        get: 
>>>>> 'https://raw.githubusercontent.com/SquareBracketAssociates/LearningOOPWithPharo/master/resources/listeDeMotsFrancaisFrGut.txt'))
>>>>>  lines.
>>>>> 
>>>>> But this is very ugly.
>>>>> 
>>>>> Best convert the original file to UTF-8 before uploading to GitHub.
>>>> 
>>>> OK I will try to learn how to do it.
>>>> 
>>>> 
>>>>> 
>>>>> Sven
>>>>> 
>>>>> 
>>>>> 
> 

