Re: [Pharo-dev] Moving from mc to tonel?

Sven Van Caekenberghe Mon, 04 Dec 2017 23:56:47 -0800


> On 5 Dec 2017, at 08:34, Alistair Grant <[email protected]> wrote:
> 
> On 5 December 2017 at 03:41, Martin Dias <[email protected]> wrote:
>> I suspect it's related to the large number of commits in my repo. I made
>> some tweaks and succeeded in creating the fast-import file. But I get:
>> 
>> fatal: Unsupported command: .
>> fast-import: dumping crash report to .git/fast_import_crash_10301
>> 
>> Do you recognize this error?
>> I will check my changes tweaking the git-migration tool to see if I modified
>> some behavior by mistake...
> 
> I had the same error just last night.
> 
> In my case, it turned out to be a non-UTF8 encoded character in one of
> the commit messages.
> 
> I tracked it down by looking at the crash report and searching for a
> nearby command.  I've deleted the crash reports now, but I think it
> was the number associated with a mark command that got me near the
> problem character in the fast-import file.
> 
> I also modified the code to halt whenever it found a non-UTF8
> character.  I'm sure there are better ways to do this, but:
> 
> 
> GitMigrationCommitInfo>>inlineDataFor: aString
> 
>    | theString |
>    theString := aString.
>    "Ensure the message has valid UTF-8 encoding (this will raise an error if it doesn't)"
>    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString asByteArray ]
>        on: Error
>        do: [ :ex | self halt: 'Illegal string encoding'.
>            theString := aString select: [ :each | each asciiValue between: 32 and: 128 ] ].
>    ^ 'data ' , theString size asString , String cr , (theString ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])


There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded), both 
using the newer ZnCharacterEncoders. So you could write it shorter as:

  aString asByteArray utf8Decoded

Instead of Error you could use the more intention-revealing 
ZnCharacterEncodingError.
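
As a rough sketch of these two suggestions combined (not the actual
git-migration code): the snippet below checks whether a string's bytes decode
as UTF-8 and catches only ZnCharacterEncodingError. The sample string and the
names suspect and isValidUtf8 are made up for illustration.

```smalltalk
| suspect isValidUtf8 |
"Hypothetical sample: a byte string containing the single byte 16rE9
 (Latin-1 e-acute) followed by a space, which is not a valid UTF-8 sequence."
suspect := 'caf' , (String with: (Character value: 16rE9)) , ' au lait'.

"The protected block answers true if decoding succeeds; the handler
 answers false when the decoder signals an encoding error."
isValidUtf8 := [ suspect asByteArray utf8Decoded. true ]
    on: ZnCharacterEncodingError
    do: [ :ex | false ].
```

Catching the specific error class means unrelated failures (a nil argument,
for example) are not silently treated as encoding problems.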

Apart from that, and I know you did not create this mess, encoding a String 
into a String, although it can be done, is so wrong. You encode a String (a 
collection of Characters, that is, of Unicode code points) into a ByteArray 
and you decode a ByteArray into a String.

Sending #asByteArray to a String is almost always wrong, as is sending 
#asString to a ByteArray. These are implicit conversions (null conversions, 
like ZnNullEncoder) that only work for pure ASCII or Latin1 (iso-8859-1), but 
not for the much richer set of Characters that Pharo supports. So these will 
eventually fail, one day, in a country far away.
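
To illustrate the explicit direction of each conversion, here is a minimal
sketch using the two convenience methods mentioned earlier. The sample text
and the variable names are hypothetical.

```smalltalk
| text bytes roundTrip |
text := 'Les élèves français'.

"Encoding: String (Characters) -> ByteArray (bytes)"
bytes := text utf8Encoded.

"Decoding: ByteArray (bytes) -> String (Characters)"
roundTrip := bytes utf8Decoded.
```

Here bytes is a ByteArray and roundTrip is a String equal to text. The
encoding is named at each step, which is exactly what #asByteArray and
#asString leave out.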

> Cheers,
> Alistair
>
