Re: [Pharo-dev] Moving from mc to tonel?

Peter Uhnák Tue, 05 Dec 2017 00:01:23 -0800

> In my case, it turned out to be a non-UTF8 encoded character in one of the
commit messages.


I've run into this problem in a sister project (tonel-migration), and do
not have a proper resolution yet. I was forcing everything to be unicode,
so I need a better way to read and write encoded strings. :<

On Tue, Dec 5, 2017 at 8:56 AM, Sven Van Caekenberghe <[email protected]> wrote:

>
>
> > On 5 Dec 2017, at 08:34, Alistair Grant <[email protected]> wrote:
> >
> > On 5 December 2017 at 03:41, Martin Dias <[email protected]> wrote:
> >> I suspect it's related to the large number of commits in my repo. I made
> >> some tweaks and succeeded to create the fast-import file. But I get:
> >>
> >> fatal: Unsupported command: .
> >> fast-import: dumping crash report to .git/fast_import_crash_10301
> >>
> >> Do you recognize this error?
> >> I will check my changes tweaking the git-migration tool to see if I
> modified
> >> some behavior by mistake...
> >
> > I had the same error just last night.
> >
> > In my case, it turned out to be a non-UTF8 encoded character in one of
> > the commit messages.
> >
> > I tracked it down by looking at the crash report and searching for a
> > nearby command.  I've deleted the crash reports now, but I think it
> > was the number associated with a mark command that got me near the
> > problem character in the fast-import file.
> >
> > I also modified the code to halt whenever it found a non-UTF8
> > character.  I'm sure there are better ways to do this, but:
> >
> >
> > GitMigrationCommitInfo>>inlineDataFor: aString
> >
> >    | theString |
> >    theString := aString.
> >    "Ensure the message has valid UTF-8 encoding (this will raise an error if it doesn't)"
> >    [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString asByteArray ]
> >        on: Error
> >        do: [ :ex | self halt: 'Illegal string encoding'.
> >            theString := aString select: [ :each | each asciiValue between: 32 and: 128 ] ].
> >    ^ 'data ' , theString size asString , String cr , (theString ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])
>
> There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded),
> both using the newer ZnCharacterEncoders. So you could write it shorter as:
>
>   aString asByteArray utf8Decoded
>
> Instead of Error you could use the more intention revealing
> ZnCharacterEncodingError.
>
> Apart from that, and I know you did not create this mess, encoding a
> String into a String, although it can be done, is so wrong. You encode a
> String (a collection of Characters, of Unicode code points) into a
> ByteArray and you decode a ByteArray into a String.
>
> Sending #asByteArray to a String is almost always wrong, as is sending
> #asString to a ByteArray. These are implicit conversions (null conversions,
> like ZnNullEncoder) that only work for pure ASCII or Latin1 (iso-8859-1),
> but not for the much richer set of Characters that Pharo supports. So these
> will eventually fail, one day, in a country far away.
>
> > Cheers,
> > Alistair
> >
>
>
>
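
For reference, the String/ByteArray distinction described in the quoted
message can be sketched in a Playground roughly as follows. This is only a
minimal illustration, assuming a Pharo image with the Zinc character
encoders loaded; the sample text and the byte values are made up, and it is
not the code used by the git-migration tool.

```smalltalk
"Playground sketch; the sample text and bytes are invented for illustration."
| text bytes roundTrip invalid result |
text := 'Fix café parsing'.

"Encode: String (Characters) -> ByteArray"
bytes := text utf8Encoded.

"Decode: ByteArray -> String"
roundTrip := bytes utf8Decoded.

"16rFF can never appear in well-formed UTF-8, so decoding should signal an error"
invalid := #[72 105 16rFF].
result := [ invalid utf8Decoded ]
    on: ZnCharacterEncodingError
    do: [ :ex | ex return: nil ].
```

Here bytes is one element longer than text, because the accented character
takes two bytes in UTF-8, and result stays nil when the decoder rejects the
input. Catching ZnCharacterEncodingError instead of Error keeps unrelated
failures from being swallowed by the handler.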
