> On 5 Dec 2017, at 08:34, Alistair Grant <[email protected]> wrote:
>
> On 5 December 2017 at 03:41, Martin Dias <[email protected]> wrote:
>> I suspect it's related to the large number of commits in my repo. I made
>> some tweaks and succeeded to create the fast-import file. But I get:
>>
>> fatal: Unsupported command: .
>> fast-import: dumping crash report to .git/fast_import_crash_10301
>>
>> Do you recognize this error?
>> I will check my changes tweaking the git-migration tool to see if I modified
>> some behavior my mistake...
>
> I had the same error just last night.
>
> In my case, it turned out to be a non-UTF8 encoded character in one of
> the commit messages.
>
> I tracked it down by looking at the crash report and searching for a
> nearby command. I've deleted the crash reports now, but I think it
> was the number associated with a mark command that got me near the
> problem character in the fast-import file.
>
> I also modified the code to halt whenever it found a non-UTF8
> character. I'm sure there are better ways to do this, but:
>
>
> GitMigrationCommitInfo>>inlineDataFor: aString
>
>     | theString |
>     theString := aString.
>     "Ensure the message has valid UTF-8 encoding (this will raise an
>     error if it doesn't)"
>     [ (ZnCharacterEncoder newForEncoding: 'utf8') decodeBytes: aString asByteArray ]
>         on: Error
>         do: [ :ex | self halt: 'Illegal string encoding'.
>             theString := aString select: [ :each | each asciiValue between: 32 and: 128 ] ].
>     ^ 'data ' , theString size asString , String cr , (theString
>         ifEmpty: [ '' ] ifNotEmpty: [ theString , String cr ])

There is also ByteArray>>#utf8Decoded (as well as String>>#utf8Encoded), both using the newer ZnCharacterEncoders. So you could write it shorter as:

    aString asByteArray utf8Decoded

Instead of Error you could use the more intention-revealing ZnCharacterEncodingError.

Apart from that, and I know you did not create this mess, encoding a String into a String, although it can be done, is so wrong. You encode a String (a collection of Characters, of Unicode code points) into a ByteArray and you decode a ByteArray into a String.

Sending #asByteArray to a String is almost always wrong, as is sending #asString to a ByteArray. These are implicit conversions (null conversions, like ZnNullEncoder) that only work for pure ASCII or Latin1 (iso-8859-1), but not for the much richer set of Characters that Pharo supports. So these will eventually fail, one day, in a country far away.

> Cheers,
> Alistair
>
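
P.S. To make the encode/decode direction concrete, here is a small Playground sketch. It only uses String>>#utf8Encoded, ByteArray>>#utf8Decoded and ZnCharacterEncodingError as mentioned above; the variable names and the sample bytes are my own illustration, and I am assuming the decoder signals a ZnCharacterEncodingError (or a subclass of it) for this particular malformed input.

```smalltalk
"Playground sketch; all variable names are illustrative."
| text bytes roundTrip latin1Bytes result |

"Encode: String (Unicode code points) -> ByteArray."
text := 'Les élèves français'.
bytes := text utf8Encoded.

"Decode: ByteArray -> String. The round trip gives back the original."
roundTrip := bytes utf8Decoded.
self assert: roundTrip = text.

"The accented characters need more than one byte each in UTF-8,
so the ByteArray is longer than the String."
self assert: bytes size > text size.

"'cafés' as ISO-8859-1 bytes: 16rE9 followed by 16r73 is not a valid
UTF-8 sequence, so decoding is expected to signal an encoding error
(assumption: it is a ZnCharacterEncodingError or a subclass)."
latin1Bytes := #[ 16r63 16r61 16r66 16rE9 16r73 ].
result := [ latin1Bytes utf8Decoded ]
	on: ZnCharacterEncodingError
	do: [ :ex | ex return: nil ].
self assert: result isNil.
```

The point of the last part is that the handler catches only encoding problems, so an unrelated bug inside the block would still surface instead of being swallowed by a blanket Error handler.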
