Well, this is latin1-centric, but it could speed things up for most of us,
since most code is written in English.
The idea is to use Andreas' fast conversions (ByteString>>#squeakToUtf8 ...).
However, creating an intermediate Stream and then an intermediate String is
not necessary...
So we'll have to replace this with dispatching techniques.
Moreover, the technique can be generalized to other encodings.
What's more, the lineEndConventions can be handled by the very same trick.
To achieve this, we install a latin1Map and latin1Encodings as class
instance variables in TextConverter to generalize Andreas'
squeakToUtf8 trick.
Then we copy these into instance variables. This is what enables the
lineEndConventions mapping.
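
To make the two tables concrete, here is a minimal workspace sketch for the UTF-8 case. It is not the change set code, which asks the converter itself for each encoding in initializeLatin1MapAndEncodings. The names map and encodings and the hand-written two-byte arithmetic are assumptions for illustration only, and the replacement sequences are kept as ByteArrays here rather than Strings.

```smalltalk
"Sketch only: build a 'needs translation' map and a table of
replacement byte sequences for latin1 -> UTF-8."
| map encodings |
map := ByteArray new: 256.       "0 = copy the byte as is, 1 = needs translation"
encodings := Array new: 256.     "replacement bytes, or nil when none is needed"
0 to: 255 do: [:code |
	code < 128
		ifTrue: [map at: code + 1 put: 0]
		ifFalse: [
			"Codes 128..255 take two bytes in UTF-8: 110xxxxx 10xxxxxx"
			map at: code + 1 put: 1.
			encodings at: code + 1 put: (ByteArray
				with: 16rC0 + (code bitShift: -6)
				with: 16r80 + (code bitAnd: 16r3F))]].
encodings at: 233 + 1     "the entry for e-acute, code 233"
```

With tables like these, plain ASCII text never touches the converter at all, which is where the speed-up for English source code would come from.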
Then we take care to send #installLineEndConventionInConverter in
MultiByteFileStream to initialize the TextConverter variables.
Then we use the following dispatching:
MultiByteFileStream nextPutAll:
-> TextConverter nextPutAll:toStream:
And use a hack for ByteString in TextConverter nextPutAll:toStream:
(dispatching on String would be overkill for now).
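
The ByteString fast path boils down to the loop below, shown as a workspace sketch against an in-memory WriteStream instead of a MultiByteFileStream. The tables are filled by hand so that only cr gets translated (to crlf); the names source and out and this table content are assumptions for illustration. It also assumes that WriteStream understands #next:putAll:startingAt: in your image.

```smalltalk
"Sketch only: copy untouched runs in bulk, translate single bytes via the tables."
| map encodings source out lastIndex nextIndex |
map := ByteArray new: 256.
encodings := Array new: 256.
"Assumed for illustration: cr is the only character needing translation."
map at: Character cr asciiValue + 1 put: 1.
encodings
	at: Character cr asciiValue + 1
	put: (String with: Character cr with: Character lf).
source := 'one', (String with: Character cr), 'two'.
out := WriteStream on: (String new: 16).
lastIndex := 1.
[nextIndex := ByteString findFirstInString: source inSet: map startingAt: lastIndex.
nextIndex = 0] whileFalse: [
	"Copy the run that needs no translation, then the replacement string"
	out next: nextIndex - lastIndex putAll: source startingAt: lastIndex.
	out nextPutAll: (encodings at: (source at: nextIndex) asciiValue + 1).
	lastIndex := nextIndex + 1].
"Copy whatever remains after the last translated character"
out next: source size - lastIndex + 1 putAll: source startingAt: lastIndex.
out contents
```

The same loop serves both the encoding and the line end conversion, since a line end is just one more entry in the tables.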
To measure how much this improves things, I put a hack
in Debugger class>>#openOn:context:label:contents:fullView:
	Preferences logDebuggerStackToFile
		ifTrue: [MessageTally spyOn: [Smalltalk
			logError: title
			inContext: context
			to: 'SqueakDebug.log']]
Then I evaluated 0/0 to open a Debugger.
FasterLatin1Conversion reduces the profile from 11452 tallies, 11586
msec. to 3625 tallies, 3718 msec.
Still a little bit long for a Debugger to open, but already better.
Nicolas
'From Pharo0.1 of 16 May 2008 [Latest update: #10300] on 10 May 2009 at 1:01:14 am'!
"Change Set: FasterLatin1Conversion-Part1
Date: 10 May 2009
Author: nice
Install fast latin1 conversion Part1
generalize Andreas Raab trick from ByteString>>#squeakToUtf8
to every converter using class instance variables"!
TextConverter class
instanceVariableNames: 'latin1Map latin1Encodings '!
!TextConverter class methodsFor: 'accessing' stamp: 'nice 5/10/2009 00:45'!
initializeLatin1MapAndEncodings
	"Initialize the latin1Map and latin1Encodings.
	These variables ensure that conversions from latin1 ByteStrings are reasonably fast"
	| latin1 utf8 |
	latin1Map := ByteArray new: 256.
	latin1Encodings := Array new: 256.
	0 to: 255 do: [:i |
		utf8 := (String new: 8) writeStream.
		latin1 := String with: (Character value: i).
		self new nextPut: latin1 first toStream: utf8.
		utf8 := utf8 contents.
		latin1 = utf8
			ifTrue: [
				latin1Map at: i+1 put: 0. "no translation needed"]
			ifFalse: [
				latin1Map at: i+1 put: 1. "translation needed"
				latin1Encodings at: i+1 put: utf8]].! !
!TextConverter class methodsFor: 'accessing' stamp: 'nice 5/9/2009 23:35'!
latin1Encodings
	"Answer an Array mapping latin1 characters to their conversion strings"
	^latin1Encodings ifNil: [
		self initializeLatin1MapAndEncodings.
		latin1Encodings]! !
!TextConverter class methodsFor: 'accessing' stamp: 'nice 5/9/2009 23:34'!
latin1Map
	"Answer a ByteArray map telling whether latin1 characters need conversion or not"
	^latin1Map ifNil: [
		self initializeLatin1MapAndEncodings.
		latin1Map]! !
TextConverter class
instanceVariableNames: 'latin1Map latin1Encodings'!
'From Pharo0.1 of 16 May 2008 [Latest update: #10300] on 10 May 2009 at 1:01:18 am'!
"Change Set: FasterLatin1Conversion-Part2
Date: 10 May 2009
Author: nice
Install fast latin1 conversion Part2
Create TextConverter instance variables to handle
latin1 fast conversion + lineEndConventions fast conversion"!
Object subclass: #TextConverter
instanceVariableNames: 'latin1Map latin1Encodings '
classVariableNames: ''
poolDictionaries: 'EventSensorConstants'
category: 'Multilingual-TextConversion'!
!MultiByteFileStream methodsFor: 'private' stamp: 'nice 5/10/2009 00:13'!
installLineEndConventionInConverter
	converter ifNotNil: [
		converter installLineEndConvention: (self doConversion
			ifTrue: [LineEndStrings at: lineEndConvention]
			ifFalse: [nil])]! !
!TextConverter methodsFor: 'initialize-release' stamp: 'nice 5/10/2009 00:09'!
installLineEndConvention: lineEndStringOrNil
	latin1Map := self class latin1Map.
	latin1Encodings := self class latin1Encodings.
	lineEndStringOrNil ifNotNil: [
		latin1Encodings := latin1Encodings copy.
		latin1Encodings at: Character cr asciiValue + 1 put: (self convertFromSystemString: lineEndStringOrNil).
		latin1Map := latin1Map copy.
		latin1Map at: Character cr asciiValue + 1 put: 1]! !
!TextConverter methodsFor: 'conversion' stamp: 'nice 5/10/2009 00:03'!
nextPutAll: aString toStream: aStream
	"Handle fast conversion if ByteString"
	| lastIndex nextIndex |
	aString class == ByteString ifFalse: [
		(latin1Map at: Character cr asciiValue + 1) = 0
			ifTrue: [
				aString do: [:char | self nextPut: char toStream: aStream]]
			ifFalse: [
				aString do: [:char | aStream nextPut: char]].
		^self].
	lastIndex := 1.
	[nextIndex := ByteString findFirstInString: aString inSet: latin1Map startingAt: lastIndex.
	nextIndex = 0] whileFalse: [
		aStream next: nextIndex-lastIndex putAll: aString startingAt: lastIndex.
		aStream basicNextPutAll: (latin1Encodings at: (aString byteAt: nextIndex)+1).
		lastIndex := nextIndex + 1].
	aStream next: aString size-lastIndex+1 putAll: aString startingAt: lastIndex.
	^self! !
Object subclass: #TextConverter
instanceVariableNames: 'latin1Map latin1Encodings'
classVariableNames: ''
poolDictionaries: 'EventSensorConstants'
category: 'Multilingual-TextConversion'!
'From Pharo0.1 of 16 May 2008 [Latest update: #10300] on 10 May 2009 at 1:01:25 am'!
"Change Set: FasterLatin1Conversion-Part3
Date: 10 May 2009
Author: nice
Install fast latin1 conversion Part3
Use #installLineEndConventionInConverter where due.
that is whenever converter or lineEndConvention are changed in MultiByteFileStream"!
!MultiByteFileStream methodsFor: 'accessing' stamp: 'nice 5/10/2009 00:14'!
binary
	super binary.
	lineEndConvention := nil.
	self installLineEndConventionInConverter! !
!MultiByteFileStream methodsFor: 'accessing' stamp: 'nice 5/10/2009 00:17'!
converter
	converter ifNil: [
		converter := TextConverter defaultSystemConverter.
		self installLineEndConventionInConverter].
	^ converter
! !
!MultiByteFileStream methodsFor: 'accessing' stamp: 'nice 5/10/2009 00:18'!
converter: aConverter
	converter := aConverter.
	self installLineEndConventionInConverter
! !
!MultiByteFileStream methodsFor: 'accessing' stamp: 'nice 5/10/2009 00:14'!
lineEndConvention: aSymbol
	lineEndConvention := aSymbol.
	self installLineEndConventionInConverter! !
!MultiByteFileStream methodsFor: 'crlf private' stamp: 'nice 5/10/2009 00:53'!
detectLineEndConvention
	"Detect the line end convention used in this stream. The result may be either #cr, #lf or #crlf."
	| char numRead state |
	self isBinary ifTrue: [^ self error: 'Line end conventions are not used on binary streams'].
	self wantsLineEndConversion ifFalse: [
		lineEndConvention := nil.
		self installLineEndConventionInConverter.
		^ lineEndConvention].
	self closed ifTrue: [
		lineEndConvention := LineEndDefault.
		self installLineEndConventionInConverter.
		^ lineEndConvention].
	"Default if nothing else found"
	numRead := 0.
	state := converter saveStateOf: self.
	lineEndConvention := nil.
	[super atEnd not and: [numRead < LookAheadCount]]
		whileTrue: [
			char := self next.
			char = Lf ifTrue: [
				converter restoreStateOf: self with: state.
				lineEndConvention := #lf.
				self installLineEndConventionInConverter.
				^ lineEndConvention].
			char = Cr ifTrue: [
				self peek = Lf
					ifTrue: [lineEndConvention := #crlf]
					ifFalse: [lineEndConvention := #cr].
				converter restoreStateOf: self with: state.
				self installLineEndConventionInConverter.
				^ lineEndConvention].
			numRead := numRead + 1].
	converter restoreStateOf: self with: state.
	lineEndConvention := LineEndDefault.
	self installLineEndConventionInConverter.
	^ lineEndConvention! !
!MultiByteFileStream methodsFor: 'open/close' stamp: 'nice 5/10/2009 00:18'!
reset
	super reset.
	converter ifNil: [
		converter := UTF8TextConverter new.
		self installLineEndConventionInConverter].
! !
"Postscript:
Install the line end convention in the converters of the source files
that are already open."
SourceFiles do: [:e | (e isKindOf: MultiByteFileStream) ifTrue: [
e installLineEndConventionInConverter]]!
'From Pharo0.1 of 16 May 2008 [Latest update: #10300] on 10 May 2009 at 1:01:25 am'!
"Change Set: FasterLatin1Conversion-Part4
Date: 10 May 2009
Author: nice
Install fast latin1 conversion Part4
Install fast latin1 conversion in MultiByteFileStream>>#nextPutAll:"!
!MultiByteFileStream methodsFor: 'public' stamp: 'nice 5/10/2009 00:21'!
nextPutAll: aCollection
	(self isBinary or: [aCollection class == ByteArray]) ifTrue: [
		^ super nextPutAll: aCollection].
	self converter nextPutAll: aCollection toStream: self.! !