The code is in XMLParser: see XMLEncodingDetector. I can port it, if you think 
the algorithm is appropriate.

The YAML algorithm is actually a less restrictive version of this XML one: 
https://www.w3.org/TR/REC-xml/#sec-guessing
The XML one is "Non-Normative" (i.e. optional), so I chose to implement the more 
general YAML algorithm instead.
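
For anyone who doesn't want to dig through the spec, the YAML detection table 
amounts to roughly the following. This is only a sketch in Python for brevity, 
not the XMLEncodingDetector code; the function name and the returned encoding 
labels are made up for illustration, and it assumes the first character of the 
stream is ASCII, as the YAML table does.

```python
# Hypothetical helper illustrating the YAML 1.2 encoding detection table.
# Not the XMLEncodingDetector code; names and labels are assumptions.
def detect_encoding(data: bytes) -> str:
    # UTF-32 is checked before UTF-16 because the UTF-16LE BOM (FF FE)
    # is a prefix of the UTF-32LE BOM (FF FE 00 00).
    if data.startswith(b"\x00\x00\xfe\xff"):
        return "utf-32-be"  # explicit BOM
    if len(data) >= 4 and data[:3] == b"\x00\x00\x00":
        return "utf-32-be"  # 00 00 00 xx: ASCII first character
    if data.startswith(b"\xff\xfe\x00\x00"):
        return "utf-32-le"  # explicit BOM
    if len(data) >= 4 and data[1:4] == b"\x00\x00\x00":
        return "utf-32-le"  # xx 00 00 00: ASCII first character
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"  # explicit BOM
    if len(data) >= 2 and data[0] == 0:
        return "utf-16-be"  # 00 xx: ASCII first character
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"  # explicit BOM
    if len(data) >= 2 and data[1] == 0:
        return "utf-16-le"  # xx 00: ASCII first character
    # UTF-8, with or without its BOM (EF BB BF), is the default.
    return "utf-8"
```

The function only names the encoding; skipping a BOM, if present, is left to 
whatever decoder gets wrapped around the binary stream afterwards.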

> Sent: Friday, March 16, 2018 at 6:44 AM
> From: "Sven Van Caekenberghe" <[email protected]>
> To: "Pharo Development List" <[email protected]>
> Subject: Re: [Pharo-dev] Executive Summary of the recent FileStream Changes
>
> 
> 
> > On 16 Mar 2018, at 07:05, monty <[email protected]> wrote:
> > 
> >> Sent: Thursday, March 15, 2018 at 4:01 PM
> >> From: "Sven Van Caekenberghe" <[email protected]>
> >> To: "Pharo Development List" <[email protected]>
> >> Subject: [Pharo-dev] Executive Summary of the recent FileStream Changes
> >> 
> >> Executive Summary of the recent FileStream Changes
> >> 
> >> In Pharo 7 Guille Polito recently committed a heroic set of changes that 
> >> we were planning to do for a long time but were afraid to take on.
> >> 
> >> The idea is to replace a couple of fat, overly complex, multi-functional, 
> >> do-all classes with a set of simpler single purpose classes that can be 
> >> combined as needed.
> >> 
> >> The classes that we want to get rid of can be found in the package 
> >> DeprecatedFileSystem, in particular FileStream, StandardFileStream, 
> >> MultiByteFileStream, MultiByteBinaryOrTextStream and RWBinaryOrTextStream.
> > 
> > StandardFileStream, at least, should remain for backwards compatibility and 
> > cross-platform compatibility with Squeak. It's a no-frills, non-decoding, 
> > non-LE normalizing stream that is heavily depended on.
> 
> Hmm, maybe.
> 
> The standard (no pun intended) interface to the file system in Pharo has been 
> FileSystem (FileReference) for quite a while. Many packages dealing with 
> either different Pharo versions or different Smalltalk implementations have 
> constructed their own portability facade (heck, I even did it in 
> ZnFileSystemUtils myself).
> 
> Note, however, that some aspects (API, behaviour) of the streams themselves 
> changed as well (no longer being bivalent, separating reading/writing, 
> smaller/simpler API, sometimes no positioning).
> 
> >> The replacements can be found in packages Files and 
> >> Zinc-Character-Encoding-Core.
> >> 
> >> Encoding and decoding characters to and from bytes is done using classes 
> >> that you wrap around a more primitive binary stream. The same goes for 
> >> buffering or translating line endings.
> >> 
> >> For example,
> >> 
> >> '/Users/sven/Desktop/foo.txt' asFileReference binaryReadStream.
> >> 
> >> gives you a ZnBufferedReadStream wrapping a BinaryFileStream.
> >> 
> >> While,
> >> 
> >> '/Users/sven/Desktop/foo.txt' asFileReference readStream.
> > 
> > What do you think about this algorithm for encoding detection: 
> > http://www.yaml.org/spec/1.2/spec.html#id2771184
> > 
> > I have an implementation (with tests), if you're interested. (I was waiting 
> > to propose it until the FileSystem API switched over to using Zn streams 
> > and encoders. The TextConverter API doesn't support UTF-32.)
> 
> I did a primitive one in ZnCharacterEncoding class>>#detectEncoding: but I am 
> not happy with it. I will read your reference and I am certainly interested 
> in seeing your code!
> 
> >> gives a ZnCharacterReadStream wrapping a ZnBufferedReadStream wrapping a 
> >> BinaryFileStream.
> >> 
> >> To translate line endings, we would wrap a ZnCharacterWriteStream using a 
> >> ZnCrPortableWriteStream.
> >> 
> >> There are a couple more specialised streams to cover special cases 
> >> (like reading and writing at the same time).
> >> 
> >> SocketStream remains another fat, overly complex, multi-functional, do-all 
> >> class, for which usable replacements exist in the form of ZdcSocketStream 
> >> and ZdcSecureSocketStream, which are simpler, cleaner and binary only.
> >> 
> >> Of course, switching is more than replacing one class with a 100% 
> >> compatible alternative; that would give us the same complex result. The 
> >> challenge is to use a simpler API as well, to rethink how the streams are 
> >> used. You know, KISS.
> >> 
> >> Of course, we are far from done and need more testing, debugging and help 
> >> from as many people as possible.
> >> 
> >> Sven
> >> 
> >> 
> >> --
> >> Sven Van Caekenberghe
> >> Proudly supporting Pharo
> >> http://pharo.org
> >> http://association.pharo.org
> >> http://consortium.pharo.org
> 
> 
> 
