Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-21 Thread Guillermo Polito
Hi all,

Don't forget also that these are the low-level bricks used in Pharo. They are
utterly required by the bootstrap and the minimal image, so keeping dependencies
small and controlled is a big +.

Magic and automatic stuff can be added on top. Nothing prevents adding a
little layer on top as a separate library (I even encourage people to do
so!) providing a builder, or a DSL, to ease the construction of streams in the
most performant way possible.
Pharo has extension methods, and this could even be done with some extra
objects managing the creation of the stream without being too complex.
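
For example, such a layer could be as small as a builder object like the
following (a minimal sketch only; StreamBuilder and its selectors are
hypothetical names, not existing API):

Object subclass: #StreamBuilder
    instanceVariableNames: 'stream'
    classVariableNames: ''
    package: 'StreamBuilder'

StreamBuilder class >> on: aStream
    ^ self new setStream: aStream

StreamBuilder >> setStream: aStream
    stream := aStream

StreamBuilder >> buffered
    "Wrap the current stream in a read buffer."
    stream := ZnBufferedReadStream on: stream

StreamBuilder >> encoded: anEncodingName
    "Wrap the current stream with a character decoder."
    stream := ZnCharacterReadStream on: stream encoding: anEncodingName

StreamBuilder >> stream
    ^ stream

"Usage sketch:"
builder := StreamBuilder on: (File named: 'sources.txt') readStream.
builder buffered; encoded: 'utf8'.
builder stream upToEnd.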

Guille

On Tue, Mar 20, 2018 at 3:12 PM, Sven Van Caekenberghe  wrote:

>
>
> > On 20 Mar 2018, at 14:51, Andrew P. Black  wrote:
> >
> > I like the structure of streams wrapping streams from the point of view
> of modularity of implementation and flexibility of reuse (as in applying
> buffering and encoding to in-memory or network streams).
> >
> > But I don’t like exposing all the machinery to the user.  For example,
> instead of
> >
> >   bufferedReadStream := ZnBufferedReadStream on: aReadStream.
> >   bufferedWriteStream := ZnBufferedWriteStream on: aWriteStream.
> >
> > why not
> >
> >   bufferedReadStream := ZnBuffer on: aReadStream.
> >   bufferedWriteStream := ZnBuffer on: aWriteStream
> >
> > where the class that implements the buffering looks at the stream and
> decides whether to do read or write buffering (or both, for a read-write
> stream).
>
> Hmm, then I would prefer real object messages, like #buffered, not class
> facades.
>
> > The meaning of #position: on an encoded Unicode stream is perfectly
> well-defined.  It’s just not the same as the meaning of position on the
> underlying byteStream — one counts in codePoints, the other in bytes.
>
> Yes, agreed.
>
> > The old file streams got this wrong; I’m hoping that the new ones get it
> right.  However, the comment that
> >
> >   "As a side note: using #size and #position[:] on variable encoded
> streams
> >   like UTF8 is *very* dangerous and should be avoided.”
> >
> > makes me worry that they do not.  Please, either implement #size and
> #position[:]   correctly on the encoded stream, or don’t implement them at
> all!  Do *not* implement them incorrectly, and then tell me to avoid them
> because they are dangerous!
>
> Well, I will repeat my opinion: #size #position #position: #reset have no
> place in streams.
>
> I have an experimental implementation of a ZnPositionableReadStream that
> adds 'excursion behaviour' as an add-on on top of more primitive streams
> that do not. It allows for code like
>
>   stream savingPositionDo: [ "read ahead" ].
>   "continue reading as if nothing happened"
>
> Here is the class comment.
>
> I am ZnPositionableReadStream.
> I am polymorphic with (the most used/important methods of) ReadStream and
> PositionableStream.
>
> I wrap another read stream and store the elements that I read in a sliding
> circular buffer so that I am able to go back to any position inside that
> buffer.
>
> Essentially, I implement #position and #position: to be used to back out
> of reading ahead.
>
> Note that the size of my buffer limits how far I can go backwards. A
> SubscriptOutOfBounds exception will be signalled when an attempt is made to
> go too far backwards.
>
> The index returned by #position should be considered abstract, without
> concrete meaning, but it is currently implemented as the count of elements
> read by #next on the wrapped stream. On a simple stream over an in memory
> collection, that will be equivalent to an integer index into that
> collection. But on network streams or streams that were already further
> along, this will no longer be the case.
>
> The most elementary example of my capabilities can be seen in my
> implementation of #peek. See also the unit tests #testPlainExcursion and
> #testSearch
>
> Of course, backing out of an excursion is only possible within the window
> of the buffer size.
>
> Implementation
>
> - stream  the read stream that I wrap and add positioning to
> - buffer  sliding, circular buffer
> - index  zero based index into buffer, where next will be
> stored
> - count  number of next operations done on wrapped stream
> - delta  number of positions that I was moved backwards
>
> The real core methods are #next, #atEnd, #position and #position: and are
> used to implement the rest.
>
> Part of Zinc HTTP Components
>
>
>
>


-- 



Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr

Web: http://guillep.github.io

Phone: +33 06 52 70 66 13


Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Sven Van Caekenberghe


> On 20 Mar 2018, at 14:51, Andrew P. Black  wrote:
> 
> I like the structure of streams wrapping streams from the point of view of 
> modularity of implementation and flexibility of reuse (as in applying 
> buffering and encoding to in-memory or network streams).
> 
> But I don’t like exposing all the machinery to the user.  For example, instead 
> of
> 
>   bufferedReadStream := ZnBufferedReadStream on: aReadStream.
>   bufferedWriteStream := ZnBufferedWriteStream on: aWriteStream.
> 
> why not
> 
>   bufferedReadStream := ZnBuffer on: aReadStream.
>   bufferedWriteStream := ZnBuffer on: aWriteStream
> 
> where the class that implements the buffering looks at the stream and decides 
> whether to do read or write buffering (or both, for a read-write stream).

Hmm, then I would prefer real object messages, like #buffered, not class facades.
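
Something along these lines, for instance (a sketch of hypothetical extension
methods in a convenience package, not existing API; put them on whatever
concrete stream classes make sense):

ReadStream >> buffered
    "Answer a buffered view on the receiver."
    ^ ZnBufferedReadStream on: self

WriteStream >> buffered
    "Answer a buffered view on the receiver."
    ^ ZnBufferedWriteStream on: self

so that user code simply reads:

bufferedReadStream := aReadStream buffered.
bufferedWriteStream := aWriteStream buffered.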

> The meaning of #position: on an encoded Unicode stream is perfectly 
> well-defined.  It’s just not the same as the meaning of position on the 
> underlying byteStream — one counts in codePoints, the other in bytes.

Yes, agreed.

> The old file streams got this wrong; I’m hoping that the new ones get it 
> right.  However, the comment that
> 
>   "As a side note: using #size and #position[:] on variable encoded 
> streams 
>   like UTF8 is *very* dangerous and should be avoided.”
> 
> makes me worry that they do not.  Please, either implement #size and 
> #position[:]   correctly on the encoded stream, or don’t implement them at 
> all!  Do *not* implement them incorrectly, and then tell me to avoid them 
> because they are dangerous!

Well, I will repeat my opinion: #size #position #position: #reset have no place 
in streams.

I have an experimental implementation of a ZnPositionableReadStream that adds 
'excursion behaviour' as an add-on on top of more primitive streams that do 
not. It allows for code like

  stream savingPositionDo: [ "read ahead" ].
  "continue reading as if noting happened"

Here is the class comment.

I am ZnPositionableReadStream.
I am polymorphic with (the most used/important methods of) ReadStream and 
PositionableStream.

I wrap another read stream and store the elements that I read in a sliding 
circular buffer so that I am able to go back to any position inside that 
buffer. 

Essentially, I implement #position and #position: to be used to back out of 
reading ahead.

Note that the size of my buffer limits how far I can go backwards. A 
SubscriptOutOfBounds exception will be signalled when an attempt is made to go 
too far backwards.

The index returned by #position should be considered abstract, without concrete 
meaning, but it is currently implemented as the count of elements read by #next 
on the wrapped stream. On a simple stream over an in memory collection, that 
will be equivalent to an integer index into that collection. But on network 
streams or streams that were already further along, this will no longer be the 
case.

The most elementary example of my capabilities can be seen in my implementation 
of #peek. See also the unit tests #testPlainExcursion and #testSearch

Of course, backing out of an excursion is only possible within the window of 
the buffer size.

Implementation

- stream  the read stream that I wrap and add positioning to
- buffer  sliding, circular buffer
- index  zero based index into buffer, where next will be 
stored
- count  number of next operations done on wrapped stream
- delta  number of positions that I was moved backwards

The real core methods are #next, #atEnd, #position and #position: and are used 
to implement the rest.

Part of Zinc HTTP Components
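
For illustration, a small usage sketch of the excursion behaviour described
above (assuming #on: is the instance creation message; the other selectors are
the ones from the comment):

| stream |
stream := ZnPositionableReadStream on: 'Zinc HTTP Components' readStream.
stream savingPositionDo: [ stream next: 4 ].  "read ahead: 'Zinc'"
stream upToEnd.  "continue as if nothing happened: 'Zinc HTTP Components'"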





Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Sven Van Caekenberghe


> On 20 Mar 2018, at 12:56, Henrik Sperre Johansen 
>  wrote:
> 
> This is great! 
> And, +1 on the need for some more convenience constructors...
> 
> Word of warning (which would be nice to mention in the guide); using
> writeStreamDo: and manually constructed buffered writers will not #flush the
> buffered stream automatically on close, so it's crucial this is done
> manually.

Yes, you are right, but it is clearly documented in the class comment of 
ZnBufferedWriteStream

There is also ZnBufferedWriteStream class>>#on:do: to help
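
For example (a sketch, assuming #on:do: hands the buffered stream to the block
and flushes it when the block returns, as the selector suggests):

(File named: 'test.txt') writeStreamDo: [ :raw |
  ZnBufferedWriteStream on: raw do: [ :buffered |
    (ZnCharacterWriteStream on: buffered encoding: 'utf8')
      nextPutAll: 'no explicit #flush needed here' ] ]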

Now, these are system classes more than user classes. Normal users should get a 
ready-made stream without knowing much about the construction process.

> charGen := Random seed: 16543986234. 
> output := String new: 1024*1024*50.
> 1 to: output size do: [ :ix | output at: ix put: ((charGen nextInt: 256) -
> 1) asCharacter ].
> 
> [(File named: 'utf16test.txt') writeStreamDo: [:ws | |encStream|
>   encStream := ZnCharacterWriteStream 
>   on: (ZnBufferedWriteStream on: ws) 
>   encoding: 'utf16'.
>   encStream nextPutAll: output
>   ]] timeToRun. 
> 
> FileSize: 102 336 KB.
> 
> [(File named: 'utf16test2.txt') writeStreamDo: [:ws | |encStream|
>   encStream := ZnCharacterWriteStream 
>   on: (ZnBufferedWriteStream on: ws) 
>   encoding: 'utf16'.
>   encStream nextPutAll: output;flush
>   ]] timeToRun. 
> 
> FileSize: 102 400 KB.
> 
> It's about 4 times slower than the corresponding 
> 
> [FileStream forceNewFileNamed: 'utf16test2.txt' do: [:ws | 
>   ws converter: UTF16TextConverter new.
>   ws nextPutAll: output
>   ]] timeToRun
> 
> but it's 100MB of UTF16, and 4 secs vs 1, so probably not relevant to real
> use cases .

Ah, but you are cheating (deliberately or not ;-).

Your test string is a ByteString not a WideString, and UTF16TextConverter 
contains a fast path that basically does nothing.

Try another string, like

1 to: output size do: [ :ix | output at: ix put: ((charGen nextInt: 1024) -
1) asCharacter ].

Zn is the same speed, the old one does not even stop for me.

I am not sure we have to optimise UTF16; maybe ... adding the same fast path 
would not be too difficult, I guess.

> Another addition to the guide valuable to those of us still stuck in the
> stone age would be code equivalents to FileStream methods:
> forceNewFileNamed:do:
> newFileNamed:do:
> oldFileNamed:do:
> 
> Cheers,
> Henry
> 
> 
> 
> 
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
> 




Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Andrew P. Black
I like the structure of streams wrapping streams from the point of view of 
modularity of implementation and flexibility of reuse (as in applying buffering 
and encoding to in-memory or network streams).

But I don’t like exposing all the machinery to the user.  For example, instead of

bufferedReadStream := ZnBufferedReadStream on: aReadStream.
bufferedWriteStream := ZnBufferedWriteStream on: aWriteStream.

why not

bufferedReadStream := ZnBuffer on: aReadStream.
bufferedWriteStream := ZnBuffer on: aWriteStream

where the class that implements the buffering looks at the stream and decides 
whether to do read or write buffering (or both, for a read-write stream).

The meaning of #position: on an encoded Unicode stream is perfectly 
well-defined.  It’s just not the same as the meaning of position on the 
underlying byteStream — one counts in codePoints, the other in bytes. The old 
file streams got this wrong; I’m hoping that the new ones get it right.  
However, the comment that

"As a side note: using #size and #position[:] on variable encoded 
streams 
like UTF8 is *very* dangerous and should be avoided.”

makes me worry that they do not.  Please, either implement #size and 
#position[:]   correctly on the encoded stream, or don’t implement them at all! 
 Do *not* implement them incorrectly, and then tell me to avoid them because 
they are dangerous!

Andrew




Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Thierry Goubier
2018-03-20 13:20 GMT+01:00 Henrik Sperre Johansen
:
> EstebanLM wrote
>>> On 20 Mar 2018, at 12:56, Henrik Sperre Johansen 
>
>> henrik.s.johansen@
>
>>  wrote:
>>>
>>> This is great!
>>> And, +1 on the need for some more convenience constructors...
>>>
>>> Word of warning (which would be nice to mention in the guide); using
>>> writeStreamDo: and manually constructed buffered writers will not #flush
>>> the
>>> buffered stream automatically on close, so it's crucial this is done
>>> manually.
>>
>> well… in fact it should flush it, shouldn’t it?
>>
>>>
>
> The binary filestream flushes to disk on each write operation iirc (or, at
> least when it is being closed), but it has no way to know it is wrapped by a
> buffered stream which also needs to be flushed to actually write the
> remaining buffer contents to file.

Could a binary filestream that has flush requirements (and close
requirements as well) be made aware of the presence of the buffered
stream above it to ensure a close / flush is done correctly?

Could structures where you have multiple buffered streams concurrently
opened on a single binary file stream be considered as not exactly
correct?

Thierry

>
> Cheers,
> Henry
>
>
>
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>



Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Henrik Sperre Johansen
EstebanLM wrote
>> On 20 Mar 2018, at 12:56, Henrik Sperre Johansen 

> henrik.s.johansen@

>  wrote:
>> 
>> This is great! 
>> And, +1 on the need for some more convenience constructors...
>> 
>> Word of warning (which would be nice to mention in the guide); using
>> writeStreamDo: and manually constructed buffered writers will not #flush
>> the
>> buffered stream automatically on close, so it's crucial this is done
>> manually.
> 
> well… in fact it should flush it, shouldn’t it?
> 
>>

The binary filestream flushes to disk on each write operation iirc (or, at
least when it is being closed), but it has no way to know it is wrapped by a
buffered stream which also needs to be flushed to actually write the
remaining buffer contents to file.

Cheers,
Henry



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html



Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Esteban Lorenzano


> On 20 Mar 2018, at 12:56, Henrik Sperre Johansen 
>  wrote:
> 
> This is great! 
> And, +1 on the need for some more convenience constructors...
> 
> Word of warning (which would be nice to mention in the guide); using
> writeStreamDo: and manually constructed buffered writers will not #flush the
> buffered stream automatically on close, so it's crucial this is done
> manually.

well… in fact it should flush it, shouldn’t it?

> 
> charGen := Random seed: 16543986234. 
> output := String new: 1024*1024*50.
> 1 to: output size do: [ :ix | output at: ix put: ((charGen nextInt: 256) -
> 1) asCharacter ].
> 
> [(File named: 'utf16test.txt') writeStreamDo: [:ws | |encStream|
>   encStream := ZnCharacterWriteStream 
>   on: (ZnBufferedWriteStream on: ws) 
>   encoding: 'utf16'.
>   encStream nextPutAll: output
>   ]] timeToRun. 
> 
> FileSize: 102 336 KB.
> 
> [(File named: 'utf16test2.txt') writeStreamDo: [:ws | |encStream|
>   encStream := ZnCharacterWriteStream 
>   on: (ZnBufferedWriteStream on: ws) 
>   encoding: 'utf16'.
>   encStream nextPutAll: output;flush
>   ]] timeToRun. 
> 
> FileSize: 102 400 KB.
> 
> It's about 4 times slower than the corresponding 
> 
> [FileStream forceNewFileNamed: 'utf16test2.txt' do: [:ws | 
>   ws converter: UTF16TextConverter new.
>   ws nextPutAll: output
>   ]] timeToRun
> 
> but it's 100MB of UTF16, and 4 secs vs 1, so probably not relevant to real
> use cases .

It is always relevant. 
I wonder why it would be 4x slower than the 2nd one.

Esteban

> 
> Another addition to the guide valuable to those of us still stuck in the
> stone age would be code equivalents to FileStream methods:
> forceNewFileNamed:do:
> newFileNamed:do:
> oldFileNamed:do:
> 
> Cheers,
> Henry
> 
> 
> 
> 
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
> 




Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Henrik Sperre Johansen
This is great! 
And, +1 on the need for some more convenience constructors...

Word of warning (which would be nice to mention in the guide); using
writeStreamDo: and manually constructed buffered writers will not #flush the
buffered stream automatically on close, so it's crucial this is done
manually.

charGen := Random seed: 16543986234. 
output := String new: 1024*1024*50.
1 to: output size do: [ :ix | output at: ix put: ((charGen nextInt: 256) -
1) asCharacter ].

[(File named: 'utf16test.txt') writeStreamDo: [:ws | |encStream|
encStream := ZnCharacterWriteStream 
on: (ZnBufferedWriteStream on: ws) 
encoding: 'utf16'.
encStream nextPutAll: output
]] timeToRun. 

FileSize: 102 336 KB.

[(File named: 'utf16test2.txt') writeStreamDo: [:ws | |encStream|
encStream := ZnCharacterWriteStream 
on: (ZnBufferedWriteStream on: ws) 
encoding: 'utf16'.
encStream nextPutAll: output;flush
]] timeToRun. 

FileSize: 102 400 KB.

It's about 4 times slower than the corresponding 

[FileStream forceNewFileNamed: 'utf16test2.txt' do: [:ws | 
ws converter: UTF16TextConverter new.
ws nextPutAll: output
]] timeToRun

but it's 100MB of UTF16, and 4 secs vs 1, so probably not relevant to real
use cases.

Another addition to the guide valuable to those of us still stuck in the
stone age would be code equivalents to the FileStream methods below (a rough
mapping is sketched after this list):
forceNewFileNamed:do:
newFileNamed:do:
oldFileNamed:do:
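
One possible mapping, using the FileReference API shown earlier in this thread
(a sketch; behavioural details such as truncation versus append, and what
happens when the file already exists, would need checking):

"forceNewFileNamed:do: — start from an empty file"
'file.txt' asFileReference ensureDelete; writeStreamDo: [ :ws | ws nextPutAll: 'data' ].

"newFileNamed:do: — write to a (presumably new) file"
'file.txt' asFileReference writeStreamDo: [ :ws | ws nextPutAll: 'data' ].

"oldFileNamed:do: — open an existing file (reading here; the old API opened it read-write)"
'file.txt' asFileReference readStreamDo: [ :rs | rs upToEnd ].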

Cheers,
Henry




--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html



Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Sven Van Caekenberghe
I would very much prefer to take things slowly.

First we have to make the whole migration work; we are not there yet, and there 
are lots of important things to fix.

If there are serious performance regressions, then yes we have to fix them.

But improving things or adding features should be done when everything is 
stable again.

> On 20 Mar 2018, at 10:31, Denis Kudriashov  wrote:
> 
> 
> 2018-03-20 10:15 GMT+01:00 Denis Kudriashov :
> We can cover this case by extra variable maxBufferSize.
> 
> Operations like upToEnd will reallocate the buffer to its maximum size.
> #next:-related methods can also be improved by it. We will need an extra
> method:
> 
> ZnBufferedReadStream >> nextBuffer: requiredSize
>   | newBufferSize |
>   newBufferSize := (requiredSize min: maxBufferSize) max: minBufferSize.
>   buffer size = newBufferSize ifFalse: [
>     buffer := self collectionSpecies new: newBufferSize ].
>   limit := stream readInto: buffer startingAt: 1 count: buffer size.
>   position := 1
> 
> But it will require another variable minBufferSize.
> 
> By default it would be equal to given size in method #sizeBuffer:. And 
> special users will specify concrete value. FileHandler will use file size for 
> this.
> It will also improve buffer size for small files where we do not need big 
> buffer.
> 
> 2018-03-20 9:57 GMT+01:00 Guillermo Polito :
> 
> 
> On Tue, Mar 20, 2018 at 8:57 AM, Sven Van Caekenberghe  wrote:
> 
> 
> > On 20 Mar 2018, at 00:19, Sean P. DeNigris  wrote:
> >
> > Guillermo Polito wrote
> >> I've done also some profiling and it does not look like we've lost in
> >> performance either (reading and decoding a 35MB file):
> >
> > Your 1st (new API) example: "0:00:00:01.535"
> > Your 2nd (old API) example:"0:00:00:01.732"
> > *** but ***
> > Your 1st (new API) example using the same consuming selector as example #1
> > (#upToEnd): "0:00:00:07.816" or 4.5x slower
> 
> Yes and no ;-)
> 
> Look at MultiByteFileStream>>#upToEnd where they cheat a bit and do basically 
> the same as example 1 using #next:
> 
> The Zn #upToEnd implementation does not use size knowledge (as streams are 
> infinite).
> 
> As a side note: using #size and #position[:] on variable encoded streams like 
> UTF8 is *very* dangerous and should be avoided. Counting/indexing bytes in 
> the underlying stream is *not* the same as counting/indexing decoded 
> characters. It can be the source of subtle bugs.
> 
> Sean,
> 
> Think that the Buffered stream can be reused in the context of sockets too. 
> And there size has no meaning.
> 
> Basically the cost in there is on growing and copying the internal buffer 
> used to read. This buffer for the 35M file ends up being ~35M, so it has to 
> grow and copy several times...
> Using #next: will preallocate the good size for the internal buffer.
> 
> Now, I'd say first that if people want performance they should use the good 
> API :).
> 
> Maybe the ZnBuffered stream could ask its underlying stream for its 
> #preferredInternalBufferSizeForUpToAll. But that would contaminate all the 
> stream hierarchy... Is it worth it? Dunno... 
>  
> 
> 
> > -
> > Cheers,
> > Sean
> > --
> > Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
> >
> 
> 
> 
> 
> 
> -- 
>
> Guille Polito
> Research Engineer
> 
> Centre de Recherche en Informatique, Signal et Automatique de Lille
> CRIStAL - UMR 9189
> French National Center for Scientific Research - http://www.cnrs.fr
> 
> Web: http://guillep.github.io
> Phone: +33 06 52 70 66 13
> 
> 




Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Denis Kudriashov
2018-03-20 10:15 GMT+01:00 Denis Kudriashov :

> We can cover this case by extra variable maxBufferSize.
>

Operations like upToEnd will reallocate the buffer to its maximum size.
#next:-related methods can also be improved by it. We will need an extra
method:

ZnBufferedReadStream >> nextBuffer: requiredSize
    | newBufferSize |
    newBufferSize := (requiredSize min: maxBufferSize) max: minBufferSize.
    buffer size = newBufferSize ifFalse: [
        buffer := self collectionSpecies new: newBufferSize ].
    limit := stream readInto: buffer startingAt: 1 count: buffer size.
    position := 1


But it will require another variable minBufferSize.

By default it would be equal to given size in method #sizeBuffer:. And
> special users will specify concrete value. FileHandler will use file size
> for this.
> It will also improve buffer size for small files where we do not need big
> buffer.
>
> 2018-03-20 9:57 GMT+01:00 Guillermo Polito :
>
>>
>>
>> On Tue, Mar 20, 2018 at 8:57 AM, Sven Van Caekenberghe 
>> wrote:
>>
>>>
>>>
>>> > On 20 Mar 2018, at 00:19, Sean P. DeNigris 
>>> wrote:
>>> >
>>> > Guillermo Polito wrote
>>> >> I've done also some profiling and it does not look like we've lost in
>>> >> performance either (reading and decoding a 35MB file):
>>> >
>>> > Your 1st (new API) example: "0:00:00:01.535"
>>> > Your 2nd (old API) example:"0:00:00:01.732"
>>> > *** but ***
>>> > Your 1st (new API) example using the same consuming selector as
>>> example #1
>>> > (#upToEnd): "0:00:00:07.816" or 4.5x slower
>>>
>>> Yes and no ;-)
>>>
>>> Look at MultiByteFileStream>>#upToEnd where they cheat a bit and do
>>> basically the same as example 1 using #next:
>>>
>>> The Zn #upToEnd implementation does not use size knowledge (as streams
>>> are infinite).
>>>
>>> As a side note: using #size and #position[:] on variable encoded streams
>>> like UTF8 is *very* dangerous and should be avoided. Counting/indexing
>>> bytes in the underlying stream is *not* the same as counting/indexing
>>> decoded characters. It can be the source of subtle bugs.
>>>
>>
>> Sean,
>>
>> Think that the Buffered stream can be reused in the context of sockets
>> too. And there size has no meaning.
>>
>> Basically the cost in there is on growing and copying the internal buffer
>> used to read. This buffer for the 35M file ends up being ~35M, so it has to
>> grow and copy several times...
>> Using #next: will preallocate the good size for the internal buffer.
>>
>> Now, I'd say first that if people want performance they should use the
>> good API :).
>>
>> Maybe the ZnBuffered stream could ask its underlying stream for its
>> #preferredInternalBufferSizeForUpToAll. But that would contaminate all
>> the stream hierarchy... Is it worth it? Dunno...
>>
>>
>>>
>>>
>>> > -
>>> > Cheers,
>>> > Sean
>>> > --
>>> > Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>>> >
>>>
>>>
>>>
>>
>>
>> --
>>
>>
>>
>> Guille Polito
>>
>> Research Engineer
>>
>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>
>> CRIStAL - UMR 9189
>>
>> French National Center for Scientific Research - *http://www.cnrs.fr
>> *
>>
>>
>> *Web:* *http://guillep.github.io* 
>>
>> *Phone: *+33 06 52 70 66 13 <+33%206%2052%2070%2066%2013>
>>
>
>


Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Denis Kudriashov
We can cover this case with an extra variable, maxBufferSize.
By default it would be equal to the size given in #sizeBuffer:, and special
users would specify a concrete value. FileHandler would use the file size
for this.
It would also improve the buffer size for small files, where we do not need a
big buffer.
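
For reference, this is roughly how a user tunes the buffer today (a sketch;
#sizeBuffer: is the existing message referenced above, while maxBufferSize and
minBufferSize are the proposed, not yet existing, variables):

(File named: 'big.bin') readStreamDo: [ :raw |
  | buffered |
  buffered := ZnBufferedReadStream on: raw.
  "cap the buffer at 1MB — the role the proposed maxBufferSize would play"
  buffered sizeBuffer: (raw size min: 1024 * 1024).
  buffered upToEnd ]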

2018-03-20 9:57 GMT+01:00 Guillermo Polito :

>
>
> On Tue, Mar 20, 2018 at 8:57 AM, Sven Van Caekenberghe 
> wrote:
>
>>
>>
>> > On 20 Mar 2018, at 00:19, Sean P. DeNigris 
>> wrote:
>> >
>> > Guillermo Polito wrote
>> >> I've done also some profiling and it does not look like we've lost in
>> >> performance either (reading and decoding a 35MB file):
>> >
>> > Your 1st (new API) example: "0:00:00:01.535"
>> > Your 2nd (old API) example:"0:00:00:01.732"
>> > *** but ***
>> > Your 1st (new API) example using the same consuming selector as example
>> #1
>> > (#upToEnd): "0:00:00:07.816" or 4.5x slower
>>
>> Yes and no ;-)
>>
>> Look at MultiByteFileStream>>#upToEnd where they cheat a bit and do
>> basically the same as example 1 using #next:
>>
>> The Zn #upToEnd implementation does not use size knowledge (as streams
>> are infinite).
>>
>> As a side note: using #size and #position[:] on variable encoded streams
>> like UTF8 is *very* dangerous and should be avoided. Counting/indexing
>> bytes in the underlying stream is *not* the same as counting/indexing
>> decoded characters. It can be the source of subtle bugs.
>>
>
> Sean,
>
> Think that the Buffered stream can be reused in the context of sockets
> too. And there size has no meaning.
>
> Basically the cost in there is on growing and copying the internal buffer
> used to read. This buffer for the 35M file ends up being ~35M, so it has to
> grow and copy several times...
> Using #next: will preallocate the good size for the internal buffer.
>
> Now, I'd say first that if people want performance they should use the
> good API :).
>
> Maybe the ZnBuffered stream could ask its underlying stream for its #
> preferredInternalBufferSizeForUpToAll. But that would contaminate all the
> stream hierarchy... Is it worth it? Dunno...
>
>
>>
>>
>> > -
>> > Cheers,
>> > Sean
>> > --
>> > Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
>> >
>>
>>
>>
>
>
> --
>
>
>
> Guille Polito
>
> Research Engineer
>
> Centre de Recherche en Informatique, Signal et Automatique de Lille
>
> CRIStAL - UMR 9189
>
> French National Center for Scientific Research - *http://www.cnrs.fr
> *
>
>
> *Web:* *http://guillep.github.io* 
>
> *Phone: *+33 06 52 70 66 13 <+33%206%2052%2070%2066%2013>
>


Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Guillermo Polito
On Tue, Mar 20, 2018 at 8:57 AM, Sven Van Caekenberghe  wrote:

>
>
> > On 20 Mar 2018, at 00:19, Sean P. DeNigris 
> wrote:
> >
> > Guillermo Polito wrote
> >> I've done also some profiling and it does not look like we've lost in
> >> performance either (reading and decoding a 35MB file):
> >
> > Your 1st (new API) example: "0:00:00:01.535"
> > Your 2nd (old API) example:"0:00:00:01.732"
> > *** but ***
> > Your 1st (new API) example using the same consuming selector as example
> #1
> > (#upToEnd): "0:00:00:07.816" or 4.5x slower
>
> Yes and no ;-)
>
> Look at MultiByteFileStream>>#upToEnd where they cheat a bit and do
> basically the same as example 1 using #next:
>
> The Zn #upToEnd implementation does not use size knowledge (as streams are
> infinite).
>
> As a side note: using #size and #position[:] on variable encoded streams
> like UTF8 is *very* dangerous and should be avoided. Counting/indexing
> bytes in the underlying stream is *not* the same as counting/indexing
> decoded characters. It can be the source of subtle bugs.
>

Sean,

Consider that the buffered stream can be reused in the context of sockets too,
and there #size has no meaning.

Basically the cost there is in growing and copying the internal buffer used
to read. For the 35MB file this buffer ends up being ~35MB, so it has to
grow and be copied several times...
Using #next: will preallocate the right size for the internal buffer.

Now, I'd say first that if people want performance they should use the right
API :).

Maybe the ZnBuffered stream could ask its underlying stream for its
#preferredInternalBufferSizeForUpToAll. But that would contaminate all the
stream hierarchy... Is it worth it? Dunno...
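
A very rough sketch of that idea, purely hypothetical (neither the selector nor
the override exists; the file stream class name is also assumed):

"Conservative default, high up in the stream hierarchy:"
Stream >> preferredInternalBufferSizeForUpToAll
  ^ 64 * 1024

"Override for file-based streams, which do know their size:"
BinaryFileStream >> preferredInternalBufferSizeForUpToAll
  ^ self size

ZnBufferedReadStream>>#upToEnd could then presize its growing collection with
that hint instead of starting small.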


>
>
> > -
> > Cheers,
> > Sean
> > --
> > Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
> >
>
>
>


-- 



Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr

Web: http://guillep.github.io

Phone: +33 06 52 70 66 13


Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-20 Thread Sven Van Caekenberghe


> On 20 Mar 2018, at 00:19, Sean P. DeNigris  wrote:
> 
> Guillermo Polito wrote
>> I've done also some profiling and it does not look like we've lost in
>> performance either (reading and decoding a 35MB file):
> 
> Your 1st (new API) example: "0:00:00:01.535"
> Your 2nd (old API) example:"0:00:00:01.732"
> *** but ***
> Your 1st (new API) example using the same consuming selector as example #1
> (#upToEnd): "0:00:00:07.816" or 4.5x slower

Yes and no ;-)

Look at MultiByteFileStream>>#upToEnd where they cheat a bit and do basically 
the same as example 1 using #next:

The Zn #upToEnd implementation does not use size knowledge (as streams are 
infinite).

As a side note: using #size and #position[:] on variable encoded streams like 
UTF8 is *very* dangerous and should be avoided. Counting/indexing bytes in the 
underlying stream is *not* the same as counting/indexing decoded characters. It 
can be the source of subtle bugs.


> -
> Cheers,
> Sean
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
> 




Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-19 Thread Sean P. DeNigris
Guillermo Polito wrote
> I've done also some profiling and it does not look like we've lost in
> performance either (reading and decoding a 35MB file):

Your 1st (new API) example: "0:00:00:01.535"
Your 2nd (old API) example: "0:00:00:01.732"
*** but ***
Your 1st (new API) example using the same consuming selector as example #1
(#upToEnd): "0:00:00:07.816" or 4.5x slower



-
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html



Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-19 Thread Stephane Ducasse
I love the idea of measuring statement :)


On Mon, Mar 19, 2018 at 6:10 PM, Guillermo Polito  wrote:

> Well, I'd say we did it in the name of modularity. And yes, I believe
> that having separate responsibilities helps in designing, testing and
> ensuring more easily the correctness of each of the parts in isolation.
>
> I've done also some profiling and it does not look like we've lost in
> performance either (reading and decoding a 35MB file):
>
> [file := Smalltalk sourcesFile fullName.
> (File named: file) readStreamDo: [ :binaryFile |
> (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding:
> 'utf8') next: binaryFile size.
> ]] timeToRun. "0:00:00:01.976"
>
> [file := Smalltalk sourcesFile fullName.
> (MultiByteFileStream fileNamed: file)
> converter: (TextConverter newForEncoding: 'utf8');
> upToEnd
> ] timeToRun. "0:00:00:02.147"
>
>
>
> On Mon, Mar 19, 2018 at 5:51 PM, Richard O'Keefe  wrote:
>
>> The idea of stacking things like character coding and buffering 
>> by using layers of wrapping is very much in the spirit of Java and
>> is neither good for performance nor helpful for correctness.
>>
>>
>> On 20 March 2018 at 05:19, Guillermo Polito 
>> wrote:
>>
>>> Hi all,
>>>
>>> I've put some minutes summarizing the new APIs provided by the
>>> combination of the new File implementation and the Zn encoders. They all
>>> basically follow the decorator pattern to stack different responsibilities
>>> such as buffering, encoding, line ending conversions.
>>>
>>> Please, do not hesitate to give your feedback.
>>>
>>> Guille
>>>
>>>
>>> 1. Basic Files
>>>
>>> By default files are binary. Not buffered.
>>>
>>> (File named: 'name') readStream.
>>> (File named: 'name') readStreamDo: [ :stream | ... ].
>>> (File named: 'name') writeStream.
>>> (File named: 'name') writeStreamDo: [ :stream | ... ].
>>>
>>>
>>> 2. Encoding
>>>
>>> To add encoding, wrap a stream with a corresponding
>>> ZnCharacterRead/WriteStream.
>>>
>>> "Reading"
>>> utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
>>> utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding:
>>> 'utf16'.
>>>
>>> "Writing"
>>> utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
>>> utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding:
>>> 'utf16'.
>>>
>>> 3. Buffering
>>>
>>> To add buffering, wrap a stream with a corresponding
>>> ZnBufferedRead/WriteStream.
>>>
>>> bufferedReadStream := ZnBufferedReadStream on: aStream.
>>> bufferedWriteStream := ZnBufferedWriteStream on: aStream.
>>>
>>> It is in general better to buffer the reading on the binary file and
>>> apply the encoding on the buffer in memory than the other way around. See
>>>
>>> [file := Smalltalk sourcesFile fullName.
>>> (File named: file) readStreamDo: [ :binaryFile |
>>> (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile)
>>> encoding: 'utf8') upToEnd
>>> ]] timeToRun. "0:00:00:09.288"
>>>
>>> [file := Smalltalk sourcesFile fullName.
>>> (File named: file) readStreamDo: [ :binaryFile |
>>> (ZnBufferedReadStream on: (ZnCharacterReadStream on: binaryFile
>>> encoding: 'utf8')) upToEnd
>>> ]] timeToRun. "0:00:00:14.189"
>>>
>>> 4. File System
>>>
>>> By default, file system files are buffered and utf8 encoded to keep
>>> backwards compatibility.
>>>
>>> 'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | ... ].
>>> 'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | ... ].
>>>
>>> FileStream also provides access to plain binary files using the
>>> #binaryRead/WriteStream messages. Binary streams are buffered by default
>>> also.
>>>
>>> 'name' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream | ...
>>> ].
>>> 'name' asFileReference binaryWriteStreamDo: [ :bufferedBinaryStream |
>>> ... ].
>>>
>>> If you want a file with another encoding (to come in the PR
>>> https://github.com/pharo-project/pharo/pull/1134), you can specify it
>>> while obtaining the stream:
>>>
>>> 'name' asFileReference
>>> readStreamEncoded: 'utf16'
>>> do: [ :bufferedUtf16Stream | ... ].
>>>
>>> 'name' asFileReference
>>> writeStreamEncoded: 'utf8'
>>> do: [ :bufferedUtf16Stream | ... ].
>>>
>>> 5. Line Ending Conventions
>>>
>>> If you want to write files following a specific line ending convention,
>>> use the ZnNewLineWriterStream.
>>> This stream decorator will transform any line ending (cr, lf, crlf) into
>>> a defined line ending.
>>> By default it chooses the platform line ending convention.
>>>
>>> lineWriter := ZnNewLineWriterStream on: aStream.
>>>
>>> If you want to choose another line ending convention you can do:
>>>
>>> lineWriter forCr.
>>> lineWriter forLf.
>>> lineWriter forCrLf.
>>> lineWriter forPlatformLineEnding.
>>>
>>> --
>>>
>>>
>>>
>>> Guille Polito
>>>
>>> Research Engineer
>>>
>>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>>
>>> CRIStAL - UMR 9189
>>>

Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-19 Thread Stephane Ducasse
Thanks Guille for this great post!
I will turn it into a doc :)



On Mon, Mar 19, 2018 at 6:10 PM, Guillermo Polito  wrote:

> Well, I'd say we did it in the name of modularity. And yes, I believe
> that having separate responsibilities helps in designing, testing and
> ensuring more easily the correctness of each of the parts in isolation.
>
> I've done also some profiling and it does not look like we've lost in
> performance either (reading and decoding a 35MB file):
>
> [file := Smalltalk sourcesFile fullName.
> (File named: file) readStreamDo: [ :binaryFile |
> (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding:
> 'utf8') next: binaryFile size.
> ]] timeToRun. "0:00:00:01.976"
>
> [file := Smalltalk sourcesFile fullName.
> (MultiByteFileStream fileNamed: file)
> converter: (TextConverter newForEncoding: 'utf8');
> upToEnd
> ] timeToRun. "0:00:00:02.147"
>
>
>
> On Mon, Mar 19, 2018 at 5:51 PM, Richard O'Keefe  wrote:
>
>> The idea of stacking things like character coding and buffering 
>> by using layers of wrapping is very much in the spirit of Java and
>> is neither good for performance nor helpful for correctness.
>>
>>
>> On 20 March 2018 at 05:19, Guillermo Polito 
>> wrote:
>>
>>> Hi all,
>>>
>>> I've put some minutes summarizing the new APIs provided by the
>>> combination of the new File implementation and the Zn encoders. They all
>>> basically follow the decorator pattern to stack different responsibilities
>>> such as buffering, encoding, line ending conversions.
>>>
>>> Please, do not hesitate to give your feedback.
>>>
>>> Guille
>>>
>>>
>>> 1. Basic Files
>>>
>>> By default files are binary. Not buffered.
>>>
>>> (File named: 'name') readStream.
>>> (File named: 'name') readStreamDo: [ :stream | ... ].
>>> (File named: 'name') writeStream.
>>> (File named: 'name') writeStreamDo: [ :stream | ... ].
>>>
>>>
>>> 2. Encoding
>>>
>>> To add encoding, wrap a stream with a corresponding
>>> ZnCharacterRead/WriteStream.
>>>
>>> "Reading"
>>> utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
>>> utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding:
>>> 'utf16'.
>>>
>>> "Writing"
>>> utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
>>> utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding:
>>> 'utf16'.
>>>
>>> 3. Buffering
>>>
>>> To add buffering, wrap a stream with a corresponding
>>> ZnBufferedRead/WriteStream.
>>>
>>> bufferedReadStream := ZnBufferedReadStream on: aStream.
>>> bufferedWriteStream := ZnBufferedWriteStream on: aStream.
>>>
>>> It is in general better to buffer the reading on the binary file and
>>> apply the encoding on the buffer in memory than the other way around. See
>>>
>>> [file := Smalltalk sourcesFile fullName.
>>> (File named: file) readStreamDo: [ :binaryFile |
>>> (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile)
>>> encoding: 'utf8') upToEnd
>>> ]] timeToRun. "0:00:00:09.288"
>>>
>>> [file := Smalltalk sourcesFile fullName.
>>> (File named: file) readStreamDo: [ :binaryFile |
>>> (ZnBufferedReadStream on: (ZnCharacterReadStream on: binaryFile
>>> encoding: 'utf8')) upToEnd
>>> ]] timeToRun. "0:00:00:14.189"
>>>
>>> 4. File System
>>>
>>> By default, file system files are buffered and utf8 encoded to keep
>>> backwards compatibility.
>>>
>>> 'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | ... ].
>>> 'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | ... ].
>>>
>>> FileStream also provides access to plain binary files using the
>>> #binaryRead/WriteStream messages. Binary streams are buffered by default
>>> also.
>>>
>>> 'name' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream | ...
>>> ].
>>> 'name' asFileReference binaryWriteStreamDo: [ :bufferedBinaryStream |
>>> ... ].
>>>
>>> If you want a file with another encoding (to come in the PR
>>> https://github.com/pharo-project/pharo/pull/1134), you can specify it
>>> while obtaining the stream:
>>>
>>> 'name' asFileReference
>>> readStreamEncoded: 'utf16'
>>> do: [ :bufferedUtf16Stream | ... ].
>>>
>>> 'name' asFileReference
>>> writeStreamEncoded: 'utf8'
>>> do: [ :bufferedUtf16Stream | ... ].
>>>
>>> 5. Line Ending Conventions
>>>
>>> If you want to write files following a specific line ending convention,
>>> use the ZnNewLineWriterStream.
>>> This stream decorator will transform any line ending (cr, lf, crlf) into
>>> a defined line ending.
>>> By default it chooses the platform line ending convention.
>>>
>>> lineWriter := ZnNewLineWriterStream on: aStream.
>>>
>>> If you want to choose another line ending convention you can do:
>>>
>>> lineWriter forCr.
>>> lineWriter forLf.
>>> lineWriter forCrLf.
>>> lineWriter forPlatformLineEnding.
>>>
>>> --
>>>
>>>
>>>
>>> Guille Polito
>>>
>>> Research Engineer
>>>
>>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>>
>>> 

Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-19 Thread Guillermo Polito
Well, I'd say we did it in the name of modularity. And yes, I believe
that having separate responsibilities helps in designing, testing and
ensuring more easily the correctness of each of the parts in isolation.

I've done also some profiling and it does not look like we've lost in
performance either (reading and decoding a 35MB file):

[file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
(ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding:
'utf8') next: binaryFile size.
]] timeToRun. "0:00:00:01.976"

[file := Smalltalk sourcesFile fullName.
(MultiByteFileStream fileNamed: file)
converter: (TextConverter newForEncoding: 'utf8');
upToEnd
] timeToRun. "0:00:00:02.147"



On Mon, Mar 19, 2018 at 5:51 PM, Richard O'Keefe  wrote:

> The idea of stacking things like character coding and buffering 
> by using layers of wrapping is very much in the spirit of Java and
> is neither good for performance nor helpful for correctness.
>
>
> On 20 March 2018 at 05:19, Guillermo Polito 
> wrote:
>
>> Hi all,
>>
>> I've put some minutes summarizing the new APIs provided by the
>> combination of the new File implementation and the Zn encoders. They all
>> basically follow the decorator pattern to stack different responsibilities
>> such as buffering, encoding, line ending conversions.
>>
>> Please, do not hesitate to give your feedback.
>>
>> Guille
>>
>>
>> 1. Basic Files
>>
>> By default files are binary. Not buffered.
>>
>> (File named: 'name') readStream.
>> (File named: 'name') readStreamDo: [ :stream | ... ].
>> (File named: 'name') writeStream.
>> (File named: 'name') writeStreamDo: [ :stream | ... ].
>>
>>
>> 2. Encoding
>>
>> To add encoding, wrap a stream with a corresponding
>> ZnCharacterRead/WriteStream.
>>
>> "Reading"
>> utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
>> utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf16'.
>>
>> "Writing"
>> utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
>> utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding:
>> 'utf16'.
>>
>> 3. Buffering
>>
>> To add buffering, wrap a stream with a corresponding
>> ZnBufferedRead/WriteStream.
>>
>> bufferedReadStream := ZnBufferedReadStream on: aStream.
>> bufferedWriteStream := ZnBufferedWriteStream on: aStream.
>>
>> It is in general better to buffer the reading on the binary file and
>> apply the encoding on the buffer in memory than the other way around. See
>>
>> [file := Smalltalk sourcesFile fullName.
>> (File named: file) readStreamDo: [ :binaryFile |
>> (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile)
>> encoding: 'utf8') upToEnd
>> ]] timeToRun. "0:00:00:09.288"
>>
>> [file := Smalltalk sourcesFile fullName.
>> (File named: file) readStreamDo: [ :binaryFile |
>> (ZnBufferedReadStream on: (ZnCharacterReadStream on: binaryFile encoding:
>> 'utf8')) upToEnd
>> ]] timeToRun. "0:00:00:14.189"
>>
>> 4. File System
>>
>> By default, file system files are buffered and utf8 encoded to keep
>> backwards compatibility.
>>
>> 'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | ... ].
>> 'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | ... ].
>>
>> FileStream also provides access to plain binary files using the
>> #binaryRead/WriteStream messages. Binary streams are buffered by default
>> also.
>>
>> 'name' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream | ...
>> ].
>> 'name' asFileReference binaryWriteStreamDo: [ :bufferedBinaryStream | ...
>> ].
>>
>> If you want a file with another encoding (to come in the PR
>> https://github.com/pharo-project/pharo/pull/1134), you can specify it
>> while obtaining the stream:
>>
>> 'name' asFileReference
>> readStreamEncoded: 'utf16'
>> do: [ :bufferedUtf16Stream | ... ].
>>
>> 'name' asFileReference
>> writeStreamEncoded: 'utf8'
>> do: [ :bufferedUtf16Stream | ... ].
>>
>> 5. Line Ending Conventions
>>
>> If you want to write files following a specific line ending convention,
>> use the ZnNewLineWriterStream.
>> This stream decorator will transform any line ending (cr, lf, crlf) into
>> a defined line ending.
>> By default it chooses the platform line ending convention.
>>
>> lineWriter := ZnNewLineWriterStream on: aStream.
>>
>> If you want to choose another line ending convention you can do:
>>
>> lineWriter forCr.
>> lineWriter forLf.
>> lineWriter forCrLf.
>> lineWriter forPlatformLineEnding.
>>
>> --
>>
>>
>>
>> Guille Polito
>>
>> Research Engineer
>>
>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>
>> CRIStAL - UMR 9189
>>
>> French National Center for Scientific Research - *http://www.cnrs.fr
>> *
>>
>>
>> *Web:* *http://guillep.github.io* 
>>
>> *Phone: *+33 06 52 70 66 13 <+33%206%2052%2070%2066%2013>
>>
>
>


-- 



Guille Polito

Research Engineer

Centre de 

Re: [Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-19 Thread Richard O'Keefe
The idea of stacking things like character coding and buffering 
by using layers of wrapping is very much in the spirit of Java and
is neither good for performance nor helpful for correctness.


On 20 March 2018 at 05:19, Guillermo Polito 
wrote:

> Hi all,
>
> I've put some minutes summarizing the new APIs provided by the combination
> of the new File implementation and the Zn encoders. They all basically
> follow the decorator pattern to stack different responsibilities such as
> buffering, encoding, line ending conversions.
>
> Please, do not hesitate to give your feedback.
>
> Guille
>
>
> 1. Basic Files
>
> By default files are binary. Not buffered.
>
> (File named: 'name') readStream.
> (File named: 'name') readStreamDo: [ :stream | ... ].
> (File named: 'name') writeStream.
> (File named: 'name') writeStreamDo: [ :stream | ... ].
>
>
> 2. Encoding
>
> To add encoding, wrap a stream with a corresponding
> ZnCharacterRead/WriteStream.
>
> "Reading"
> utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
> utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf16'.
>
> "Writing"
> utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
> utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf16'.
>
> 3. Buffering
>
> To add buffering, wrap a stream with a corresponding
> ZnBufferedRead/WriteStream.
>
> bufferedReadStream := ZnBufferedReadStream on: aStream.
> bufferedWriteStream := ZnBufferedWriteStream on: aStream.
>
> It is in general better to buffer the reading on the binary file and apply
> the encoding on the buffer in memory than the other way around. See
>
> [file := Smalltalk sourcesFile fullName.
> (File named: file) readStreamDo: [ :binaryFile |
> (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding:
> 'utf8') upToEnd
> ]] timeToRun. "0:00:00:09.288"
>
> [file := Smalltalk sourcesFile fullName.
> (File named: file) readStreamDo: [ :binaryFile |
> (ZnBufferedReadStream on: (ZnCharacterReadStream on: binaryFile encoding:
> 'utf8')) upToEnd
> ]] timeToRun. "0:00:00:14.189"
>
> 4. File System
>
> By default, file system files are buffered and utf8 encoded to keep
> backwards compatibility.
>
> 'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | ... ].
> 'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | ... ].
>
> FileStream also provides access to plain binary files using the
> #binaryRead/WriteStream messages. Binary streams are buffered by default
> also.
>
> 'name' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream | ... ].
> 'name' asFileReference binaryWriteStreamDo: [ :bufferedBinaryStream | ...
> ].
>
> If you want a file with another encoding (to come in the PR
> https://github.com/pharo-project/pharo/pull/1134), you can specify it
> while obtaining the stream:
>
> 'name' asFileReference
> readStreamEncoded: 'utf16'
> do: [ :bufferedUtf16Stream | ... ].
>
> 'name' asFileReference
> writeStreamEncoded: 'utf8'
> do: [ :bufferedUtf16Stream | ... ].
>
> 5. Line Ending Conventions
>
> If you want to write files following a specific line ending convention,
> use the ZnNewLineWriterStream.
> This stream decorator will transform any line ending (cr, lf, crlf) into a
> defined line ending.
> By default it chooses the platform line ending convention.
>
> lineWriter := ZnNewLineWriterStream on: aStream.
>
> If you want to choose another line ending convention you can do:
>
> lineWriter forCr.
> lineWriter forLf.
> lineWriter forCrLf.
> lineWriter forPlatformLineEnding.
>
> --
>
>
>
> Guille Polito
>
> Research Engineer
>
> Centre de Recherche en Informatique, Signal et Automatique de Lille
>
> CRIStAL - UMR 9189
>
> French National Center for Scientific Research - *http://www.cnrs.fr
> *
>
>
> *Web:* *http://guillep.github.io* 
>
> *Phone: *+33 06 52 70 66 13 <+33%206%2052%2070%2066%2013>
>


[Pharo-users] New Files in Pharo - Migration Guide, How To's and examples

2018-03-19 Thread Guillermo Polito
Hi all,

I've taken some minutes to summarize the new APIs provided by the combination
of the new File implementation and the Zn encoders. They all basically
follow the decorator pattern to stack different responsibilities such as
buffering, encoding, and line ending conversions.

Please, do not hesitate to give your feedback.

Guille


1. Basic Files

By default files are binary. Not buffered.

(File named: 'name') readStream.
(File named: 'name') readStreamDo: [ :stream | ... ].
(File named: 'name') writeStream.
(File named: 'name') writeStreamDo: [ :stream | ... ].
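
For example, a quick round-trip sketch (the streams are binary here, so the
elements read and written are bytes):

(File named: 'test.bin') writeStreamDo: [ :out | out nextPutAll: #[1 2 3 4] ].
(File named: 'test.bin') readStreamDo: [ :in | in next: 4 ].  "#[1 2 3 4]"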


2. Encoding

To add encoding, wrap a stream with a corresponding
ZnCharacterRead/WriteStream.

"Reading"
utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf16'.

"Writing"
utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf16'.
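
For example, decoding UTF-8 bytes from an in-memory binary stream (a sketch;
#[80 104 97 114 111] is 'Pharo' encoded as UTF-8):

(ZnCharacterReadStream
    on: #[80 104 97 114 111] readStream
    encoding: 'utf8') upToEnd.  "'Pharo'"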

3. Buffering

To add buffering, wrap a stream with a corresponding
ZnBufferedRead/WriteStream.

bufferedReadStream := ZnBufferedReadStream on: aStream.
bufferedWriteStream := ZnBufferedWriteStream on: aStream.

It is in general better to buffer the reading on the binary file and apply
the encoding on the buffer in memory than the other way around. See

[file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
(ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding:
'utf8') upToEnd
]] timeToRun. "0:00:00:09.288"

[file := Smalltalk sourcesFile fullName.
(File named: file) readStreamDo: [ :binaryFile |
(ZnBufferedReadStream on: (ZnCharacterReadStream on: binaryFile encoding:
'utf8')) upToEnd
]] timeToRun. "0:00:00:14.189"

4. File System

By default, file system files are buffered and utf8 encoded to keep
backwards compatibility.

'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | ... ].
'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | ... ].

FileStream also provides access to plain binary files using the
#binaryRead/WriteStream messages. Binary streams are buffered by default
also.

'name' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream | ... ].
'name' asFileReference binaryWriteStreamDo: [ :bufferedBinaryStream | ... ].

If you want a file with another encoding (to come in the PR
https://github.com/pharo-project/pharo/pull/1134), you can specify it while
obtaining the stream:

'name' asFileReference
readStreamEncoded: 'utf16'
do: [ :bufferedUtf16Stream | ... ].

'name' asFileReference
writeStreamEncoded: 'utf8'
do: [ :bufferedUtf16Stream | ... ].

5. Line Ending Conventions

If you want to write files following a specific line ending convention, use
the ZnNewLineWriterStream.
This stream decorator will transform any line ending (cr, lf, crlf) into a
defined line ending.
By default it chooses the platform line ending convention.

lineWriter := ZnNewLineWriterStream on: aStream.

If you want to choose another line ending convention you can do:

lineWriter forCr.
lineWriter forLf.
lineWriter forCrLf.
lineWriter forPlatformLineEnding.
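
For example, a sketch forcing LF endings regardless of what the writing code
emits (assuming the decorator understands the usual character write stream
protocol, #nextPutAll: and #cr):

'out.txt' asFileReference writeStreamDo: [ :ws |
    | lineWriter |
    lineWriter := ZnNewLineWriterStream on: ws.
    lineWriter forLf.
    lineWriter
        nextPutAll: 'first line'; cr;
        nextPutAll: 'second line' ]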

-- 



Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr

Web: http://guillep.github.io

Phone: +33 06 52 70 66 13