[Pharo-users] [ANN] I stop maintaining the Ubuntu packages

2014-09-24 Thread Damien Cassou
Dear all,

I've recently switched my Linux distribution from Ubuntu to NixOS.
This means I can no longer maintain the Ubuntu packages and their PPA
(https://launchpad.net/~pharo/).

These packages need a maintainer!

The good side of it is that it is not a lot of work as I did all the
automating infrastructure:
https://github.com/pharo-project/pharo-ubuntu. Basically, the
maintainer will have to launch a few shell scripts regularly (once a
month for example). I will obviously be available to help the new
maintainer.

The community needs *you*.

Good news is: I packaged the Pharo VM for the Nix package manager
which can be installed on many Unix (e.g., Linux, MacOS X, FreeBSD).
And NixOS will have the Pharo VM package in its next release in
October!

-- 
Damien Cassou
http://damiencassou.seasidehosting.st

Success is the ability to go from one failure to another without
losing enthusiasm.
Winston Churchill



[Pharo-users] Packaging Pharo for many distributions at once

2014-09-24 Thread Damien Cassou
Hi,

I've packaged Pharo VM for the Nix package manager which can be
installed on many Unix (e.g., Linux, MacOS X and FreeBSD). This works
well but requires installing Nix on your Unix.

Another solution is http://openbuildservice.org. Are any of you
interested in trying to use this service to build Pharo VM packages
for several Linux distributions automatically? I can help.

Best




Re: [Pharo-users] [ANN] I stop maintaining the Ubuntu packages

2014-09-24 Thread Nicolai Hess
Count me in. I still have an Ubuntu installation.
I don't know what I have to do, but I'll try it.





[Pharo-users] Mac VM - developer unidentified

2014-09-24 Thread Usman Bhatti
Hi,

Today I downloaded Pharo vm from pharo.org. When I try to open the VM I get
this security notification from Mac and it didn't happen with earlier
versions of Pharo VM. I'm running: OS X 10.9.4 (13E28).


[image: Inline image 1]

I should be able to change my security settings to run it, but I prefer not
to, and for less knowledgeable users it can send a negative message.

regards,

usman


Re: [Pharo-users] Mac VM - developer unidentified

2014-09-24 Thread Mircea S.
Right click on the app and click Open. 

This time around it will bring up the same screen but with a second button that 
says Open. 

After you click that it will never ask again. 



Re: [Pharo-users] Mac VM - developer unidentified

2014-09-24 Thread Usman Bhatti
On Wed, Sep 24, 2014 at 11:18 AM, Mircea S. mir...@unom.ro wrote:

 Right click on the app and click Open.

 This time around it will bring up the same screen but with a second button
 that says Open.

 After you click that it will never ask again.


Tx for saving me the effort to google about it :)







Re: [Pharo-users] Packaging Pharo for many distributions at once

2014-09-24 Thread Christophe Demarey

On 24 Sep 2014, at 11:00, Thierry Goubier wrote:

  Hi Damien,
 
 I would be interested in a zeroinstall [http://0install.net/] version :) A 
 cool system because it is user-level (no need to go into system admin mode).

It looks interesting but requires the user to install 0install before being 
able to install Pharo. A big drawback ...



Re: [Pharo-users] Packaging Pharo for many distributions at once

2014-09-24 Thread Thierry Goubier
2014-09-24 11:35 GMT+02:00 Christophe Demarey christophe.dema...@inria.fr:


 On 24 Sep 2014, at 11:00, Thierry Goubier wrote:

  Hi Damien,

 I would be interested in a zeroinstall [http://0install.net/] version :)
 A cool system because it is user-level (no need to go into system admin
 mode).


 It looks interesting but requires the user to install 0install before
 being able to install Pharo. A big drawback ...


This is because you don't yet have half a dozen things already installed
via 0install ;)

All non-native package managers have that issue.

Thierry


[Pharo-users] Ring package, when to use it?

2014-09-24 Thread Juraj Kubelka
Hi!

I am trying to understand in which scenarios it is good to use the Ring package instead 
of the actual compiled method, class and r-package objects. It is not clear to me. 

For example, if I want to ask where a method/class/package is referenced, 
should I consider the Ring package?

When should I consider using Ring?

Thank you a lot,
Juraj


[Pharo-users] Glorp + NBSQLite3

2014-09-24 Thread Pierce Ng
Hello,

I am pleased to report that I have gotten Glorp working with NBSQLite3
enough to run Sven's Reddit.st. 

As mentioned in my blog post, there is more work to be done to get
Glorp fully integrated with NBSQLite3, but preliminary results are 
encouraging.

  http://www.samadhiweb.com/blog/2014.09.24.glorp.nbsqlite3.html

Pierce




Re: [Pharo-users] Glorp + NBSQLite3

2014-09-24 Thread Esteban A. Maringolo
This is really cool!

How does SQLite scale in terms of table size and so on?

I was surprised to know it is based on an old version of PostgreSQL
according to this presentation:
http://www.pgcon.org/2014/schedule/events/736.en.html

Regards!
Esteban A. Maringolo







Re: [Pharo-users] Ring package, when to use it?

2014-09-24 Thread Juraj Kubelka
Thank you Marcus for the explanation. 

So now I understand that if I want to analyse existing 
packages/classes/methods/etc. in the image, Ring is not of interest. 

But as I think about it, if someone uses Ring as a base to analyse the environment, 
then it could be useful to use the same analysis tool for any source, e.g. packages 
that are not loaded. Am I right, or are there some limitations?

Thanks.
Juraj




Re: [Pharo-users] Ridiculous we are

2014-09-24 Thread Benjamin Pollack

On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire hila...@drgeo.eu wrote:

 On 23/09/2014 14:09, Damien Cassou wrote:

  I recently read documents about utf-8 encoding. In all of them, the
  author says that pathnames should be kept as is because you never know
  which encoding the filesystem uses. So, a filename should probably be
  a bytearray.

 yes, but a #é should be encoded in two bytes.

As noted in my previous message, é could be represented as either one or
two Unicode code points, and these in turn could validly be either two or
three bytes in UTF-8.  My gut says that $é should be U+00E9, because
otherwise you would have to use two Characters ($e and $´), but you could
legitimately argue otherwise as well, and at any rate, #é could definitely
be either.  This is likely the core of the issue you're hitting.
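
The code point and byte counts mentioned above can be checked with a short sketch. Python is used here only as neutral ground, not because Pharo works this way, and `describe` is a name made up for this illustration.

```python
import unicodedata

def describe(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))

precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

print(describe(precomposed))   # (1, 2)
print(describe(decomposed))    # (2, 3)

# The two spellings compare unequal until they are normalized to the same form.
print(precomposed == decomposed)                                  # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)    # True
```

Both strings render as é, yet they differ in length and in their bytes, which is why a path built from one spelling may not match a file created with the other.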




Re: [Pharo-users] Ridiculous we are

2014-09-24 Thread Benjamin Pollack
On Mon, 22 Sep 2014 17:58:41 -0400, Sven Van Caekenberghe s...@stfx.eu  
wrote:


I also find the way some problems are reported quite disturbing. How  
much testing did you do ? On which platforms ?


I can do this (in Pharo 3) without any problems (we're talking about  
arbitrary Unicode characters in path names):


('/tmp' asFileReference / 'été') ensureCreateDirectory.
'/tmp/été' asFileReference exists.
('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out |
  out << 'What about Greece ?' ].
('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.

And in a terminal, I get:

$ ls /tmp/été/Ελλάδα.txt
/tmp/été/Ελλάδα.txt

$ cat !$
cat /tmp/été/Ελλάδα.txt
What about Greece ?

This is on Mac OS X.

So this part fundamentally works in the image and on one VM. There might  
of course be problems in how paths are used in certain places or on  
certain VM/platforms.




Focusing purely on Unicode itself (not the encoding systems), a letter  
like é can be represented as U+00E9 (LATIN SMALL LETTER E WITH ACUTE), or  
as U+0065 (LATIN SMALL LETTER E) followed by U+0301 (combining acute  
accent).  These will appear identical to the user, but are emphatically  
*not* identical for most software.  The way you're testing here, you will  
not hit any error relating to this concept, ever, because you're using  
Pharo for both generating and consuming the strings.  At the very least,  
we'd need to generate a file named été with both forms explicitly and  
see what happens.
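
A minimal sketch of that experiment, assuming a scratch directory is acceptable and using Python instead of Pharo so the names are generated outside the image. `create_both_forms` is a hypothetical helper. The number of entries it reports depends on the file system: one that normalizes names may keep a single file, while one that stores raw bytes keeps two.

```python
import os
import tempfile
import unicodedata

def create_both_forms(directory, name="\u00e9t\u00e9"):
    """Create one empty file per normalization form and return the raw listing."""
    for form in ("NFC", "NFD"):
        path = os.path.join(directory, unicodedata.normalize(form, name))
        with open(path, "w", encoding="utf-8"):
            pass
    # Passing bytes to os.listdir returns the names as raw bytes.
    return sorted(os.listdir(os.fsencode(directory)))

with tempfile.TemporaryDirectory() as tmp:
    entries = create_both_forms(tmp)
    for entry in entries:
        print(len(entry), entry)
```

Pointing Pharo at a directory prepared this way would exercise the case that a pure in-image test cannot reach.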


Things get even more exciting, though, because Unix says that file names  
are simply arbitrary byte patterns that do not contain the null byte.*   
Thus, you can trivially create a file named été using Latin-1 encoding,  
and again using UTF-8 encoding, and again using UTF-7 encoding, and these  
might all be shown to the user as identically named, but I guarantee you  
that Pharo will not act sanely with all of these.  Even on Windows,  
where things are a bit saner (NTFS mandates UTF-16), and where an explicit  
normalization form is preferred (NFC), I just explicitly verified that I  
can trivially inject other normalization forms into the file system.   
Thus, you can still have two files named été that nevertheless have  
different names as far as the OS is concerned.
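
To make the byte-level difference concrete, here is a small Python sketch (an illustration only, not a statement about what any Pharo VM does) that encodes the same name with the three encodings mentioned above and then decodes with the wrong one.

```python
name = "\u00e9t\u00e9"  # "été" with precomposed é

encoded = {codec: name.encode(codec) for codec in ("latin-1", "utf-8", "utf-7")}
for codec, raw in encoded.items():
    print(codec, len(raw), raw)

# Decoding with the wrong codec either fails or silently yields another name.
print(encoded["utf-8"].decode("latin-1"))   # Ã©tÃ©
try:
    encoded["latin-1"].decode("utf-8")
except UnicodeDecodeError as error:
    print("not valid UTF-8:", error.reason)
```

Each byte string is a legal Unix file name, so all three files can sit side by side in one directory.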


In this case, as far as I can tell, Pharo assumes that all path names are  
Unicode, and does not do any work to convert strings to or from the  
various normalization schemes (looking in Path  
class>>canonicalizeElements:, Path class>>from:delimiter:, and  
FileSystemStore>>pathFromString: here).


There's therefore a pretty straightforward fix that Pharo could do:

  1. Path would use ByteArrays as the actual canonical store, and
 provide convenience methods to see what the array decodes to
 in various encodings.  The developer and application can make
 decisions about what encoding system they want to use.
  2. The VM likely needs to be modified to handle this (didn't check)
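
The first point of the proposal can be sketched as follows. This is a Python illustration of the idea, not Pharo code and not an existing API: `BytePath`, `from_text` and `decoded` are hypothetical names, and the check for '/' is an addition based on the usual Unix rule for a single path element.

```python
class BytePath:
    """Hypothetical path element that keeps the raw bytes as the source of truth."""

    def __init__(self, raw: bytes):
        if b"\x00" in raw or b"/" in raw:
            raise ValueError("a Unix file name cannot contain NUL or '/'")
        self.raw = raw

    @classmethod
    def from_text(cls, text: str, encoding: str = "utf-8") -> "BytePath":
        return cls(text.encode(encoding))

    def decoded(self, encoding: str = "utf-8"):
        """Return the name in the given encoding, or None if the bytes do not fit it."""
        try:
            return self.raw.decode(encoding)
        except UnicodeDecodeError:
            return None

    def __eq__(self, other):
        return isinstance(other, BytePath) and self.raw == other.raw

    def __hash__(self):
        return hash(self.raw)

legacy = BytePath(b"\xe9t\xe9")               # "été" as written by a Latin-1 system
modern = BytePath.from_text("\u00e9t\u00e9")  # the same name encoded as UTF-8

print(legacy.decoded("utf-8"))      # None
print(legacy.decoded("latin-1"))    # été
print(legacy == modern)             # False
```

Equality is defined on the bytes, so two names that look alike but are stored differently stay distinct, and the application decides which decoding to show.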

As much as I wish Hilaire provided more details in his bug report, it's  
worth keeping in mind that not all users, or even all programmers,  
understand the full implications of things like how various Unicode  
normalization and encoding schemes interact in practice with Unix's very  
vague concept of what a file name actually is, so I usually try to  
approach these bug reports carefully and with an open mind.


--Benjamin

* On OS X, HFS+ uses UTF-16 with an Apple-specific variant of NFD, whereas  
I do not believe this holds for e.g. UFS or FUSE-backed file systems, so  
things are a bit subtler there, but the general rule holds.




Re: [Pharo-users] Mac VM - developer unidentified

2014-09-24 Thread Damien Cassou
On Wed, Sep 24, 2014 at 11:13 AM, Usman Bhatti usman.bha...@gmail.com
wrote:

 for less knowledgeable users it can send a negative message.


yes I agree.




Re: [Pharo-users] Ring package, when to use it?

2014-09-24 Thread Marcus Denker
On Wed, Sep 24, 2014 at 6:03 PM, Juraj Kubelka juraj.kube...@gmail.com
wrote:

 Hi!

 I am trying to understand in which scenarios it is good to use the Ring package
 instead of the actual compiled method, class and r-package objects. It is not
 clear to me.

 For example, if I want to ask where a method/class/package is
 referenced, should I consider the Ring package?

 When should I consider using Ring?


The idea is that Ring models classes/methods that you want to reason about,
but that are not actually installed in the system.
This is needed often and everyone implements their own model: Monticello
(MCClassDefinition), FilePackage (PseudoClass/PseudoMethod),
RB (RBClass, RBMethod).
Ring is a first step to propose one model that everyone can use who needs
to model code that is not installed in the system.

e.g. if you want to analyse an mcz package, instead of loading it (with
all the side effects), you could load it as a Ring model.

Like everything that exists it is not perfect (else it would not exist)...
e.g. we actually should replace PseudoClass and PseudoMethod
with Ring. Any improvements (both to the model and to its use) are
very welcome.

E.g. one thing I am slowly doing is to simplify it (e.g. removing the
RGFactory class)

   Marcus


Re: [Pharo-users] Packaging Pharo for many distributions at once

2014-09-24 Thread Damien Cassou
On Wed, Sep 24, 2014 at 12:47 PM, Thierry Goubier
thierry.goub...@gmail.com wrote:
 All non-native package managers have that issue.


nix included




Re: [Pharo-users] [ANN] I stop maintaining the Ubuntu packages

2014-09-24 Thread volk...@nivoba.de
Sad news. I switched from OSX to Ubuntu and I also plan to switch from 
Win8 to Ubuntu.


I really hope Pharo will find a new maintainer for the most popular 
Linux distribution.


Thank you for your work ...

BW,
Volkert



--
www.nivoba.de

The more complex an object, the larger the investment in learning to use it, and 
the greater the resistance to abandon it., NW




Re: [Pharo-users] Ridiculous we are

2014-09-24 Thread Sven Van Caekenberghe

On 24 Sep 2014, at 18:48, Benjamin Pollack benja...@bitquabit.com wrote:

 On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire hila...@drgeo.eu wrote:
 
 On 23/09/2014 14:09, Damien Cassou wrote:
 I recently read documents about utf-8 encoding. In all of them, the
 author says that pathnames should be kept as is because you never know
 which encoding the filesystem uses. So, a filename should probably be
 a bytearray.
 
 
 yes, but a #é should be encoded in two bytes.
 
 As noted in my previous message, é could be represented as either one or 
 two Unicode code points, and these in turn could validly be either two or 
 three bytes in UTF-8.  My gut says that $é should be U+00E9, because 
 otherwise you should have to use two Characters ($e and $´), but you could 
 legitimately argue otherwise as well, and at any rate, #é could definitely be 
 either.  This is likely the core of the issue you're hitting.

Did you read the actual conversation in the issue ?

 
https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

It has been renamed and there is a fix (as a change set, not as a slice, yet). 
Basically, there was a primitive call into a plugin that failed to do encoding.

Now regarding the issues you raised. Pharo does not do Unicode canonicalisation 
or any of that other fancy stuff (like categorisation, proper ordering and so 
on). This is another orthogonal and way more general issue.

Regarding the pathnames encoding: if the OS itself does not know it, how can we 
? I think that the current approach (assuming UTF-8) makes (the most) sense for 
a system that runs on multiple platforms.

Sven




Re: [Pharo-users] Ridiculous we are

2014-09-24 Thread Benjamin Pollack
On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe s...@stfx.eu  
wrote:




 Did you read the actual conversation in the issue ?

 https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

 It has been renamed and there is a fix (as a change set, not as a slice,
 yet). Basically, there was a primitive call into a plugin that failed to
 do encoding.

No, I apologize; I missed the bug link.  Thanks for reposting it.

 Now regarding the issues you raised. Pharo does not do Unicode
 canonicalisation or any of that other fancy stuff (like categorisation,
 proper ordering and so on). This is another orthogonal and way more
 general issue.

 Regarding the pathnames encoding: if the OS itself does not know it, how
 can we ?


That's actually the argument *against* using UTF-8 as the standard Pharo  
way to represent filenames--at least on Unix systems.  If Pharo used  
ByteArrays to represent paths, with convenience methods for working with  
UTF-8 (since I do agree that's the most likely thing for a user/dev to  
want), then you'd be able to work with all files no matter what, *and*  
have a convenient way of doing so for the common case.


This is an old discussion, and I do see both sides of it.  In terms of  
SCMs, Mercurial and Git both just say it's a collection of bytes,  
whereas Subversion says it's Unicode code points.  This has some  
uncomfortable implications for both systems when working on multiple  
platforms.


--Benjamin



Re: [Pharo-users] Ring package, when to use it?

2014-09-24 Thread Marcus Denker

On 24 Sep 2014, at 18:35, Juraj Kubelka juraj.kube...@gmail.com wrote:

 Thank you Marcus for the explanation. 
 
 So now I understand that if I want to analyse existing 
 packages/class/methods/etc in the image, Ring is not a kind of interest. 
 
 But as I think about it, if someone uses Ring as a base to analyse 
 environment, then it could be useful to use the same analysis tool for any 
 source, e.g. not loaded packages. Am I right or are the some limitations?
 
 
That is the idea… Ring Definition and normal classes/methods share a common API 
and can be used interchangeably.

Marcus




Re: [Pharo-users] Ridiculous we are

2014-09-24 Thread Sven Van Caekenberghe

On 24 Sep 2014, at 19:09, Benjamin Pollack benja...@bitquabit.com wrote:

 On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe s...@stfx.eu 
 wrote:
 
 
 Did you read the actual conversation in the issue ?
 
 https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters
 
 It has been renamed and there is a fix (as a change set, not as a slice, 
 yet). Basically, there was a primitive call into a plugin that failed to do 
 encoding.
 
 
 No, I apologize; I missed the bug link.  Thanks for reposting it.
 
 Now regarding the issues you raised. Pharo does not do Unicode 
 canonicalisation or any of that other fancy stuff (like categorisation, 
 proper ordering and so on). This is another orthogonal and way more general 
 issue.
 
 Regarding the pathnames encoding: if the OS itself does not know it, how can 
 we ?
 
 That's actually the argument *against* using UTF-8 as the standard Pharo way 
 to represent filenames--at least on Unix systems.  If Pharo used ByteArrays 
 to represent paths, with convenience methods for working with UTF-8 (since I 
 do agree that's the most likely thing for a user/dev to want), then you'd be 
 able to work with all files no matter what, *and* have a convenient way of 
 doing so for the common case.
 
 This is an old discussion, and I do see both sides of it.  In terms of SCMs, 
 Mercurial and Git both just say it's a collection of bytes, whereas 
 Subversion says it's Unicode code points.  This has some uncomfortable 
 implications for both systems when working on multiple platforms.

Benjamin,

I think I understand the concern / situation that you describe. But I fail to 
see how not-interpreting it and interpreting it in different encodings can work 
in practice, especially since your point seems to be that there is no meta 
information that gives a definitive answer. 

I would guess that other languages, say Java or Python, have some approach to 
handle this problem ?
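
For Python at least, the approach is documented in PEP 383: file names are exposed as strings, and bytes that cannot be decoded are smuggled through as lone surrogates so that they survive a round trip. A minimal sketch, with the Latin-1 file name chosen here purely as an example:

```python
import os

raw = b"\xe9t\xe9.txt"    # Latin-1 bytes, not valid UTF-8

# Undecodable bytes become lone surrogates (U+DC80..U+DCFF) instead of raising.
as_text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(as_text))     # '\udce9t\udce9.txt'

# Encoding with the same handler restores the original bytes.
print(as_text.encode("utf-8", errors="surrogateescape") == raw)   # True

# os.fsdecode / os.fsencode apply the interpreter's file system encoding;
# on Unix they use this error handler, so the round trip is lossless.
print(os.fsencode(os.fsdecode(b"plain.txt")) == b"plain.txt")     # True
```

This keeps the convenient string view for the common case while still letting a program open, rename or delete a file whose name is not valid in the assumed encoding.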

Also, since we are living with the current approach without many problems, I 
think the issue is not terribly pressing.

Sven




[Pharo-users] Update of AgileVisualization.com

2014-09-24 Thread Alexandre Bergel
Dear All,

AgileVisualization.com, the book about Roassal, has been updated with a new 
chapter. 
The HTML versions of the chapter are also online. 

Agile Visualization is written using Pillar and Skeleton. Thanks Damien Cassou 
and Yuriy Tymchuk (Uko) for these wonderful frameworks.

I am also looking for contributors. I know that some of you guys have done 
wonderful things with Roassal. Sharing your knowledge with the rest of us would 
be fantastic. You may even get paid to write a chapter :-) Get in touch with me 
for more info.

Cheers,
Alexandre
-- 
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.






Re: [Pharo-users] Ridiculous we are

2014-09-24 Thread Alain Rastoul



On 24/09/2014 19:09, Benjamin Pollack wrote:


If Pharo used  ByteArrays to represent paths, with convenience methods for 
working with
UTF-8 (since I do agree that's the most likely thing for a user/dev to
want), then you'd be able to work with all files no matter what, *and*
have a convenient way of doing so for the common case.

Hi Ben,
I strongly disagree with you on this point: using byte arrays (or byte 
strings) is a pain in an international context.

The OS knows about its encoding: locale for Unix, code page for Windows.
Windows code pages depend on the country: for English, Windows-1252 (similar 
to ISO-8859-1); for other European countries, other variations of 
8859-xx... (welcome to ISO soup); same for Unix.
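
As an illustration of both points, the following Python sketch asks the running system what it reports and then shows where Windows-1252 and ISO-8859-1 agree and differ. The first two printed values vary from machine to machine, so no output is shown for them.

```python
import locale
import sys

# What the running system reports (values differ from machine to machine).
print(locale.getpreferredencoding(False))
print(sys.getfilesystemencoding())

# Windows-1252 and ISO-8859-1 agree on é ...
print("\u00e9".encode("cp1252") == "\u00e9".encode("latin-1"))   # True

# ... but differ in the 0x80-0x9F range, where 1252 places printable characters.
print("\u20ac".encode("cp1252"))          # b'\x80'
print(repr(b"\x80".decode("latin-1")))    # '\x80'
```

The euro sign is a typical trap: byte 0x80 is a printable character under Windows-1252 but a control character under ISO-8859-1.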


Java and dotNet both use UTF16 strings (don't know for 
Python), where chars are not bytes and strings are not used as byte arrays 
but as Character arrays.
Both do conversions from the OS character set encoding to the internal encoding 
for strings (paths and whatever).


There is already UTF8 and UTF16 encoding support in Pharo, but the
standard String class uses bytes, and a lot of file, directory and
system methods use the ByteString class, and that is the problem here.
UTF8 encoding in Pharo encodes to a variable-length ByteString, which is 
not the same as a (hypothetical) Utf8String where all (variable-length) 
chars would be utf8 encoded.

Using a new UTF8 or UTF16 string class could be a major rework,
but a decision about the internal string encoding is needed.
As Sven says, there is no emergency and you have a workaround, but
perhaps using the existing WideString encoded as UTF16 (or UTF32?) in
some well defined classes/methods could be a good start for this rework?
IMHO the workaround of using utf8 encoded byte strings is not a good way 
to deal with this problem and should not be accepted as the solution.





Re: [Pharo-users] Ridiculous we are

2014-09-24 Thread Sven Van Caekenberghe
Alain,

On 24 Sep 2014, at 23:00, Alain Rastoul alf.mmm@gmail.com wrote:

 On 24/09/2014 19:09, Benjamin Pollack wrote:
 
 If Pharo used  ByteArrays to represent paths, with convenience methods for 
 working with
 UTF-8 (since I do agree that's the most likely thing for a user/dev to
 want), then you'd be able to work with all files no matter what, *and*
 have a convenient way of doing so for the common case.
 Hi Ben,
 I strongly disagree with you on this point: using byte arrays (or byte 
 strings) is a pain in an international context.
 The OS knows about its encoding: locale for unix, code page for windows.
 Windows code pages depends on country, for english windows 1252 (similar to 
 iso-8859-1), for other european countries, other variations of 8859-xx... 
 (welcome to ISO  soup), same for unix.
 
 Java uses UTF8 strings and dotNet uses UTF16 strings (don't know for Python) 
 where chars are not bytes and they are not used as byte arrays but as 
 Character arrays.
 Both do conversions from OS character set encoding  to internal encoding for 
 strings (paths and whatever).
 
 There is already an UTF8 and UTF16 encoding support in Pharo, but the
 standard String class uses bytes, and lot of files, directories and
 system methods use ByteString class and that is the problem here.
 UTF8 encoding in Pharo encodes to a variable lenght ByteString, which is not 
 the same as an (hypothetical) Utf8String where all (variable length) chars 
 would be utf8 encoded.
 Using a new UTF8 or UTF16 string class could be a major rework,
 but taking a decision about about internal string encoding is needed.
 As Sven says, there is no emergency and you have a workaround, but
 perhaps using the existing WideString encoded as UTF16 (or UTF32?) in
 some well defined classes/methods could be a good start for this rework?
 IMHO the workaround of using utf8 encoded byte strings is not a good way to 
 deal with this problem and should not be granted as the solution.

The character encoding situation in Pharo is pretty good actually. The only 
problem is that there is some old school code left that encodes strings into 
strings, but today you can easily write much better and conceptually correct 
code.

You could have a look at this draft chapter of the upcoming 'Enterprise Pharo' 
book that I am currently writing:

  http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/

Concerning file system paths, FilePathEncoder and FilePluginPrimitives already 
do the right thing.

Now, your idea about using UTF-8 to represent internal Strings is something 
that has been discussed before and in many other languages as well. The short 
answer is that due to it being variable length, the inefficiency is (probably) 
just too high. Simple indexed access becomes a problem, let alone more complex 
string manipulations. I am not saying that it cannot be done, I think it is 
just not worth the trouble. The current solution in Pharo with ByteString and 
WideString is quite nice (check the chapter I mentioned before).
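
The indexed-access cost can be illustrated with a Python sketch that compares a variable-length and a fixed-width representation. The two helper functions are written for this note and do not mirror how Pharo's ByteString or WideString are implemented.

```python
def char_at_utf8(data: bytes, index: int) -> str:
    """Return the index-th character of UTF-8 data by scanning from the start."""
    seen = -1
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:        # not a continuation byte: a character starts here
            seen += 1
            if seen == index:
                end = offset + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1
                return data[offset:end].decode("utf-8")
    raise IndexError(index)

def char_at_utf32(data: bytes, index: int) -> str:
    """Return the index-th character of UTF-32 data with a single slice."""
    if not 0 <= index < len(data) // 4:
        raise IndexError(index)
    return data[4 * index:4 * index + 4].decode("utf-32-le")

text = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1.txt"   # the Greek file name used earlier in the thread
print(char_at_utf8(text.encode("utf-8"), 3))        # ά (U+03AC)
print(char_at_utf32(text.encode("utf-32-le"), 3))   # ά (U+03AC)
```

With fixed-width storage the position of character n is simple arithmetic, whereas the UTF-8 version has to walk the bytes from the start, which is the inefficiency referred to above.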

Sven




Re: [Pharo-users] Ridiculous we are

2014-09-24 Thread Alain Rastoul

On 25/09/2014 00:06, Sven Van Caekenberghe wrote:

Alain,



The character encoding situation in Pharo is pretty good actually. The only 
problem is that there is some old school code left that encodes strings into 
strings, but today you can easily write much better and conceptually correct 
code.

You could have a look at this draft chapter of the upcoming 'Enterprise Pharo' 
book that I am currently writing:

   http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/

Concerning file system paths, FilePathEncoder and FilePluginPrimitives already 
do the right thing.

Now, your idea about using UTF-8 to represent internal Strings is something 
that has been discussed before and in many other languages as well. The short 
answer is that due to it being variable length, the inefficiency is (probably) 
just too high. Simple indexed access becomes a problem, let alone more complex 
string manipulations. I am not saying that it cannot be done, I think it is 
just not worth the trouble. The current solution in Pharo with ByteString and 
WideString is quite nice (check the chapter I mentioned before).

Sven


Very interesting!
It seems that most of what I was saying is already here :)
I was not saying that Pharo should use utf8 (I mentioned utf8 because 
it is a standard, but I find the variable-length encoding very weird); I 
was rather talking of using WideString in UTF 16 or 32, and that's done.
I saw asWideString but didn't know about the automatic conversion, the 
codepoint selector or the internal wide string support.
Does it mean that Pharo Greek users (for example) use WideString for 
Strings without having to specify it or make explicit conversions 
(except of course when dealing with bytes if they want to)?

If yes, very good, the job is almost done :)
(personally I would also deprecate ByteString and get rid of it, just 
my opinion).

Thanks for the link, another good chapter.

Regards,

Alain





Re: [Pharo-users] Ridiculous we are

2014-09-24 Thread Sven Van Caekenberghe

On 25 Sep 2014, at 01:04, Alain Rastoul alf.mmm@gmail.com wrote:

 On 25/09/2014 00:06, Sven Van Caekenberghe wrote:
 Alain,
 
 The character encoding situation in Pharo is pretty good actually. The only 
 problem is that there is some old school code left that encodes strings into 
 strings, but today you can easily write much better and conceptually correct 
 code.
 
 You could have a look at this draft chapter of the upcoming 'Enterprise 
 Pharo' book that I am currently writing:
 
   http://stfx.eu/EnterprisePharo/Zinc-Encoding-Meta/
 
 Concerning file system paths, FilePathEncoder and FilePluginPrimitives 
 already do the right thing.
 
 Now, your idea about using UTF-8 to represent internal Strings is something 
 that has been discussed before and in many other languages as well. The 
 short answer is that due to it being variable length, the inefficiency is 
 (probably) just too high. Simple indexed access becomes a problem, let alone 
 more complex string manipulations. I am not saying that it cannot be done, I 
 think it is just not worth the trouble. The current solution in Pharo with 
 ByteString and WideString is quite nice (check the chapter I mentioned 
 before).
 
 Sven
 
 Very interesting !
 It seems that most of what I was saying is already here :)
 I was not saying that Pharo should use utf8 (I mentionned utf8 because it is 
 a standard, but I find the variable length encoding very weird), I was rather 
 talking of using WideString in UTF 16 or 32 and that's done.
 I saw asWideString but didn't know about automatic convertion or codepoint 
 selector and internal wide string support.
 Does it means that Pharo Greek users (for example) use WideString for Strings 
 without having to specify it or make explicit convertions (except of course 
 when dealing with bytes if they want to) ?
 If yes, very good, job is almost done :)
 (personnally I would also deprecate ByteString, and get rid of it, just my 
 opinion).
 Thanks for the link, another good chapter .
 
 Regards,
 
 Alain

Yes, the Greek users won't notice a difference, it is all transparent. 
ByteString is important because it is an optimization of the most common 
case. As a normal user you should only think of abstract Strings and never use 
#asByteString (but use proper encoding).

Feedback on the chapter is always welcome.

Sven