Re: [Pharo-dev] [Vm-dev] Removing most of the windowing code

2016-11-23 Thread Esteban Lorenzano

> On 24 Nov 2016, at 08:24, Esteban Lorenzano  wrote:
> 
> where can I see the code?

forget it, I was so overwhelmed by your ANN that I just skipped the link to your 
branch :P

Esteban

Re: [Pharo-dev] Running on Ubuntu?

2016-11-23 Thread Chris Muller
Hey Igor, I was just messing around with this the other day; Levente
had the tersest incantation which worked on my fresh 14.04.4 install:

sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install libuuid1:i386 libglu1-mesa:i386 libsm6:i386

(Documented at http://wiki.squeak.org/squeak/6134).

On Tue, Nov 22, 2016 at 10:46 AM, Igor Stasenko  wrote:
>
>
> On 22 November 2016 at 16:50, Sven Van Caekenberghe  wrote:
>>
>> Igor,
>>
>> For future reference,
>>
>> > On 22 Nov 2016, at 11:17, Sven Van Caekenberghe  wrote:
>> >
>> > (this is what I use for headless)
>> >
>> > sudo dpkg --add-architecture i386
>> > sudo apt-get update
>> > sudo apt-get install libc6:i386
>> > sudo apt-get install libssl1.0.0:i386
>> > sudo apt-get install libfreetype6:i386
>>
>> On a fresh Ubuntu 16.04.1 LTS 64-bit I did the above
>>
>> > (you might need more for full UI)
>>
>> And then just one other install
>>
>>  sudo apt-get install libgl1-mesa-glx:i386
>>
>> which installed lots of dependencies.
>>
>> After that I was able to run the download
>>
>>  get.pharo.org/60+vm
>>
>> in UI mode (using pharo-ui).
>>
>> Sven
>>
> ohh.. wait, 6.0 is on the site.. and I was messing with 5.0..
> yeah, that explains why it needs 32-bit libs and all that mess  :)
>
>
> --
> Best regards,
> Igor Stasenko.



Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread Thierry Goubier

On 23/11/2016 at 20:11, p...@highoctane.be wrote:



On Wed, Nov 23, 2016 at 4:16 PM, Thierry Goubier
<thierry.goub...@gmail.com> wrote:



2016-11-23 15:46 GMT+01:00 p...@highoctane.be
<p...@highoctane.be>:

Thanks Thierry.

Please also see that with new satellites, the resolution is ever
increasing (e.g. Sentinel
http://m.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Overview4)


It has always been so. Anytime you reach a reasonable size, they
send a new satellite with higher res / larger images :)



I understand the tile thing and indeed a lot of the algos work
on tiles, but there are other ways to do this and especially
with real time geo queries on custom defined polygons, you go
only so far with tiles. A reason why we are using GeoTrellis
backed by Accumulo in order to pump data very fast in random order.


But that means you're dealing with preprocessed / graph georeferenced
data (aka openstreetmap type of data). If you're dealing with
raster, your polygons are approximated by a set of tiles (with a
nice tile size well suited to your network / disk array).

I had reasonable success a long time ago (1991, I think), for
Ifremer, with an unbalanced, sort of quadtree based decomposition
for highly irregular curves on the seabed. Tree node size / tile
size was computed to be exactly equal to the disk block size on a
very slow medium. That sort of work is in the line of a geographic
index for a database: optimise query accesses to geo-referenced
objects... what is hard, and probably what you are doing, is
combining geographic queries with graph queries (give me all houses
in Belgium within a ten minutes bus + walk trip to a primary school)(*)

(*) One can work that out on a raster for speed. This is what GRASS
does for example.

(**) I asked a student to accelerate some raster processing on a
very small FPGA a long time ago. Once he had understood he could
pipeline the design to increase the frequency, he then discovered
that the FPGA would happily grok data faster than the computer bus
could provide it :) leaving no bandwidth for the data to be written
back to memory.


Yes, but network can be pretty fast with bonded Ethernet interfaces
these days.


You mean they are not using HPC interconnects ?


We are adding 30+ servers to the cluster at the moment just to
deal with the sizes as there is a project mapping energy
landscape https://vito.be/en/land-use/land-use/energy-landscapes. This
thing is throwing YARN containers and uses CPU, like,
intensively. It is not uncommon for me to see their workload
eating everything for a serious amount of CPU-seconds.


Only a few seconds ?


CPU-seconds, that's the cluster usage unit for CPU.
http://serverfault.com/questions/138703/a-definition-for-a-cpu-second
So, say a couple million of them on a 640-core setup. CPU power is the
limiting factor in these workloads, it seems.
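(For scale, a back-of-the-envelope sketch with the illustrative numbers
above, assuming perfect parallelism; plain Pharo arithmetic:

  | cpuSeconds cores |
  cpuSeconds := 2 * 1000 * 1000.
  cores := 640.
  (cpuSeconds / cores) seconds. "3125 seconds, i.e. about 52 minutes of wall clock"

So a couple million CPU-seconds keep a 640-core cluster busy for the
better part of an hour.)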


If I understand correctly, the cluster has enough memory to load all 
the data in RAM, then.



It would be silly not to plug Pharo into all of this
infrastructure I think.


I've had quite bad results with Pharo on compute intensive code
recently, so I'd plan carefully how I use it. On that sort of
hardware, in the projects I'm working on, 1000x faster than Pharo on
a single node is about an expected target.


Sure, but lower level C/C++ things are run from Python or Java, so Pharo
will not do worse. The good bit about Pharo is that one can ship a
preloaded image, and that is easier than sending gigabyte (!) sized
uberjars around, which Java will unzip before running; the same is true of
Python's myriad dependencies. An image file appears super small then.


Agreed. Pharo 64bits is interesting there because it installs a lot 
better than the 32bits version. And as far as I could see, it is at least 
as stable as the 32bits version for my needs.



Especially given the PhD/Postdoc/brainiacs per square meter
there. If you have seen the Lost TV show, well, working at that
place kind of feels like that. Especially given that it is kind of
hidden in the woods.

Maybe you could have interesting interactions with them. These
guys also have their own nuclear reactor and geothermal drilling.


I'd be interested, because we're working a bit on high performance
parallel runtimes and compilation for those. If one day you happen
to be ready to talk about it in our place? South of Paris, not too
hard to reach by public transport :)

Sure, that would be awesome. But Q1Y17 then, because my schedule is
pretty packed at the moment.

Re: [Pharo-dev] IDNA / punycode for Zinc?

2016-11-23 Thread p...@highoctane.be
I am always amazed about the cool things I can learn from this list.

Phil

On Wed, Nov 23, 2016 at 4:59 PM, Max Leske  wrote:

> :) Nice.
>
>
>
> > On 23 Nov 2016, at 16:51, Sven Van Caekenberghe  wrote:
> >
> > https://pharo.fogbugz.com/f/cases/19383/IDNA-punycode-for-Zinc
> >
> > After loading
> >
> > Name: Punycode-dTriangle.8
> > Author: dTriangle
> > Time: 26 August 2013, 10:19:11.728 am
> > UUID: 6493a3ee-43bb-44f0-86a8-5aa47a9b42ff
> > Ancestors: Punycode-dTriangle.7
> >
> > I can do the following
> >
> > 'http://üni.ch' asUrl.
> > 
> > "http://xn--ni-wka.ch/"
> > 
> > 'http://üni.ch' asUrl retrieveContents includesSubstring: 'Üni'.
> >
> > "true"
> >
> > Done ;-)
> >
> > Thank you, https://twitter.com/osashimitabenai, well done !
> >
> > Sven
> >
> >> On 23 Nov 2016, at 15:43, Sven Van Caekenberghe  wrote:
> >>
> >>
> >>> On 23 Nov 2016, at 15:36, Max Leske  wrote:
> >>>
> >>> Great!
> >>>
> >>> There’s a punycode implementation on smalltalkhub (
> http://smalltalkhub.com/#!/~dTriangle/Punycode) but it needs some
> polishing.
> >>
> >> Wow, that looks good, it even has ZnUrl integration, so we're done ;-)
> >>
> >> There are no tests though.
> >>
> >> How come we never heard of this ?
> >>
> >> Last commit was in 2013, hopefully the author is still around.
> >>
> >>> Should I open an issue on FogBugz so we don’t forget?
> >>
> >> Yes, OK.
> >>
> >>> Max
> >>>
>  On 23 Nov 2016, at 15:00, Sven Van Caekenberghe  wrote:
> 
>  Max,
> 
> > On 23 Nov 2016, at 14:34, Max Leske  wrote:
> >
> > Hi (Sven),
> >
> > Zinc can’t currently handle unicode domain names (e.g. http://üni.ch).
> Are there any plans to implement punycode / IDNA
> conversion for Zinc? Or is there an explicit reason not to support it? I
> see that #parseHostPort: expects the host portion to be percent escaped,
> what is the use case for this? I have never seen a percent escaped host
> portion. Usually the host portion is either pure ASCII, unicode or punycode
> (in my experience at least).
> >
> > Just curious, as I just added IDNA conversion to one of our
> applications (I just let python perform the conversion:
> https://docs.python.org/2/library/codecs.html#module-encodings.idna).
> >
> > Cheers,
> > Max
> 
>  Yes, that would be nice to have.
> 
>  Just for future reference, we are talking about the following
> (IDN(A)):
> 
>  https://en.wikipedia.org/wiki/Internationalized_domain_name
>  https://en.wikipedia.org/wiki/Punycode
>  https://tools.ietf.org/html/rfc3490
>  https://www.charset.org/punycode
> 
>  Normal DNS hostnames are ASCII only (or used to be like that anyway),
> that is why it is (currently) implemented like that.
> 
>  Sven
> 
> 
> >>>
> >>
> >
> >
>
>
>


Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread p...@highoctane.be
On Wed, Nov 23, 2016 at 4:16 PM, Thierry Goubier 
wrote:

>
>
> 2016-11-23 15:46 GMT+01:00 p...@highoctane.be :
>
>> Thanks Thierry.
>>
>> Please also see that with new satellites, the resolution is ever
>> increasing (e.g. Sentinel http://m.esa.int/Our_
>> Activities/Observing_the_Earth/Copernicus/Overview4)
>>
>
> It has always been so. Anytime you reach a reasonable size, they send a
> new satellite with higher res / larger images :)
>
>
>>
>> I understand the tile thing and indeed a lot of the algos work on tiles,
>> but there are other ways to do this and especially with real time geo
>> queries on custom defined polygons, you go only so far with tiles. A reason
>> why we are using GeoTrellis backed by Accumulo in order to pump data very
>> fast in random order.
>>
>
> But that means you're dealing with preprocessed / graph georeferenced data
> (aka openstreetmap type of data). If you're dealing with raster, your
> polygons are approximated by a set of tiles (with a nice tile size well
> suited to your network / disk array).
>
> I had reasonable success a long time ago (1991, I think), for Ifremer,
> with an unbalanced, sort of quadtree based decomposition for highly
> irregular curves on the seabed. Tree node size / tile size was computed to
> be exactly equal to the disk block size on a very slow medium. That sort of
> work is in the line of a geographic index for a database: optimise query
> accesses to geo-referenced objects... what is hard, and probably what you
> are doing, is combining geographic queries with graph queries (give me all
> houses in Belgium within a ten minutes bus + walk trip to a primary
> school)(*)
>
> (*) One can work that out on a raster for speed. This is what GRASS does
> for example.
>
> (**) I asked a student to accelerate some raster processing on a very
> small FPGA a long time ago. Once he had understood he could pipeline the
> design to increase the frequency, he then discovered that the FPGA would
> happily grok data faster than the computer bus could provide it :) leaving
> no bandwidth for the data to be written back to memory.
>

Yes, but network can be pretty fast with bonded Ethernet interfaces these
days.

>
>
>>
>> We are adding 30+ servers to the cluster at the moment just to deal with
>> the sizes as there is a project mapping energy landscape
>> https://vito.be/en/land-use/land-use/energy-landscapes. This thing is
>> throwing YARN containers and uses CPU, like, intensively. It is not uncommon
>> for me to see their workload eating everything for a serious amount of CPU
>> seconds.
>>
>
> Only a few seconds ?
>

CPU-seconds, that's the cluster usage unit for CPU.
http://serverfault.com/questions/138703/a-definition-for-a-cpu-second
So, say a couple million of them on a 640-core setup. CPU power is the
limiting factor in these workloads, it seems.

>
>
>>
>> It would be silly not to plug Pharo into all of this infrastructure I
>> think.
>>
>
> I've had quite bad results with Pharo on compute intensive code recently,
> so I'd plan carefully how I use it. On that sort of hardware, in the
> projects I'm working on, 1000x faster than Pharo on a single node is about
> an expected target.
>

Sure, but lower level C/C++ things are run from Python or Java, so Pharo
will not do worse. The good bit about Pharo is that one can ship a
preloaded image, and that is easier than sending gigabyte (!) sized uberjars
around, which Java will unzip before running; the same is true of Python's
myriad dependencies. An image file appears super small then.

>
>
>>
>> Especially given the PhD/Postdoc/brainiacs per square meter there. If you
>> have seen the Lost TV show, well, working at that place kind of feels like
>> that. Especially given that it is kind of hidden in the woods.
>>
>> Maybe you could have interesting interactions with them. These guys also
>> have their own nuclear reactor and geothermal drilling.
>>
>
> I'd be interested, because we're working a bit on high performance
> parallel runtimes and compilation for those. If one day you happen to be
> ready to talk about it in our place? South of Paris, not too hard to reach
> by public transport :)
>
> Sure, that would be awesome. But Q1Y17 then, because my schedule is pretty
packed at the moment. I can show you the thing over the web from my side,
so you can see where we are in terms of systems. I guess you are much more
advanced, but one of the goals of the project here is to be pretty
approachable and gather a community that will cross-pollinate algos and
datasets for network effects.

Phil


> Thierry
>
>
>
>> Phil
>>
>>
>>
>> On Wed, Nov 23, 2016 at 1:30 PM, Thierry Goubier <
>> thierry.goub...@gmail.com> wrote:
>>
>>> Hi Phil,
>>>
>>> 2016-11-23 12:17 GMT+01:00 philippe.b...@highoctane.be <
>>> philippe.b...@gmail.com>:
>>>
 [ ...]

 It is really important to have such features to avoid massive GC pauses.

 My use case is to load the data sets from here.
 https://www.google.be/url?sa=t&source=web&rct=j&ur

[Pharo-dev] [pharo-project/pharo-core] fe6296: 60303

2016-11-23 Thread GitHub
  Branch: refs/heads/6.0
  Home:   https://github.com/pharo-project/pharo-core
  Commit: fe62961ace16b96be8e62c61593e0e68924cf2d0
  
https://github.com/pharo-project/pharo-core/commit/fe62961ace16b96be8e62c61593e0e68924cf2d0
  Author: Jenkins Build Server 
  Date:   2016-11-23 (Wed, 23 Nov 2016)

  Changed paths:
M Alien.package/AlienWeakTable.class/instance/adding/add_finalizing_.st
M ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/accessing/project.st
R ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/development support/DevelopmentSupport.st
R ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/metacello tool support/isMetacelloConfig.st
R ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/private/addHacks.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/private/bootstrapMetacelloFrom_.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/private/bootstrapPackage_from_.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/private/ensureGoferVersion_repositoryUrl_.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/private/ensureMetacello_.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/private/retry_.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/private/retry_retryCount_.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/class/unloading Metacello/unloadMetacello.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/accessing/customProjectAttributes.st
M ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/accessing/project.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/accessing/projectClass.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/accessing/project_.st
M ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/baselines-1/baseline192_.st
M ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/baselines-1/baseline193_.st
M ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/baselines-1/baseline194_.st
M ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/baselines-2/baseline200_.st
M ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/baselines-2/baseline210_.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/baselines-2/baseline218_.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/baselines-helpers/fuelPlatform_.st
M ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/symbolic versions/development_.st
M ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/symbolic versions/stable_.st
A ConfigurationOfFuel.package/ConfigurationOfFuel.class/instance/versions-2/version218_.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/README.md
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/accessing/project.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/development support/validate.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/loading/loadAll.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/private/baseConfigurationClassIfAbsent_.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/private/bootstrapMetacelloFrom_.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/private/bootstrapPackage_from_.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/private/ensureGoferVersion_repositoryUrl_.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/private/ensureMetacello.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/private/ensureMetacelloBaseConfiguration.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/private/ensureMetacello_.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/private/retry_.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/private/retry_retryCount_.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/class/unloading Metacello/unloadMetacello.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/definition.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/instance/accessing/addCustomProjectAttribute_.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/instance/accessing/customProjectAttributes.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/instance/accessing/loadAll.st
A ConfigurationOfFuelPlatform.package/ConfigurationOfFuelPlatform.class/instance/accessing/project.s

[Pharo-dev] [pharo-project/pharo-core]

2016-11-23 Thread GitHub
  Branch: refs/tags/60303
  Home:   https://github.com/pharo-project/pharo-core


Re: [Pharo-dev] IDNA / punycode for Zinc?

2016-11-23 Thread Max Leske
:) Nice.



> On 23 Nov 2016, at 16:51, Sven Van Caekenberghe  wrote:
> 
> https://pharo.fogbugz.com/f/cases/19383/IDNA-punycode-for-Zinc
> 
> After loading 
> 
> Name: Punycode-dTriangle.8
> Author: dTriangle
> Time: 26 August 2013, 10:19:11.728 am
> UUID: 6493a3ee-43bb-44f0-86a8-5aa47a9b42ff
> Ancestors: Punycode-dTriangle.7 
> 
> I can do the following
> 
> 'http://üni.ch' asUrl.
> 
> "http://xn--ni-wka.ch/";
> 
> 'http://üni.ch' asUrl retrieveContents includesSubstring: 'Üni'.
> 
> "true"
> 
> Done ;-)
> 
> Thank you, https://twitter.com/osashimitabenai, well done !
> 
> Sven
> 
>> On 23 Nov 2016, at 15:43, Sven Van Caekenberghe  wrote:
>> 
>> 
>>> On 23 Nov 2016, at 15:36, Max Leske  wrote:
>>> 
>>> Great!
>>> 
>>> There’s a punycode implementation on smalltalkhub 
>>> (http://smalltalkhub.com/#!/~dTriangle/Punycode) but it needs some 
>>> polishing.
>> 
>> Wow, that looks good, it even has ZnUrl integration, so we're done ;-)
>> 
>> There are no tests though.
>> 
>> How come we never heard of this ?
>> 
>> Last commit was in 2013, hopefully the author is still around.
>> 
>>> Should I open an issue on FogBugz so we don’t forget?
>> 
>> Yes, OK.
>> 
>>> Max
>>> 
 On 23 Nov 2016, at 15:00, Sven Van Caekenberghe  wrote:
 
 Max,
 
> On 23 Nov 2016, at 14:34, Max Leske  wrote:
> 
> Hi (Sven),
> 
> Zinc can’t currently handle unicode domain names (e.g. http://üni.ch). 
> Are there any plans to implement punycode / IDNA conversion for Zinc? Or 
> is there an explicit reason not to support it? I see that 
> #parseHostPort: expects the host portion to be percent escaped, what is 
> the use case for this? I have never seen a percent escaped host portion. 
> Usually the host portion is either pure ASCII, unicode or punycode (in my 
> experience at least).
> 
> Just curious, as I just added IDNA conversion to one of our applications 
> (I just let python perform the conversion: 
> https://docs.python.org/2/library/codecs.html#module-encodings.idna).
> 
> Cheers,
> Max
 
 Yes, that would be nice to have.
 
 Just for future reference, we are talking about the following (IDN(A)):
 
 https://en.wikipedia.org/wiki/Internationalized_domain_name
 https://en.wikipedia.org/wiki/Punycode
 https://tools.ietf.org/html/rfc3490
 https://www.charset.org/punycode
 
 Normal DNS hostnames are ASCII only (or used to be like that anyway), that 
 is why it is (currently) implemented like that.
 
 Sven
 
 
>>> 
>> 
> 
> 




Re: [Pharo-dev] IDNA / punycode for Zinc?

2016-11-23 Thread Sven Van Caekenberghe
https://pharo.fogbugz.com/f/cases/19383/IDNA-punycode-for-Zinc

After loading 

Name: Punycode-dTriangle.8
Author: dTriangle
Time: 26 August 2013, 10:19:11.728 am
UUID: 6493a3ee-43bb-44f0-86a8-5aa47a9b42ff
Ancestors: Punycode-dTriangle.7 

I can do the following

'http://üni.ch' asUrl.

 "http://xn--ni-wka.ch/";

'http://üni.ch' asUrl retrieveContents includesSubstring: 'Üni'.

 "true"

Done ;-)

Thank you, https://twitter.com/osashimitabenai, well done !

Sven

> On 23 Nov 2016, at 15:43, Sven Van Caekenberghe  wrote:
> 
> 
>> On 23 Nov 2016, at 15:36, Max Leske  wrote:
>> 
>> Great!
>> 
>> There’s a punycode implementation on smalltalkhub 
>> (http://smalltalkhub.com/#!/~dTriangle/Punycode) but it needs some polishing.
> 
> Wow, that looks good, it even has ZnUrl integration, so we're done ;-)
> 
> There are no tests though.
> 
> How come we never heard of this ?
> 
> Last commit was in 2013, hopefully the author is still around.
> 
>> Should I open an issue on FogBugz so we don’t forget?
> 
> Yes, OK.
> 
>> Max
>> 
>>> On 23 Nov 2016, at 15:00, Sven Van Caekenberghe  wrote:
>>> 
>>> Max,
>>> 
 On 23 Nov 2016, at 14:34, Max Leske  wrote:
 
 Hi (Sven),
 
 Zinc can’t currently handle unicode domain names (e.g. http://üni.ch). Are 
 there any plans to implement punycode / IDNA conversion for Zinc? Or is 
 there an explicit reason not to support it? I see that #parseHostPort: 
 expects the host portion to be percent escaped, what is the use case for 
 this? I have never seen a percent escaped host portion. Usually the host 
 portion is either pure ASCII, unicode or punycode (in my experience at 
 least).
 
 Just curious, as I just added IDNA conversion to one of our applications 
 (I just let python perform the conversion: 
 https://docs.python.org/2/library/codecs.html#module-encodings.idna).
 
 Cheers,
 Max
>>> 
>>> Yes, that would be nice to have.
>>> 
>>> Just for future reference, we are talking about the following (IDN(A)):
>>> 
>>> https://en.wikipedia.org/wiki/Internationalized_domain_name
>>> https://en.wikipedia.org/wiki/Punycode
>>> https://tools.ietf.org/html/rfc3490
>>> https://www.charset.org/punycode
>>> 
>>> Normal DNS hostnames are ASCII only (or used to be like that anyway), that 
>>> is why it is (currently) implemented like that.
>>> 
>>> Sven
>>> 
>>> 
>> 
> 




Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread Thierry Goubier
2016-11-23 15:46 GMT+01:00 p...@highoctane.be :

> Thanks Thierry.
>
> Please also see that with new satellites, the resolution is ever
> increasing (e.g. Sentinel http://m.esa.int/Our_Activities/Observing_the_
> Earth/Copernicus/Overview4)
>

It has always been so. Anytime you reach a reasonable size, they send a
new satellite with higher res / larger images :)


>
> I understand the tile thing and indeed a lot of the algos work on tiles,
> but there are other ways to do this and especially with real time geo
> queries on custom defined polygons, you go only so far with tiles. A reason
> why we are using GeoTrellis backed by Accumulo in order to pump data very
> fast in random order.
>

But that means you're dealing with preprocessed / graph georeferenced data
(aka openstreetmap type of data). If you're dealing with raster, your
polygons are approximated by a set of tiles (with a nice tile size well
suited to your network / disk array).

I had reasonable success a long time ago (1991, I think), for Ifremer, with
an unbalanced, sort of quadtree based decomposition for highly irregular
curves on the seabed. Tree node size / tile size was computed to be exactly
equal to the disk block size on a very slow medium. That sort of work is in
the line of a geographic index for a database: optimise query accesses to
geo-referenced objects... what is hard, and probably what you are doing, is
combining geographic queries with graph queries (give me all houses in
Belgium within a ten minutes bus + walk trip to a primary school)(*)

(*) One can work that out on a raster for speed. This is what GRASS does
for example.

(**) I asked a student to accelerate some raster processing on a very small
FPGA a long time ago. Once he had understood he could pipeline the design
to increase the frequency, he then discovered that the FPGA would happily
grok data faster than the computer bus could provide it :) leaving no
bandwidth for the data to be written back to memory.


>
> We are adding 30+ servers to the cluster at the moment just to deal with
> the sizes as there is a project mapping energy landscape
> https://vito.be/en/land-use/land-use/energy-landscapes. This thing is
> throwing YARN containers and uses CPU, like, intensively. It is not uncommon
> for me to see their workload eating everything for a serious amount of CPU
> seconds.
>

Only a few seconds ?


>
> It would be silly not to plug Pharo into all of this infrastructure I
> think.
>

I've had quite bad results with Pharo on compute intensive code recently,
so I'd plan carefully how I use it. On that sort of hardware, in the
projects I'm working on, 1000x faster than Pharo on a single node is about
an expected target.


>
> Especially given the PhD/Postdoc/brainiacs per square meter there. If you
> have seen the Lost TV show, well, working at that place kind of feels like
> that. Especially given that it is kind of hidden in the woods.
>
> Maybe you could have interesting interactions with them. These guys also
> have their own nuclear reactor and geothermal drilling.
>

I'd be interested, because we're working a bit on high performance parallel
runtimes and compilation for those. If one day you happen to be ready to
talk about it in our place? South of Paris, not too hard to reach by public
transport :)

Thierry



> Phil
>
>
>
> On Wed, Nov 23, 2016 at 1:30 PM, Thierry Goubier <
> thierry.goub...@gmail.com> wrote:
>
>> Hi Phil,
>>
>> 2016-11-23 12:17 GMT+01:00 philippe.b...@highoctane.be <
>> philippe.b...@gmail.com>:
>>
>>> [ ...]
>>>
>>> It is really important to have such features to avoid massive GC pauses.
>>>
>>> My use case is to load the data sets from here.
>>> https://www.google.be/url?sa=t&source=web&rct=j&url=http://p
>>> roba-v.vgt.vito.be/sites/default/files/Product_User_Manual.p
>>> df&ved=0ahUKEwjwlOG-4L7QAhWBniwKHZVmDZcQFggpMAI&usg=AFQjCNGR
>>> ME9ZyHWQ8yCPgAQBDi1PUmzhbQ&sig2=eyaT4DlWCTjqUdQGBhFY0w
>>>
>> I've used that type of data before, a long time ago.
>>
>> I consider that tiled / on-demand block loading is the way to go for
>> those. Work with the header as long as possible, stream tiles if you need
>> to work on the full data set. There is a good chance that:
>>
>> 1- You're memory bound for anything you compute with them
>> 2- I/O time dominates, or becomes low enough not to care (very fast
>> SSDs)
>> 3- It's very rare that you need full random access on the complete array
>> 4- GC doesn't matter
>>
>> Stream computing is your solution! This is how the raster GIS are
>> implemented.
>>
>> What is hard for me is manipulating a very large graph, or a sparse very
>> large structure, like a huge Famix model or a FPGA layout model with a full
>> design laid out on top. There, you're randomly accessing the whole of the
>> structure (or at least you see no obvious partition) and the structure is
>> too large for the memory or the GC.
>>
>> This is why I had, a long time ago, this idea of an in-memory working-set /
>> on-disk full structure with automatic determination of what the working set is.

Re: [Pharo-dev] About ~= and ~~

2016-11-23 Thread Clément Bera
Hi

Do you mean you want to add this pragma "" in Object >> #~~ ?

Or do you mean replacing #blockCopy: by #~~ in the special selectors to get
it inlined by default like #== ?


On Wed, Nov 23, 2016 at 2:39 PM, Aliaksei Syrel 
wrote:

> Hi
>
> It has been a while...
> So, do we want to replace ~~ with a primitive? :)
>
> Cheers
> Alex
>
>
>
> --
> View this message in context: http://forum.world.st/About-
> and-tp3898409p4924391.html
> Sent from the Pharo Smalltalk Developers mailing list archive at
> Nabble.com.
>
>


Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread p...@highoctane.be
Thanks Thierry.

Please also see that with new satellites, the resolution is ever increasing
(e.g. Sentinel
http://m.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Overview4)

I understand the tile thing and indeed a lot of the algos work on tiles,
but there are other ways to do this and especially with real time geo
queries on custom defined polygons, you go only so far with tiles. A reason
why we are using GeoTrellis backed by Accumulo in order to pump data very
fast in random order.

We are adding 30+ servers to the cluster at the moment just to deal with
the sizes as there is a project mapping energy landscape
https://vito.be/en/land-use/land-use/energy-landscapes. This thing is
throwing YARN containers and uses CPU, like, intensively. It is not uncommon
for me to see their workload eating everything for a serious amount of CPU
seconds.

It would be silly not to plug Pharo into all of this infrastructure I
think.

Especially given the PhD/Postdoc/brainiacs per square meter there. If you
have seen the Lost TV show, well, working at that place kind of feels like
that. Especially given that it is kind of hidden in the woods.

Maybe you could have interesting interactions with them. These guys also
have their own nuclear reactor and geothermal drilling.

Phil



On Wed, Nov 23, 2016 at 1:30 PM, Thierry Goubier 
wrote:

> Hi Phil,
>
> 2016-11-23 12:17 GMT+01:00 philippe.b...@highoctane.be <
> philippe.b...@gmail.com>:
>
>> [ ...]
>>
>> It is really important to have such features to avoid massive GC pauses.
>>
>> My use case is to load the data sets from here.
>> https://www.google.be/url?sa=t&source=web&rct=j&url=http://p
>> roba-v.vgt.vito.be/sites/default/files/Product_User_Manual.
>> pdf&ved=0ahUKEwjwlOG-4L7QAhWBniwKHZVmDZcQFggpMAI&usg=AFQjCNG
>> RME9ZyHWQ8yCPgAQBDi1PUmzhbQ&sig2=eyaT4DlWCTjqUdQGBhFY0w
>>
> I've used that type of data before, a long time ago.
>
> I consider that tiled / on-demand block loading is the way to go for
> those. Work with the header as long as possible, stream tiles if you need
> to work on the full data set. There is a good chance that:
>
> 1- You're memory bound for anything you compute with them
> 2- I/O time dominates, or becomes low enough not to care (very fast SSDs)
> 3- It's very rare that you need full random access on the complete array
> 4- GC doesn't matter
>
> Stream computing is your solution! This is how the raster GIS are
> implemented.
>
> What is hard for me is manipulating a very large graph, or a sparse very
> large structure, like a huge Famix model or a FPGA layout model with a full
> design laid out on top. There, you're randomly accessing the whole of the
> structure (or at least you see no obvious partition) and the structure is
> too large for the memory or the GC.
>
> This is why I had, a long time ago, this idea of an in-memory working-set /
> on-disk full structure with automatic determination of what the working set
> is.
>
> For pointers, have a look at the Graph500 and HPCG benchmarks, especially
> the efficiency (ratio to peak) of HPCG runs, to see how difficult these
> cases are.
>
> Regards,
>
> Thierry
>


Re: [Pharo-dev] IDNA / punycode for Zinc?

2016-11-23 Thread Sven Van Caekenberghe

> On 23 Nov 2016, at 15:36, Max Leske  wrote:
> 
> Great!
> 
> There’s a punycode implementation on smalltalkhub 
> (http://smalltalkhub.com/#!/~dTriangle/Punycode) but it needs some polishing.

Wow, that looks good, it even has ZnUrl integration, so we're done ;-)

There are no tests though.

How come we never heard of this ?

Last commit was in 2013, hopefully the author is still around.
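A first test could be as small as this sketch (SUnit, assuming the
Punycode-dTriangle package is loaded and hooked into ZnUrl as shown
elsewhere in this thread; the class and selector names are illustrative):

  TestCase subclass: #ZnPunycodeTest
      instanceVariableNames: ''
      classVariableNames: ''
      category: 'Punycode-Tests'.

  ZnPunycodeTest >> testIdnHostIsPunycodeEncoded
      "An IDN host should print in its ASCII (punycode) form."
      self
          assert: 'http://üni.ch' asUrl printString
          equals: 'http://xn--ni-wka.ch/'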

> Should I open an issue on FogBugz so we don’t forget?

Yes, OK.

> Max
> 
>> On 23 Nov 2016, at 15:00, Sven Van Caekenberghe  wrote:
>> 
>> Max,
>> 
>>> On 23 Nov 2016, at 14:34, Max Leske  wrote:
>>> 
>>> Hi (Sven),
>>> 
>>> Zinc can’t currently handle unicode domain names (e.g. http://üni.ch). Are 
>>> there any plans to implement punycode / IDNA conversion for Zinc? Or is 
>>> there an explicit reason not to support it? I see that #parseHostPort: 
>>> expects the host portion to be percent escaped, what is the use case for 
>>> this? I have never seen a percent escaped host portion. Usually the host 
>>> portion is either pure ASCII, unicode or punycode (in my experience at 
>>> least).
>>> 
>>> Just curious, as I just added IDNA conversion to one of our applications (I 
>>> just let python perform the conversion: 
>>> https://docs.python.org/2/library/codecs.html#module-encodings.idna).
>>> 
>>> Cheers,
>>> Max
>> 
>> Yes, that would be nice to have.
>> 
>> Just for future reference, we are talking about the following (IDN(A)):
>> 
>> https://en.wikipedia.org/wiki/Internationalized_domain_name
>> https://en.wikipedia.org/wiki/Punycode
>> https://tools.ietf.org/html/rfc3490
>> https://www.charset.org/punycode
>> 
>> Normal DNS hostnames are ASCII only (or used to be like that anyway), that 
>> is why it is (currently) implemented like that.
>> 
>> Sven
>> 
>> 
> 




Re: [Pharo-dev] IDNA / punycode for Zinc?

2016-11-23 Thread Max Leske
Great!

There’s a punycode implementation on smalltalkhub 
(http://smalltalkhub.com/#!/~dTriangle/Punycode) but it needs some polishing.

Should I open an issue on FogBugz so we don’t forget?

Max

> On 23 Nov 2016, at 15:00, Sven Van Caekenberghe  wrote:
> 
> Max,
> 
>> On 23 Nov 2016, at 14:34, Max Leske  wrote:
>> 
>> Hi (Sven),
>> 
>> Zinc can’t currently handle unicode domain names (e.g. http://üni.ch). Are 
>> there any plans to implement punycode / IDNA conversion for Zinc? Or is 
>> there an explicit reason not to support it? I see that #parseHostPort: 
>> expects the host portion to be percent escaped, what is the use case for 
>> this? I have never seen a percent escaped host portion. Usually the host 
>> portion is either pure ASCII, unicode or punycode (in my experience at 
>> least).
>> 
>> Just curious, as I just added IDNA conversion to one of our applications (I 
>> just let python perform the conversion: 
>> https://docs.python.org/2/library/codecs.html#module-encodings.idna).
>> 
>> Cheers,
>> Max
> 
> Yes, that would be nice to have.
> 
> Just for future reference, we are talking about the following (IDN(A)):
> 
> https://en.wikipedia.org/wiki/Internationalized_domain_name
> https://en.wikipedia.org/wiki/Punycode
> https://tools.ietf.org/html/rfc3490
> https://www.charset.org/punycode
> 
> Normal DNS hostnames are ASCII only (or used to be like that anyway), that is 
> why it is (currently) implemented like that.
> 
> Sven
> 
> 



Re: [Pharo-dev] IDNA / punycode for Zinc?

2016-11-23 Thread Sven Van Caekenberghe
Max,

> On 23 Nov 2016, at 14:34, Max Leske  wrote:
> 
> Hi (Sven),
> 
> Zinc can’t currently handle unicode domain names (e.g. http://üni.ch). Are 
> there any plans to implement punycode / IDNA conversion for Zinc? Or is there 
> an explicit reason not to support it? I see that #parseHostPort: expects the 
> host portion to be percent escaped, what is the use case for this? I have 
> never seen a percent escaped host portion. Usually the host portion is either 
> pure ASCII, unicode or punycode (in my experience at least).
> 
> Just curious, as I just added IDNA conversion to one of our applications (I 
> just let python perform the conversion: 
> https://docs.python.org/2/library/codecs.html#module-encodings.idna).
> 
> Cheers,
> Max

Yes, that would be nice to have.

Just for future reference, we are talking about the following (IDN(A)):

 https://en.wikipedia.org/wiki/Internationalized_domain_name
 https://en.wikipedia.org/wiki/Punycode
 https://tools.ietf.org/html/rfc3490
 https://www.charset.org/punycode

Normal DNS hostnames are ASCII only (or used to be like that anyway), that is 
why it is (currently) implemented like that.

Sven




Re: [Pharo-dev] About ~= and ~~

2016-11-23 Thread Aliaksei Syrel
Hi

It has been a while...
So, do we want to replace ~~ with a primitive? :)

Cheers
Alex



--
View this message in context: 
http://forum.world.st/About-and-tp3898409p4924391.html
Sent from the Pharo Smalltalk Developers mailing list archive at Nabble.com.



[Pharo-dev] IDNA / punycode for Zinc?

2016-11-23 Thread Max Leske
Hi (Sven),

Zinc can’t currently handle unicode domain names (e.g. http://üni.ch). Are 
there any plans to implement punycode / IDNA 
conversion for Zinc? Or is there an explicit reason not to support it? I see 
that #parseHostPort: expects the host portion to be percent escaped, what is 
the use case for this? I have never seen a percent escaped host portion. 
Usually the host portion is either pure ASCII, unicode or punycode (in my 
experience at least).

Just curious, as I just added IDNA conversion to one of our applications (I 
just let python perform the conversion: 
https://docs.python.org/2/library/codecs.html#module-encodings.idna).

Cheers,
Max

Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread Thierry Goubier
Hi Phil,

2016-11-23 12:17 GMT+01:00 philippe.b...@highoctane.be <
philippe.b...@gmail.com>:

> [ ...]
>
> It is really important to have such features to avoid massive GC pauses.
>
> My use case is to load the data sets from here.
> https://www.google.be/url?sa=t&source=web&rct=j&url=http://
> proba-v.vgt.vito.be/sites/default/files/Product_User_
> Manual.pdf&ved=0ahUKEwjwlOG-4L7QAhWBniwKHZVmDZcQFggpMAI&usg=
> AFQjCNGRME9ZyHWQ8yCPgAQBDi1PUmzhbQ&sig2=eyaT4DlWCTjqUdQGBhFY0w
>
I've used that type of data before, a long time ago.

I consider that tiled / on-demand block loading is the way to go for those.
Work with the header as long as possible, stream tiles if you need to work
on the full data set. There is a good chance that:

1- You're memory bound for anything you compute with them
2- I/O time dominates, or becomes low enough not to care (very fast SSDs)
3- It's very rare that you need full random access on the complete array
4- GC doesn't matter

Stream computing is your solution! This is how the raster GIS are
implemented.
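
Image-side, a minimal sketch of such tile streaming (a plain binary raster
with fixed-size tiles; the file name and tile geometry are illustrative):

  | file tileBytes tile |
  tileBytes := 256 * 256 * 4. "one 256x256 tile of 4-byte samples"
  file := 'raster.dat' asFileReference binaryReadStream.
  [ file atEnd ] whileFalse: [
      tile := file next: tileBytes.
      "process one tile here; only tileBytes bytes are live at a time,
      so neither the heap nor the GC ever sees the whole data set" ].
  file close.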

What is hard for me is manipulating a very large graph, or a sparse very
large structure, like a huge Famix model or a FPGA layout model with a full
design laid out on top. There, you're randomly accessing the whole of the
structure (or at least you see no obvious partition) and the structure is
too large for the memory or the GC.

This is why I had, a long time ago, this idea of an in-memory working-set /
on-disk full structure with automatic determination of what the working set
is.

For pointers, have a look at the Graph500 and HPCG benchmarks, especially
the efficiency (ratio to peak) of HPCG runs, to see how difficult these
cases are.

Regards,

Thierry


Re: [Pharo-dev] STON doesn't produce valid JSON - it shouldn't escape quotation mark

2016-11-23 Thread Peter Uhnak
Thanks!

P

On Sun, Nov 20, 2016 at 08:02:35PM +0100, Sven Van Caekenberghe wrote:
> I committed the following in STON (sthub.com, ss3.gemstone.com & github.com) 
> #bleedingEdge:
> 
> ===
> Name: STON-Core-SvenVanCaekenberghe.78
> Author: SvenVanCaekenberghe
> Time: 20 November 2016, 7:48:23.799323 pm
> UUID: ee198da8-80e8-4944-ba9b-dffae331c57c
> Ancestors: STON-Core-SvenVanCaekenberghe.77
> 
> In JSON compatibility mode STON should not encode single quotes. (Thanks 
> Peter Uhnák for reporting this). Patch STONWriter>>#encodeCharacter:
> 
> Fix the encoding of forward slash ($/). This was already in the (external) 
> documentation. Patch STONWriter class>>#initializeSTONCharacters - force 
> STONWriter class>>#initialize to change too
> 
> Fix the representation of Time by adding nanoseconds when there are any. This 
> was already in the (external) documentation. Patch Time>>#stonOn:
> 
> Adjust unit tests accordingly (esp. 
> STONWriterTests>>#testDoubleQuotedString). 
> 
> Minor comment changes.
> ===
> Name: STON-Tests-SvenVanCaekenberghe.68
> Author: SvenVanCaekenberghe
> Time: 20 November 2016, 7:48:46.115446 pm
> UUID: 70402f28-f0e8-44e9-8746-3539c4add09b
> Ancestors: STON-Tests-SvenVanCaekenberghe.67
> 
> In JSON compatibility mode STON should not encode single quotes. (Thanks 
> Peter Uhnák for reporting this). Patch STONWriter>>#encodeCharacter:
> 
> Fix the encoding of forward slash ($/). This was already in the (external) 
> documentation. Patch STONWriter class>>#initializeSTONCharacters - force 
> STONWriter class>>#initialize to change too
> 
> Fix the representation of Time by adding nanoseconds when there are any. This 
> was already in the (external) documentation. Patch Time>>#stonOn:
> 
> Adjust unit tests accordingly (esp. 
> STONWriterTests>>#testDoubleQuotedString). 
> 
> Minor comment changes.
> ===
> 
> Now you get
> 
> NeoJSONReader fromString: (STON toJsonString: 'single='' & double="').
>  "'single='' & double=""'"
> 
> STON toString: { Time now. Date today. DateAndTime now }.
>  
> "'[Time[''19:51:15.452023''],Date[''2016-11-20''],DateAndTime[''2016-11-20T19:51:15.454147+01:00'']]'"
> 
> STON toString: 'back-slash=\ & forward-slash=/'. 
>  "'''back-slash=\\ & forward-slash=\/'''"
> 
> Sven
> 
> > On 19 Nov 2016, at 20:56, Sven Van Caekenberghe  wrote:
> > 
> > Peter,
> > 
> > You are right, that is a bug. 
> > 
> > STON should be a bit more strict when it is in JSON mode. I'll fix it and 
> > let you know.
> > 
> > Thanks for reporting this issue.
> > 
> > Sven
> > 
> >> On 19 Nov 2016, at 17:46,   wrote:
> >> 
> >> \’ is not a valid JSON escape character, so STON’s JSON output is not a 
> >> valid JSON string.
> >> 
> >> ston := STON toJsonString: 'element''s'. -> "element\'s"
> >> neo := NeoJSONWriter toString: 'element''s'. -> "element's"
> >> 
> >> NeoJSONReader fromString: ston. -> invalid escape character \’
> >> 
> >> Peter
> > 
> 
> 



Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread philippe.b...@highoctane.be
On 23 Nov 2016 at 12:07, "Igor Stasenko" wrote:
>
>
>
> On 23 November 2016 at 12:41, p...@highoctane.be 
wrote:
>>
>>
>>
>> On Wed, Nov 23, 2016 at 10:51 AM, Igor Stasenko 
wrote:
>>>
>>>
>>>
>>> On 23 November 2016 at 10:50, p...@highoctane.be 
wrote:



 On Wed, Nov 23, 2016 at 12:53 AM, Eliot Miranda <
eliot.mira...@gmail.com> wrote:
>
>
>
> On Tue, Nov 22, 2016 at 10:26 AM, Sven Van Caekenberghe 
wrote:
>>
>>
>> > On 22 Nov 2016, at 19:16, p...@highoctane.be wrote:
>> >
>> >
>> >
>> > On Tue, Nov 22, 2016 at 5:57 PM, Igor Stasenko 
wrote:
>> >
>> >
>> > On 15 November 2016 at 02:18, Eliot Miranda <
eliot.mira...@gmail.com> wrote:
>> > Hi Phil,
>> >
>> > On Thu, Nov 10, 2016 at 2:19 AM, p...@highoctane.be <
p...@highoctane.be> wrote:
>> >
>> >
>> > On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <
dionisi...@gmail.com> wrote:
>> >
>> > 2016-11-10 9:49 GMT+01:00 p...@highoctane.be :
>> > Ah, but then it may be more interesting to have a data image
(maybe a lot of these) and a front end image.
>> >
>> > Isn't Seamless something that could help us here? No need to bring
the data back, just manipulate it through proxies.
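(A minimal sketch of that proxy idea, illustrative only since Seamless'
real API differs: a message-eating proxy forwards everything over a
connection instead of copying the object graph locally.

  ProtoObject subclass: #RemoteProxy
      instanceVariableNames: 'connection remoteId'
      classVariableNames: ''
      category: 'Seamless-Sketch'.

  RemoteProxy >> doesNotUnderstand: aMessage
      "Forward any message to the remote object and answer the result,
      which may itself come back as another proxy."
      ^ connection send: aMessage to: remoteId

Here #send:to: stands for a hypothetical transport.)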
>> >
>> > The problem is that the server image will anyway perform GC. And it will be
slow if the server image is big, which will stop the whole world.
>> >
>> > What if we asked it to not do any GC at all? Like if we have tons
of RAM, why bother? Especially if what it is used to is to keep datasets:
load them, save image to disk. When needed trash the loaded stuff and
reload from zero.
>> >
>> > Basically that is what happens with Spark.
>> >
>> >
http://sujee.net/2015/01/22/understanding-spark-caching/#.WCRIgy0rKpo
>> > https://0x0fff.com/spark-misconceptions/
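(That workflow expressed in Pharo terms, a sketch: load the datasets once,
then snapshot,

  Smalltalk snapshot: true andQuit: false.

so the loaded state is written to the .image file and subsequent runs start
from the snapshot instead of re-parsing the raw data.)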
>> >
>> > While global GC may not be useful for big-data scavenging, it probably
will be for any non-trivial query.  But I think I see a misconception
here.  The large RAM on a multicore machine would be divided up between the
cores.  It makes no sense to run a single Smalltalk across lots of cores
(we're a long way from having a thread-safe class library).  It makes much
more sense to have one Smalltalk per core.  So that brings the heap sizes
down and makes GC less scary.
>> >
>> > yep, that's the approach we tried in HydraVM
>> >
>> >
>> > and Tachyon/Alluxio is kind of solving this kind of issue (may be
nice to have that interacting with Pharo image). http://www.alluxio.org/
This thing basically keeps stuff in memory in case one needs to reuse the
data between workload runs.
>> >
>> > Sure.  We have all the facilities we need to do this.  We can add
and remove code at runtime so we can keep live instances running, and send
the code to them along with the data we want them to crunch.
>> >
>> >
>> > Or have an object memory for work and one for datasets (first one
gets GC'd, the other one isn't).
>> >
>> > Or have policies which one can switch.  There are quite a few
levers into the GC from the image and one can easily switch off global GC
with the right levers.  One doesn't need a VM that doesn't contain a GC.
One needs an image that is using the right policy.
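(Concretely, a few of those image-side levers in Pharo, sketched; exact VM
parameter indices vary between VM versions, so treat the last line as
illustrative:

  Smalltalk garbageCollect.     "force a full, global mark/sweep now"
  Smalltalk garbageCollectMost. "scavenge new space only, much cheaper"
  Smalltalk vm parameterAt: 45. "read one of the GC tuning parameters"

'Switching off' global GC then mostly means never triggering full
collections and sizing the heap so the scavenger suffices.)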
>> >
>> > or just mark whole data (sub)graphs with some bit, telling GC to
skip over them so it won't attempt to scan them, treating them as always
alive..
>> > this is where we getting back to my idea of heap spaces, where you
can toss a subgraph into a special heap space that has such policy, that it
is never scanned/GCed automatically and can be triggered only manually or
something like that.
>> >
>> > Could be very useful for all kinds of large binary data, like
videos and sounds that we can load once and keep in the heap space.
>> >
>> > How hard would it be to get something like that?
>>
>> Large binary data poses no problem (as long as it's not a copying
GC). Since a binary blob contains no subpointers, no work needs to be done.
A 1M or 1G ByteArray is the same amount of GC work.
>
>
> +1


 Amen to that. But a dataset made of a gazillion of composites is not
the same, right?

>>> yep, as soon as you have references in your data, you add more work for
GC
>>
>>
>> That's what I thought. I have seen Craig Latta marking some objects with
special flags in the object headers. Could there be some generic mechanism
there now that we have 64-bit, super large headers? Like setting/resetting
a kind of bitmask to let some spaces be GC'd or left alone? Things that we
could manage image side?
>>
> well, adding bit(s) is just the simplest part of the story. The main one is
implementing a GC discipline that does not walk over marked object(s), as
well as having a mechanism to ensure that marked object(s) form a closed
subgraph (i.e. there are no references coming outside of it)
> scanning+marking a graph is usually a simple matter, you just need to

Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread Igor Stasenko
On 23 November 2016 at 12:41, p...@highoctane.be  wrote:

>
>
> On Wed, Nov 23, 2016 at 10:51 AM, Igor Stasenko 
> wrote:
>
>>
>>
>> On 23 November 2016 at 10:50, p...@highoctane.be 
>> wrote:
>>
>>>
>>>
>>> On Wed, Nov 23, 2016 at 12:53 AM, Eliot Miranda >> > wrote:
>>>


 On Tue, Nov 22, 2016 at 10:26 AM, Sven Van Caekenberghe 
 wrote:

>
> > On 22 Nov 2016, at 19:16, p...@highoctane.be wrote:
> >
> >
> >
> > On Tue, Nov 22, 2016 at 5:57 PM, Igor Stasenko 
> wrote:
> >
> >
> > On 15 November 2016 at 02:18, Eliot Miranda 
> wrote:
> > Hi Phil,
> >
> > On Thu, Nov 10, 2016 at 2:19 AM, p...@highoctane.be <
> p...@highoctane.be> wrote:
> >
> >
> > On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <
> dionisi...@gmail.com> wrote:
> >
> > 2016-11-10 9:49 GMT+01:00 p...@highoctane.be :
> > Ah, but then it may be more interesting to have a data image (maybe
> a lot of these) and a front end image.
> >
> > Isn't Seamless something that could help us here? No need to bring
> the data back, just manipulate it through proxies.
> >
> > The problem is that the server image will anyway perform GC. And it will be
> slow if the server image is big, which will stop the whole world.
> >
> > What if we asked it to not do any GC at all? Like if we have tons of
> RAM, why bother? Especially if what it is used for is to keep datasets: 
> load
> them, save image to disk. When needed trash the loaded stuff and reload
> from zero.
> >
> > Basically that is what happens with Spark.
> >
> > http://sujee.net/2015/01/22/understanding-spark-caching/#.WC
> RIgy0rKpo
> > https://0x0fff.com/spark-misconceptions/
> >
> > While global GC may not be useful for big-data scavenging, it probably
> will be for any non-trivial query.  But I think I see a misconception
> here.  The large RAM on a multicore machine would be divided up between 
> the
> cores.  It makes no sense to run a single Smalltalk across lots of cores
> (we're a long way from having a thread-safe class library).  It makes much
> more sense to have one Smalltalk per core.  So that brings the heap sizes
> down and makes GC less scary.
> >
> > yep, that's the approach we tried in HydraVM
> >
> >
> > and Tachyon/Alluxio is kind of solving this kind of issue (may be
> nice to have that interacting with Pharo image).
> http://www.alluxio.org/ This thing basically keeps stuff in memory in
> case one needs to reuse the data between workload runs.
> >
> > Sure.  We have all the facilities we need to do this.  We can add
> and remove code at runtime so we can keep live instances running, and send
> the code to them along with the data we want them to crunch.
> >
> >
> > Or have an object memory for work and one for datasets (first one
> gets GC'd, the other one isn't).
> >
> > Or have policies which one can switch.  There are quite a few levers
> into the GC from the image and one can easily switch off global GC with 
> the
> right levers.  One doesn't need a VM that doesn't contain a GC.  One needs
> an image that is using the right policy.
> >
> > or just mark whole data (sub)graphs with some bit, telling GC to
> skip over them so it won't attempt to scan them, treating them as always
> alive..
> > this is where we getting back to my idea of heap spaces, where you
> can toss a subgraph into a special heap space that has such policy, that 
> it
> is never scanned/GCed automatically and can be triggered only manually or
> something like that.
> >
> > Could be very useful for all kinds of large binary data, like videos
> and sounds that we can load once and keep in the heap space.
> >
> > How hard would it be to get something like that?
>
> Large binary data poses no problem (as long as it's not a copying GC).
> Since a binary blob contains no subpointers, no work needs to be done. A 
> 1M
> or 1G ByteArray is the same amount of GC work.
>

 +1

>>>
>>> Amen to that. But a dataset made of a gazillion of composites is not the
>>> same, right?
>>>
>>> yep, as soon as you have references in your data, you add more work for
>> GC
>>
>
> That's what I thought. I have seen Craig Latta marking some objects with
> special flags in the object headers. Could there be some generic mechanism
> there now that we have 64-bit, super large headers? Like setting/resetting
> a kind of bitmask to let some spaces be GC'd or left alone? Things that we
> could manage image side?
>
> well, adding bit(s) is just the simplest part of the story. The main one is
implementing a GC discipline that does not walk over marked object(s), as
well as having a mechanism to ensure that marked object(s) form a closed
subgraph (i.e. there are no references coming outside of it) 

Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread p...@highoctane.be
On Wed, Nov 23, 2016 at 10:51 AM, Igor Stasenko  wrote:

>
>
> On 23 November 2016 at 10:50, p...@highoctane.be 
> wrote:
>
>>
>>
>> On Wed, Nov 23, 2016 at 12:53 AM, Eliot Miranda 
>> wrote:
>>
>>>
>>>
>>> On Tue, Nov 22, 2016 at 10:26 AM, Sven Van Caekenberghe 
>>> wrote:
>>>

 > On 22 Nov 2016, at 19:16, p...@highoctane.be wrote:
 >
 >
 >
 > On Tue, Nov 22, 2016 at 5:57 PM, Igor Stasenko 
 wrote:
 >
 >
 > On 15 November 2016 at 02:18, Eliot Miranda 
 wrote:
 > Hi Phil,
 >
 > On Thu, Nov 10, 2016 at 2:19 AM, p...@highoctane.be <
 p...@highoctane.be> wrote:
 >
 >
 > On Thu, Nov 10, 2016 at 10:31 AM, Denis Kudriashov <
 dionisi...@gmail.com> wrote:
 >
 > 2016-11-10 9:49 GMT+01:00 p...@highoctane.be :
 > Ah, but then it may be more interesting to have a data image (maybe a
 lot of these) and a front end image.
 >
 > Isn't Seamless something that could help us here? No need to bring
 the data back, just manipulate it through proxies.
 >
 > The problem is that the server image will anyway perform GC. And it will be slow
 if the server image is big, which will stop the whole world.
 >
 > What if we asked it to not do any GC at all? Like if we have tons of
 RAM, why bother? Especially if what it is used for is to keep datasets: load
 them, save image to disk. When needed trash the loaded stuff and reload
 from zero.
 >
 > Basically that is what happens with Spark.
 >
 > http://sujee.net/2015/01/22/understanding-spark-caching/#.WCRIgy0rKpo
 > https://0x0fff.com/spark-misconceptions/
 >
 > While global GC may not be useful for big-data scavenging, it probably
 will be for any non-trivial query.  But I think I see a misconception
 here.  The large RAM on a multicore machine would be divided up between the
 cores.  It makes no sense to run a single Smalltalk across lots of cores
 (we're a long way from having a thread-safe class library).  It makes much
 more sense to have one Smalltalk per core.  So that brings the heap sizes
 down and makes GC less scary.
 >
 > yep, that's the approach we tried in HydraVM
 >
 >
 > and Tachyon/Alluxio is kind of solving this kind of issue (may be
 nice to have that interacting with Pharo image).
 http://www.alluxio.org/ This thing basically keeps stuff in memory in
 case one needs to reuse the data between workload runs.
 >
 > Sure.  We have all the facilities we need to do this.  We can add and
 remove code at runtime so we can keep live instances running, and send the
 code to them along with the data we want them to crunch.
 >
 >
 > Or have an object memory for work and one for datasets (first one
 gets GC'd, the other one isn't).
 >
 > Or have policies which one can switch.  There are quite a few levers
 into the GC from the image and one can easily switch off global GC with the
 right levers.  One doesn't need a VM that doesn't contain a GC.  One needs
 an image that is using the right policy.
 >
 > or just mark whole data (sub)graphs with some bit, telling GC to skip
 over them so it won't attempt to scan them, treating them as always alive..
 > this is where we getting back to my idea of heap spaces, where you
 can toss a subgraph into a special heap space that has such policy, that it
 is never scanned/GCed automatically and can be triggered only manually or
 something like that.
 >
 > Could be very useful for all kinds of large binary data, like videos
 and sounds that we can load once and keep in the heap space.
 >
 > How hard would it be to get something like that?

 Large binary data poses no problem (as long as it's not a copying GC).
 Since a binary blob contains no subpointers, no work needs to be done. A 1M
 or 1G ByteArray is the same amount of GC work.

>>>
>>> +1
>>>
>>
>> Amen to that. But a dataset made of a gazillion of composites is not the
>> same, right?
>>
>> yep, as soon as you have references in your data, you add more work for GC
>
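
A quick playground experiment that shows both points at once, Sven's (a
pointer-free blob adds almost nothing to a full GC) and Igor's (references
do add tracing work); sizes are illustrative, and the temps keep both
structures alive while the GC runs:

| blob graph |
blob := ByteArray new: 1024 * 1024 * 1024.   "1 GB, zero subpointers"
Transcript show: 'blob:  ',
   (Time millisecondsToRun: [ Smalltalk garbageCollect ]) printString, ' ms'; cr.
graph := (1 to: 1000000) collect: [ :each | Array new: 30 ].   "~30M pointer slots"
Transcript show: 'graph: ',
   (Time millisecondsToRun: [ Smalltalk garbageCollect ]) printString, ' ms'; cr.
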

That's what I thought. I have seen Craig Latta marking some objects with
special flags in the object headers. Could there be some generic mechanism
there now that we have 64-bit, super large headers? Like setting/resetting
a kind of bitmask to let some spaces be GC'd or left alone? Something we
could manage image-side?
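
For what it's worth, Spur already exposes one such header bit image-side:
the pin bit, reachable in Pharo via Object>>pinInMemory and friends. The
caveat is that pinning only stops the object from being moved (its use
case is FFI); it does not exempt the object from being traced, so it is
only a first step toward the bitmask idea:

| buffer |
buffer := ByteArray new: 4096.
buffer pinInMemory.     "sets the header bit; the compacting GC will not move it"
buffer unpinInMemory.   "clears the bit again"
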

(damn, I need more money in the bank to let me work on these things for a
long stretch, it is so frustrating).

Phil




Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread Igor Stasenko
On 23 November 2016 at 10:50, p...@highoctane.be  wrote:

> [...]
>
> Amen to that. But a dataset made of a gazillion of composites is not the
> same, right?
>
> Phil

yep, as soon as you have references in your data, you add more work for GC

-- 
Best regards,
Igor Stasenko.


Re: [Pharo-dev] Breaking the 4GB barrier with Pharo 6 64-bit

2016-11-23 Thread p...@highoctane.be
On Wed, Nov 23, 2016 at 12:53 AM, Eliot Miranda 
wrote:

> [...]
>
>> Large binary data poses no problem (as long as it's not a copying GC).
>> Since a binary blob contains no subpointers, no work needs to be done. A 1M
>> or 1G ByteArray is the same amount of GC work.
>>
>
> +1
>

Amen to that. But a dataset made of a gazillion of composites is not the
same, right?

Phil
