Re: Feedback request regarding speed optimisations in trace mode

2019-06-14 Thread Joshua Root
On 2019-6-13 16:40 , Mihir Luthra wrote:
> Hi,
>  
> 
> I would like to see the time for each run (if 10 runs, then 10 columns,
> i.e., xx_run1, xx_run2, ...), rather than only the average of them.
> 
> Collect as many insights as we can; maybe we'll find some pattern or
> something that might help us (not sure what, though). I have a
> feeling we might get better performance in one run and not so good in
> another.
> 
> 
> That's true, and that's what is troubling me the most. Before testing I
> read on Stack Overflow that sys+user time is always correct,
> but the most disturbing factor is that the same port gives
> significantly different results on multiple runs. For example, with
> gettext + reps, I sometimes get times as high as 13 minutes and sometimes 11:30.
> The differences are also noticeable in the sys and user times.
> Also, I always ran these tests on a completely new installation of base.

With modern macOS and modern hardware there are performance effects that
you can't control, perhaps most significantly thermal throttling, but
also background jobs like Time Machine backups and iCloud sync.

You need to apply a little bit of statistical theory. Standard error
increases with variance, but can be decreased by increasing the sample
size. Calculate what sample size is needed to bring the standard error
down to an acceptable level.
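
Concretely: since the standard error of the mean is s/sqrt(n), bringing it under a target t requires n >= s^2/t^2 runs. A sketch of that calculation (the run times below are made-up example numbers, not real measurements):

```shell
# Sketch: estimate standard error from per-run wall-clock times and the
# number of runs needed to bring it under a target (all values in seconds).
se_stats() {
  t=$1; shift                      # target standard error
  printf '%s\n' "$@" | awk -v t="$t" '
    { n++; s += $1; ss += $1 * $1 }
    END {
      mean = s / n
      var  = (ss - s * s / n) / (n - 1)   # sample variance
      se   = sqrt(var / n)                # standard error of the mean
      need = var / (t * t)                # n needed so that SE <= t
      if (need > int(need)) need = int(need) + 1
      printf "mean=%.1f se=%.1f runs_needed=%d\n", mean, se, need
    }'
}

# Example: five runs of the same port, target SE of 10 seconds
se_stats 10 690 812 701 745 723
```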

> In an ideal scenario, I would prefer a clean environment (a fresh
> install of the OS, bare minimum apps running, a reboot before each
> run, etc.) for each test (containers maybe? or some CI like Travis),
> which would not be affected by other processes running in the background.
> 
> I didn't reboot the system last time when I tested, but I had all my
> apps closed and only two tabs open in Terminal: one for the modified base
> and one for the original. I also didn't run any tests simultaneously. This
> time I will reboot after each run, but testing like that is surely going
> to take some days for a good number of ports.
> A fresh install of the OS would probably be hard for me; couldn't it be a
> guest user instead?

I don't think there is much value in going to so much effort to ensure a
completely cold cache before starting. Close other apps yes, but don't
reboot or reinstall everything. Since we don't have control over disk
cache misses, and a warm cache is more likely in the real world anyway,
it's common practice to pre-warm the cache by doing one run before you
start recording results.
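
The pre-warming practice described above could look like this (a sketch: build_cmd is a placeholder for the real install command, e.g. "port install gettext", and in practice /usr/bin/time -p would also capture user and sys times):

```shell
# Sketch: one discarded warm-up run to populate caches, then record
# every subsequent run individually rather than only their average.
rm -f times.txt
build_cmd() { sleep 0; }         # stand-in for the real build command

build_cmd                        # warm-up run: fills caches, result not recorded
for i in 1 2 3; do
  start=$(date +%s)
  build_cmd
  echo "run$i $(( $(date +%s) - start ))s" >> times.txt   # one entry per run
done
cat times.txt
```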

- Josh


Re: sqlite3 database in macports

2019-06-14 Thread Joshua Root
On 2019-6-13 16:27 , Mihir Luthra wrote:
> In general what I found is that, as in the case of port gettext, out of
> 400,000 calls to the server, only 50,000 need to be queried through the
> registry. The remaining 350,000 are always-allowed or always-denied
> prefixes such as /bin.
> Until now I was simply caching any path returned from
> __darwintrace_sandbox_check(). I never noticed before that the server is
> asked 350,000 times only for prefixes, which led to many cache misses for
> prefix searches.
> When I cache all this data, making the shared memory capable of
> handling prefixes as well, only around 300 calls actually go to the server.
> This means that most of the time the socket doesn't even need to be set up.
> With this change, gettext took (real: 11m9.701s,
> user: 12m47.990s, sys: 6m51.499s), which is quite an improvement
> compared to last time.

Excellent.
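
The idea described above, answering always-allowed/always-denied prefixes locally so that only genuinely ambiguous paths reach the server, can be sketched like this (the prefix lists here are illustrative, not the real darwintrace ones):

```shell
# Sketch of the prefix-cache idea: paths under a cached always-allowed or
# always-denied prefix are answered locally; only the rest ask the server.
check_path() {
  case "$1" in
    /bin/*|/usr/bin/*|/System/*) echo allow ;;       # always-allowed prefixes
    /etc/secret/*)               echo deny ;;        # always-denied prefix
    *)                           echo ask-server ;;  # genuine registry query
  esac
}

check_path /bin/ls         # answered locally, no socket needed
check_path /opt/local/foo  # would still go to the server
```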

> Although, because __darwintrace_setup() is called every time before a
> path lookup, the setup gets done every time for no reason. Also, the path
> normalisation code is mixed up with the path lookup. I guess I should try
> to modularise them somehow before the next round of testing.

Yes. It sounds like it would be worth doing the setup on demand.
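
On-demand setup can be as simple as a guard flag checked at the top of the lookup path (a sketch of the shape of the idea, not the actual darwintrace code):

```shell
# Sketch: the expensive setup runs only on the first lookup that needs it.
setup_done=0
ensure_setup() {
  if [ "$setup_done" -eq 0 ]; then
    : # ... open the socket, map shared memory, etc. (placeholder) ...
    setup_done=1
    echo "setup ran"
  fi
}
lookup() { ensure_setup; echo "lookup $1"; }

lookup /bin/ls    # first call pays for the setup
lookup /usr/lib   # later calls skip it
```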

> And most importantly, do some profiling so you know what is actually
> taking the time. Even if you just print out a bunch of timestamps at
> different places you at least know something.
> 
> 
> That really helped.

I thought it might. :)
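
The quoted suggestion can be as simple as printing elapsed time around the suspect phases (phase names here are placeholders):

```shell
# Sketch: crude profiling by printing elapsed seconds at key points.
t0=$(date +%s)
mark() { echo "$1: $(( $(date +%s) - t0 ))s elapsed"; }

mark "setup start"
: # ... e.g. __darwintrace_setup() work would happen here ...
mark "setup done"
: # ... path normalisation and lookup ...
mark "lookup done"
```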

- Josh


Re: Feedback request regarding speed optimisations in trace mode

2019-06-14 Thread Mihir Luthra
Hi,


>
> No need to stick to 10 runs just because I said so. It can be
> even 3 or 5 runs, whichever you feel is good enough to give
> some insights. E.g., you might find that after 3 runs you are getting
> a constant time without much difference; in another case you might get
> all different values even after running it more than 3-5 times.
>

That gave me some good insights. Rebooting does keep results consistent
to some extent. I also noticed that subsequent runs slow down (not
much, but they do). The one thing that rebooting does not reset is
the temperature of my Mac. I searched Google [1], and it mentions
that as the Mac heats up, the fans run faster and processes
slow down. My Mac generally heats up quite a bit when I install
ports continuously (even after a reboot). My Mac also has an unusual
configuration (mid-2012, dual core, running Mojave with a WD SSD and
8 GB Symtronics RAM). When I wait for the Mac to cool down, results
actually become a bit more consistent.


>
> These were just some random thoughts I had. No need to stick to
> these or follow each and every one; just try different approaches. A guest
> user might be a good option (not sure though).
>
> I would say that if these things (i.e., reboot/fresh install of the OS)
> impact the timeline, such that you end up running tests for several days
> or weeks, then just run an acceptable number of tests, and when you feel
> 'ya, this is the sweet spot', you may skip the extensive tests (like
> rebooting the system for each and every port 10 times or so).
>

Currently I have only tested the gettext port [2], and you can see that the
time taken increases with subsequent runs (in the case of the modified trace
mode). I tested the unmodified version later, which may be why its results
stay consistent. The 6th test (which I didn't count in the averaged results)
was taken after waiting for the Mac to cool down. Also, this time I didn't
count the distfile fetching time, because it varies a lot.


>
> Also, I would rather not do a fresh install of the OS on a personal
> laptop each time I want to test a port. I was thinking more in terms of a
> sandboxed environment or a VM(?) which has the same setup and resources on
> each run, and which can be easily spun up and deleted (in a matter of
> minutes, in parallel, or whatever). Like a cloud platform-as-a-service
> kind of thing, if it helps. It's easier if you can run these tests on
> Linux, since clouds have Linux servers, but I don't believe major cloud
> providers offer macOS VMs or anything of that sort.
>
>
I looked into VMs, but I'm generally unsure about them. Some places
mention that it is illegal to create an ISO of macOS, and some say it is
legal, so I'm not sure whether to go that route or not. For
cloud services I found [3], although if the need for cloud services
arises, I would still need to make sure they allow me
sudo privileges before getting a plan for a month.

Currently I feel that, at the least, after rebooting and waiting for the Mac
to cool down, results become consistent to a really nice extent.
For testing on an HDD or a Fusion Drive, I will probably have to rent a Mac
from somewhere.

One question: can I set up MacPorts base on an external HDD? I tried setting
one up, but configure asks me to specify something like --build, --host, --target.
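
For reference, a from-source setup of base under a custom prefix is usually just the standard configure/make sequence with --prefix pointed at the external volume (a sketch with a placeholder path, run from a MacPorts base source checkout):

```shell
# Sketch (placeholder path): build MacPorts base from source with its
# prefix, and hence all installed ports, on an external volume.
./configure --prefix=/Volumes/External/macports
make
sudo make install
```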

[1] https://setapp.com/how-to/how-to-fix-an-overheating-mac
[2]
https://docs.google.com/spreadsheets/d/1ksj3Fex-AnTEU4f4IRzwUkTpN4XfUye-HqSdZwXOsKs/edit#gid=0
[3] https://www.macincloud.com

Thanks for the help,
Mihir


Re: Slack-like chat (also for GSOC)

2019-06-14 Thread Nils Breunese


Chris jones wrote:

> (The main problem I have is not something specific to Riot or whichever
> forum we use; it's more that we need to give some thought to the channels.
> Currently there is really only one, which gets (some) chat but also a message
> for each and every commit. I don't think the average user needs to see
> every commit. I think we probably need to split these, for instance into
> Dev and User channels, to match the mailing lists perhaps.)

I’d go even more granular: create a dedicated channel for commits, and everyone
can decide whether to follow it or not.

Nils.


Re: Slack-like chat (also for GSOC)

2019-06-14 Thread Chris Jones



> I would say we can try Riot (also mentioned earlier by Rainer), as I
> see a lot of pros put up for it in [1]. We can also explore Discord as
> an option. If none works, we can stick to IRC.


I've been trying Riot out for a while now, since this discussion started.
It's not bad, I would say.


(The main problem I have is not something specific to Riot or whichever
forum we use; it's more that we need to give some thought to the
channels. Currently there is really only one, which gets (some) chat but
also a message for each and every commit. I don't think the average user
needs to see every commit. I think we probably need to split
these, for instance into Dev and User channels, to match the mailing
lists perhaps.)


Chris