Re: UNIVERSAL_ARCHFLAGS

2020-02-09 Thread Mihir Luthra
> Why would a universal library be incompatible with building for a single
> arch? What is the actual failing command?
>
>
I went through the old Travis builds again. It was because I added the arch
flags to a variable named `CFLAFS`, which, being a typo, never got added to
CFLAGS.

The issue is now resolved by using the -s flag with ar. [1]

[1]
https://travis-ci.org/macports/macports-base/builds/648249435?utm_source=github_status_medium=notification

Thanks,
Mihir


Re: UNIVERSAL_ARCHFLAGS

2020-02-09 Thread Mihir Luthra
Hi,

Creating a static library from fat object files works fine as long as
> there's an index. That means using the -s option with ar, or running
> ranlib after creation. The usual command is "ar crs .a *.o".
>
>
I was making a static lib that can be used by both darwintracelib1.0 and
pextlib1.0.

darwintracelib1.0 uses UNIVERSAL_ARCHFLAGS in its build and pextlib1.0
doesn't.

So, on Catalina, the build works because UNIVERSAL_ARCHFLAGS resolves to
x86_64 only.
But on 10.13, the build fails because the library is incompatible.

Probably for that reason, sip_copy_proc.c is copied from pextlib1.0 to
darwintracelib1.0 so that it is compiled again separately with
darwintracelib1.0's flags.

I was thinking of adding UNIVERSAL_ARCHFLAGS to pextlib1.0's build as well so
that the static lib is compatible with both. I don't see what the issue would
be if pextlib1.0 gets built for multiple archs.

Will it be okay to do so?


Mihir


UNIVERSAL_ARCHFLAGS

2020-02-08 Thread Mihir Luthra
Hi,

There is something I am not able to understand in base.

Looking into aclocal.m4, the supported archs are determined based on the
macOS version, and the -arch flags are constructed from them.

For example, on macOS 10.13:
UNIVERSAL_ARCHFLAGS = -arch x86_64 -arch i386

Would adding them to CFLAGS and compiling mean that the result is compatible
with both x86_64 and i386? Or does one -arch override the previous flag?
Doesn't the compiler automatically default to the arch for the OS version
when no -arch flags are given?

As far as I can tell, I assume it produces a library compatible with both
x86_64 and i386. It is used in darwintracelib1.0, and darwintrace needs to be
compatible with all archs supported by the OS, as it gets injected into the
installation. How important is it to support i386 for now? Would it be okay
if i386 support were dropped and -arch specified as x86_64 only?

Is there any way to produce static libraries like this?
The best option I found is libtool -static -arch_only , but that only works
with a single arch argument.

Thanks,
Mihir


Re: right way to add a library to src/

2020-02-07 Thread Mihir Luthra
> Without taking a closer look, I think there is only one other file that is
> shared between darwintrace and pextlib. Unfortunately, there is no good
> solution
> to this in the current structure. What it currently does is to copy the
> *.c file
> around during compilation, but I think this is a really horrible
> workaround.


> I think the better option would be to have a new subdirectory for a
> library with
> the shared code that is then linked (statically) into both darwintrace and
> pextlib.
>

Yes, doing that with a static library was the final solution.

Meanwhile, I tried many ways to get it working with a shared library,
because with a shared lib things were much cleaner and more compact.
But the issue was that darwintrace would intercept syscalls made by our
shared library as well, which would make things even more complicated and
slower.

Thanks,
Mihir


Re: GSOC mentor candidates

2020-01-29 Thread Mihir Luthra
Hi,


> In order to apply for GSOC we need to publish an up-to-date idea list.
>
> MacPorts base make a significant (80%?) portion of the ideas list, so
> it would be nice if we could make it clear as soon as possible whether
> we have ideally two mentors willing to mentor base projects.
>
> Marcus, Sat, or anyone else: please raise your hands if you would be
> willing to participate this year as mentors, else we need to archive
> those ideas and only leave a much shorter list.
>
>
I am willing to help with the following 2 as a co-mentor if needed:

1) Fakeroot Functionality

2) Auto detection of build dependencies


Mihir


right way to add a library to src/

2020-01-16 Thread Mihir Luthra
Hi,

I was trying to add some code in a new directory under macports-base/src/
so that I can use it in both porttrace.tcl (as a Tcl command) and
darwintrace.c.

Basically it is a library that needs to be built as a dynamic library
(because it uses `__attribute__((constructor))`).

I see 2 choices:

1) Generate .o files in the new directory and use them in darwintracelib1.0
and pextlib1.0. As both darwintrace and pextlib are ultimately dynamic
libraries, it should work, I suppose.

2) Create a separate dylib, though I am not sure how to handle linking it
into both the darwintrace code and pextlib. What are the standard library
locations for the MacPorts source tree? Should I simply add it to the Tcl
package path and link it in the Makefiles of both darwintracelib and pextlib?

What would be the right way to do this? Also, is there any reference on how
to conform to the MacPorts Makefile structure?

Mihir


gcc/g++ failures after xcode11 update

2019-09-23 Thread Mihir Luthra
Hi,

After the Xcode update, there have been many questions on Stack Overflow
about gcc and g++ linking failures. Any ideas on what can be done?

https://stackoverflow.com/questions/58072318/cannot-link-any-c-program-with-gcc-on-mac-mojave

https://stackoverflow.com/questions/58073301/linker-error-when-trying-to-use-lzma-in-boostiostreams-from-macports

https://stackoverflow.com/questions/58071057/macports-g-fails-to-find-headers-after-recent-xcode-update

Mihir


Re: Feedback request regarding speed optimisations in trace mode

2019-06-22 Thread Mihir Luthra
Hi,

>
> With modern macOS and modern hardware there are performance effects that
> you can't control, perhaps most significantly thermal throttling, but
> also background jobs like Time Machine backups and iCloud sync.
>
> You need to apply a little bit of statistical theory. Standard error
> increases with variance, but can be decreased by increasing the sample
> size. Calculate what sample size is needed to bring the standard error
> down to an acceptable level.
>

I will try to get more samples this week. [1] Until now I had made so many
changes to the code that I had to redo the tests.
Also, it does seem right that many ports need to be tested. Even if I test a
port 5 times in a row (everything else closed) and then test the same port 5
times in a row at some other time, the results vary quite a bit. That mostly
applies to real time; sys and user time show differences, but not as big.
Surprisingly, while testing this week I noticed that __darwintrace_setup()
took more time than __darwintrace_is_in_sandbox().
I had been thinking that registry querying was the slowest of all, but per
the times that I printed, that wasn't the case.

>
>

> I don't think there is much value in going to so much effort to ensure a
> completely cold cache before starting. Close other apps yes, but don't
> reboot or reinstall everything. Since we don't have control over disk
> cache misses, and a warm cache is more likely in the real world anyway,
> it's common practice to pre-warm the cache by doing one run before you
> start recording results.
>
>

[1]
https://docs.google.com/spreadsheets/d/1ksj3Fex-AnTEU4f4IRzwUkTpN4XfUye-HqSdZwXOsKs/edit#gid=0


Re: Feedback request regarding speed optimisations in trace mode

2019-06-14 Thread Mihir Luthra
Hi,


>
> Not necessary to stick to 10 runs, just because I said so. It can be
> even 3 or 5 runs. Whichever you feel seem to be good enough to give
> some insights. E.g., you might feel that after 3 runs you are getting
> constant time and not much difference, in another case you might get
> all different values even on running it more than 3-5 times.
>

That gave me some good insights. Rebooting surely keeps results consistent
to some level. I also noticed that subsequent runs slow down (not by much,
but they do). Even after rebooting, one thing stays the same: the
temperature of my Mac. I searched Google [1] and it mentions that as the Mac
heats up, the fans run faster and processes slow down. My Mac heats up quite
a bit when I install ports continuously (even after a reboot). My Mac also
has an unusual configuration (mid-2012, dual core, running Mojave with a WD
SSD and 8 GB Symtronics RAM). When I wait for the Mac to cool down, results
actually become a bit more consistent.


>
> These were just some random thoughts I had. Not necessary to stick to
> these or follow each and every one. Just try different ways. Guest
> user might be a good option (not sure though).
>
> I would say if these things (i.e., reboot/fresh install of OS) impact
> the timeline if you end up running tests for several days or weeks, I
> would say then just run acceptable number of tests and when you feel
> 'Ya, this is the sweet spot', you may skip extensive tests (like
> rebooting system for each and every port 10 times or so).
>

Currently I have only tested the gettext port [2], and you can see the time
taken increases with subsequent runs (in the case of the modified trace
mode). I tested the unmodified version later, which may be why its results
stay consistent, and the 6th test (which I didn't count in the average) was
taken after waiting for the Mac to cool down. Also, this time I didn't count
distfile fetching time, because it varies a lot.


>
> Also, I would rather not do fresh install of OS on personal laptop
> each time I want to test a port. I was thinking more on terms of a
> Sandboxed environment or a VM(?) which has same setup and resources on
> each run, and which can be easily spinned up and deleted (in matter or
> minutes, in parallel, or whatever). Like a cloud platform as a service
> kind of thing, if it helps. It's easier if you can run these tests on
> Linux, since cloud has linux servers, but I don't believe major cloud
> providers provide MacOS VM's or anything of that sorts.
>
>
I read about VMs, but I am generally not sure about them. Some places say it
is illegal to create an ISO of macOS; other places say it is legal, so I am
unsure whether to do that or not. For cloud services I found this [3],
although if the need for cloud services arises, I would still need to make
sure they allow sudo privileges before getting a plan for a month.

Currently I feel that, at the least, after rebooting and waiting for the Mac
to cool down, results become consistent to a really nice extent.
For testing on an HDD and a Fusion Drive, I will probably have to rent a Mac
from somewhere.

One question: can I set up MacPorts base on an external HDD? I tried setting
one up, but it asks for specifying something like --build, --host, --target.

[1] https://setapp.com/how-to/how-to-fix-an-overheating-mac
[2]
https://docs.google.com/spreadsheets/d/1ksj3Fex-AnTEU4f4IRzwUkTpN4XfUye-HqSdZwXOsKs/edit#gid=0
[3] https://www.macincloud.com

Thanks for the help,
Mihir


Re: Feedback request regarding speed optimisations in trace mode

2019-06-13 Thread Mihir Luthra
Hi,


> I would like to see the time for each run (if 10 runs, then 10 columns
> i.e., xx_run1, xx_run2, ...), rather than only average of them.
>
> Collect as much as insights we could, maybe we find some pattern or
> something that might help us (not sure what though). Since I have a
> feeling, we might get better performance in one run and not so good in
> another.
>

That's true, and that's what is troubling me the most. Before testing I read
on Stack Overflow that sys+user time is always correct,
but the most disturbing factor was that the same port gives significantly
different results on multiple runs. In the case of gettext + deps, sometimes
I get times as high as 13 mins and sometimes 11:30. Differences are also
noticeable in sys and user times.
Also, I always ran these tests on a completely new installation of base.

>
> I notice that some ports take less time with modification and some
> ports take more time. Just want to make sure it happens in multiple
> runs.
>

I have made some more improvements since last time. This time I will run the
tests on every port 10 times and upload them to the spreadsheet.

>
> In an ideal scenario, I would prefer a clean environment (can be a
> fresh install of OS, bare minimum apps running, reboot before each
> run, etc.) to run each test (containers maybe? or some CI like travis)
> which might not be affected by other processes running in background.
>
I didn't reboot the system last time when I tested, but I had all my apps
closed and only 2 tabs open in the terminal, 1 for the modified base and 1
for the original. I also didn't make any runs simultaneously. This time I
will reboot after each run, but testing like that is surely going to take
some days for a good number of ports.
A fresh install of the OS would probably be hard for me; can't it be a guest
user instead?

Thanks for the help,
Mihir


Re: sqlite3 database in macports

2019-06-13 Thread Mihir Luthra
Hi,

>
> The database lives on disk in ${portdbpath}/registry/registry.db. It has
> to, or we wouldn't have the D in ACID [1].
>
>
Then it should probably make a huge difference on HDDs or Fusion Drives. All
the tests I made were on an SSD.
I also tried connecting an external HDD, but when I tried installing
MacPorts base on it, configure gave me errors saying something like: specify
--build, --target and --host.


> An SQL database is likely not optimal for the access pattern of trace
> mode. Using another way of looking up file ownership would be a good
> direction to explore. Perhaps also look at doing more caching if
> possible, of negative results as well as positive.
>

In general what I found is that, for example in port gettext, out of 400,000
calls to the server, only 50,000 need to be queried through the registry.
The remaining 350,000 are always-allowed or always-denied prefixes such as
/bin. Until now I was simply caching any path returned from
__darwintrace_sandbox_check(). I had never noticed before that 350,000 times
the server is asked only about prefixes; this led to many cache misses for
prefix searches.
When I cache all of this data, making the shared memory capable of handling
prefixes too, only about 300 calls actually go to the server. This means
most of the time the socket doesn't even need to be set up.
Doing this, gettext took (real: 11m9.701s, user: 12m47.990s, sys: 6m51.499s),
which is a big improvement compared to last time.

However, because __darwintrace_setup() is called every time before a path
lookup, the setup is done every time for no reason. Also, the path
normalisation code is mixed up with the path lookup. I guess I should try to
modularise them somehow before the next round of testing.


>
> And most importantly, do some profiling so you know what is actually
> taking the time. Even if you just print out a bunch of timestamps at
> different places you at least know something.
>

That really helped.


>
> - Josh
>
> [1] https://en.wikipedia.org/wiki/ACID_(computer_science)


Thanks for the help,
Mihir


sqlite3 database in macports

2019-06-11 Thread Mihir Luthra
Hi,

I just wanted to ask a small question.
Is the sqlite3 database used in MacPorts in-memory? I don't have much
knowledge of databases. The main question concerning me is the time taken by
darwintrace calls to tracelib and by querying the registry. It may make a lot
of difference whether this happens on a machine with an SSD, HDD, or Fusion
Drive. Not really sure about this.

Regards,
Mihir


Fwd: Feedback request regarding speed optimisations in trace mode

2019-06-09 Thread Mihir Luthra
Hi,


> This is certainly an improvement. How does it compare with running the
> same builds without trace mode? The ideal scenario would of course be to
> have trace mode incur only a barely noticeable performance penalty.
>

Kindly check out this link. I made the comparisons again for those ports. (I
was careless enough not to note down the comparisons last time.)
Also, I have made some changes to the code since that mail, for which I have
also made a PR.

https://docs.google.com/spreadsheets/d/19OpjSl9Ys47Gp0vkwJ7Yi2b6x3JiRlI_EQm4-FUqZYc/edit?usp=sharing

I will keep adding any new data to these sheets whenever I run more tests.
In the case of some ports like db48 and glib2 (+deps), the optimisations
didn't work well. I will test ports like cmake and mariadb, and some more
that I tested last time and that showed nice improvements. I will keep
updating the data along with the tests.

I could figure out 2 points where I can make the library better:
1) Darwintrace does not always check the registry. Sometimes the case is an
always-allowed/denied prefix such as /usr/include (always allowed) or
/usr/local (always denied). Despite this, the shared memory stores complete
paths. This could be made much better if the shared memory stored just the
prefix in such cases and marked the last character of the prefix with * or
some other character to indicate that all paths with this prefix are
allowed. This case also occurs really frequently, so maybe after doing this
the optimisations will improve.
2) As the shared-memory data structure is an extension of a trie, each node
stores a huge array for every possible character. A path can contain any
Unicode character, so making an array of size 256 and traversing the path by
its UTF-8 representation is possible. But chars 127-255 are rare in paths,
and this leads to a lot of waste, so the shared memory just doesn't store
such paths. The same goes for chars 0-31, which are control characters like
carriage return, ESC, etc.; they can probably also be filtered out somehow.
Because lots of nodes get inserted, reducing this array size reduces
insertion time to a large extent.

Also thanks for providing me with the help.

Regards,
Mihir


Re: Feedback request regarding speed optimisations in trace mode

2019-06-09 Thread Mihir Luthra
Hi jan,


> How exactly were these numbers obtained? Is it one run?
> An average of ten runs? All following a complete distclean?
> Is it the "real" time as reported by time(1) or somethin else?
> What are the other times reported by time(1), as in
>
> $ time sleep 5
> 0m05.01s real 0m00.00s user 0m00.00s system
>

At the time I sent that mail, I was mainly testing the same ports repeatedly
to rule out bugs, so I just took an average of them.
I have made updates to the code since then [1].

I tested those ports plus some others last night; you can find the results
at the link below.

https://docs.google.com/spreadsheets/d/19OpjSl9Ys47Gp0vkwJ7Yi2b6x3JiRlI_EQm4-FUqZYc/edit#gid=0



Thanks for the help.

[1] https://github.com/MihirLuthra/macports-base/tree/dtsm-darwintrace

Regards,
Mihir


Re: Speed up trace mode

2019-06-08 Thread Mihir Luthra
Hi Clemens,

I have added a README to the repository with the latest updates. [1]
I have attempted to explain the code as best I could.

The code functions correctly in macports-base for the 57 ports that I
tested. [2]
The speed improvements seem decent to me, though I don't really know the
expectations, so I can't comment on how good they are.

Also, if there is anything I can do besides the README, please let me know.

[1] https://github.com/MihirLuthra/dtsharedmemory
[2] https://github.com/MihirLuthra/macports-base/tree/dtsm-darwintrace

Regards,
Mihir


Feedback request regarding speed optimisations in trace mode

2019-06-06 Thread Mihir Luthra
Hi everyone,

I have been working on the trace mode optimisation project.
I have added the functionality to improve the speed. [1]
I tested it on some ports, and it works stably.

Here are some comparisons I made on my machine:

In original code:
Port gettext (+ deps: ncurses, libiconv, gperf): 13 mins
In modified code: 11 mins

In original code:
Port glib2 (+ deps: autoconf, automake, bzip2, libedit, libtool, pcre,
pkgconfig, zlib): 9:50 mins
In modified code: 9:30 mins

In original code:
Port db48: 2:45 mins
In modified code: 2:30 mins

In original code:
Port perl5.28 (+ deps: gdbm, readline): 7:35 mins
In modified code: 7:20 mins

I wanted to know whether these speed improvements meet expectations.
Kindly share some feedback.

[1] https://github.com/MihirLuthra/macports-base

Regards,
Mihir


openssl source install with trace mode bugged

2019-06-03 Thread Mihir Luthra
Hi,

I noticed that on a "new" MacPorts installation, if we try to install
openssl with the flags -st, it will fail.

Problematic lines in main.log with debug on are:

:info:configure darwintrace[30583:0x10fc915c0]:
posix_spawn(/opt/original-base/var/macports/sip-workaround/502/usr/bin/perl5.18)
= 2


:info:configure perl: posix_spawn:
/opt/original-base/var/macports/sip-workaround/502/usr/bin/perl5.18: No
such file or directory


:info:configure Command failed:  cd
"/opt/original-base/var/macports/build/_opt_original-base_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_devel_openssl/openssl/work/openssl-1.0.2s"
&& ./Configure --prefix=/opt/original-base -L/opt/original-base/lib no-krb5
--openssldir=/opt/original-base/etc/openssl shared zlib darwin64-x86_64-cc


:info:configure Exit code: 1



Also, this won't occur if I install the perl5.28 port separately first.
openssl doesn't depend on that port, but configuring perl5.28 does certain
steps that make installing openssl possible. So even if I terminate the
installation of perl5.28 after it is done configuring, it becomes possible
to install openssl.

I tried checking main.log with debug enabled, but it crashes, possibly
because default logging goes to stderr and interferes with the port install.
I tried changing the default logging location, but that doesn't seem to work
for me.


It could be a machine-specific problem. Can anyone else confirm this?


Regards,

Mihir


Re: Speed up trace mode

2019-06-02 Thread Mihir Luthra
Hi,

I have implemented the points we discussed so far in macports-base.
You can find the updated macports-base in my forked repository.

https://github.com/MihirLuthra/macports-base

I have also made many modifications to the library to optimise it for use in
base. I haven't updated the README or comments about those, so I made a
separate repo for the time being.
Here is the new code:
https://github.com/MihirLuthra/sm_mp

Currently I am deleting the shared memory file after every phase and
creating a new one.
I am not certain about this, but I think it should be okay to share the same
shared file among all phases of a single port, because it looks like every
subsequent phase just expands the existing sandbox.

Also, I am only storing data for paths containing ASCII chars 32 to 127.
Storing in a UTF-8 fashion would just double the time the library takes for
insertion, and paths rarely contain characters outside that range; it's easy
to change at any time if needed.
I am taking 50 MB as the initial file size; ports generally don't go above
this, and if they do, the memory will expand.

It would be great if you could test with 1 or 2 ports and tell me whether
the speed improvements you were looking for are achieved.
I tested a few ports and it seems to be working okay; some more improvements
would probably make it better.
Along with dependencies, I tried installing gettext, glib2, and db48.

For example, installing gettext (along with dependencies ncurses, libiconv
and gperf) takes 11-11:30 mins with the library, and without the library it
takes around 11:30-12:30. I don't know the correct way to analyse speed
improvements, so I have just used the `time` command.
I believe I still need to make some changes that may reduce the time
further, and it will probably be much better once I have your suggestions on
the code.

Also, when source-installing the openssl port, I need to install perl5.28
myself; otherwise it crashes using perl5.18 (says no such file or dir). That
may be a problem specific to my machine, or a problem caused by my code.
All other ports I tested worked fine.

Regards,
Mihir


Re: Speed up trace mode

2019-05-29 Thread Mihir Luthra
Hi Clemens,

The library seems to be working well now in macports-base.
I wanted to know to what extent I should let the data be shared.
Just within a single phase? Or can all the phases of a port installation
share common trace mode data?
The sandbox gets set after every phase, so does that mean I should reset the
shared file every phase?
It would actually be great if the data collected during configure were
available at build time; in the log I generally saw that the sandbox is
mostly just extended as the phases move ahead. Is that so?
Regards,



Re: Speed up trace mode

2019-05-25 Thread Mihir Luthra
Hi Clemens,

I have probably solved some of the earlier issues. One general reason I
found was that my file descriptor was getting closed by some process.
I handled that by checking in close.c that the fd being closed does not
reference my files, and that made some errors vanish. Installation of some
ports works smoothly, and for some it doesn't. From what I can figure out, I
need to handle exec, posix_spawn and dup2 as well. Am I correct about that?
Is there anywhere else I need to add checks?
Also, a question came to mind: I recently read that mmap(2) is not
async-signal-safe, so if the library interposes a function that is
async-signal-safe and it is called from a signal handler, will that be okay?
Regards,
Mihir


Re: Speed up trace mode

2019-05-21 Thread Mihir Luthra
Hi,

I have made many improvements to the existing code. [1]

As for the latest details: in the test file [2], where I create 4 threads
and pass 18,228 strings as arguments, it took 0.44 seconds to both insert
and search the 18,228 strings concurrently, using 3.5 MB of memory (4 times
less than before).
When I ran insert and search separately, insert took 0.4 seconds and search
took 0.08 seconds (18,228 strings).

Also, when I tried running the code in macports-base, it installs the port
to the very end, but when "port activate" starts, it says "Backtrace: Image
error" and "cannot lstat it".

insert and search are probably working correctly, because for testing I
compared the results from search() and __darwintrace_is_in_sandbox()
and they were always the same, and I didn't find any seg faults or other
errors in general.

“:error:activate Failed to activate gperf: Image error: Source file
/opt/local/var/macports/software/gperf/mpextract8J7hl6Re/opt/local/bin/gperf
does not appear to exist (cannot lstat it).  Unable to activate port gperf.”

The log files, with my output plus the darwintrace debug, became way too
large, and Pastebin won't let me host them there.
Should I send the log files without darwintrace debug enabled?


[1] https://github.com/MihirLuthra/sharedMemory
[2]
https://github.com/MihirLuthra/sharedMemory/blob/master/test_with_4_Threads.c



Re: Speed up trace mode

2019-05-15 Thread Mihir Luthra
On Mon, May 13, 2019 at 2:19 AM Clemens Lang  wrote:

> Hi Mihir,
>
> In my experience, it will make it much simpler to review the entire code
> if you expand your repository with a README file which gives a (textual)
> rough overview of how the code works, possibly with a link to the paper
> you implemented. I'd also advise extensively commenting your code to
> explain why you do things the way to implemented them, but from a quick
> glance, it seems you already did that to a certain extent.
>

https://github.com/MihirLuthra/sharedMemory

I made a README for the repository just as you suggested. If it is hard to
understand, I will update it according to your feedback. I have commented
the code extensively now and refactored it to a fair extent, I think.

Also, when reading the code, I suggest reading "sharedmemory.h" first; then
the comments in "sharedmemory.c" will be more understandable.

This is the first time I have written a multi-threaded program, and I now
realise that it really takes a long time to remove a small bug. But I now
believe the code is bug-free and works completely correctly.
Two extra things I plan to do are reference counting for munmap(2) after
expanding, and utilisation of wasted space in the file (which will drop
memory usage to almost half of what it currently is).
I have planned for them and will probably finish these two in another 2-3
days.


>
> You can further abstract using a separate SharedMemoryManager object per
> thread if you use the pthread API. See for example
> __darwintrace_sock_set and sock_key in darwintrace.h and darwintrace.c.
>
> This should allow you to write a get_memory_manager() function that will
> automatically give you a thread-specific SharedMemoryManager object or
> create one if none exists for the current thread yet.
>
>
I looked at the code in darwintrace.c.
Yes, I can probably call insert() and, inside it, call
get_memory_manager(), which checks thread-specific storage for the manager;
that makes the code pretty clean too.
Is that what you mean?

Also, this reminded me of one more thing: if I implement search() in open.c,
access.c, etc., I need not call __darwintrace_setup(), right?
Is there any other important thing the darwintrace calls do that I need to
do separately after search(), or can I skip straight to the end based on the
search() results?

>
> It's probably a good idea to move the path normalization to a separate
> function, yes. Note that the current does not correctly deal with
> directory symlinks. For example, trace mode does not recognize that
>
>   /opt/local/libexec/qt5/include/QtCore/QByteArray
>
> is a file installed by the qt5-qtbase port, because
>
>   /opt/local/libexec/qt5/include/QtCore
>
> is a symbolic link to
>
>   /opt/local/libexec/qt5/lib/QtCore.framework/Versions/5/Headers
>
> Once path lookup is fast, we could actually modify the normalization
> code to also check folder symbolic links inside the MacPorts prefix
> against the database of ports.
>
> This would require intertwining the path lookup with path normalization.
> Keep that in mind for later, but don't do it now.
>
Got it. Will do once this shared memory functionality is complete.

>
>
>
> On Wed, May 08, 2019 at 04:24:28PM +0530, Mihir Luthra wrote:
> > I figured out the right ways to use my files in code. Can you explain
> > me why is it causing these errors.[1]
> >
> > Just before __darwintrace_is_in_sandbox_check returns path permission,
> > I captured it in a bool variable and inserted the normal path in the
> > shared file. Each time it prints insertion successfull, so probably I
> > feel the extra code I inserted didn’t’ crash, but if you can give me
> > some idea what possibly could have gone wrong I maybe able to debug
> > it.
> >
> > [1] https://pastebin.com/1t7a0GJq
>
> :info:extract sip_copy_proc:
> mkdir(/opt/macports-test/var/macports/sip-workaround): No such file or
> directory
>
> We use this directory to create a copy of system binaries that we
> wouldn't be able to inject into because they have the magic
> "unmodifiable system file" bit introduced by Apple in 10.13. Normally,
> typing 'sudo make install' in your macports development copy should have
> created this directory (see doc/base.mtree.in), but that doesn't seem to
> have happened for you. You should be able to create the directory
> manually and chmod it to 1777.
>

Got it. That error is gone now.

>
>
> > 2) When I install copies with multiple names, i mean I tried with
> > prefixes macports-test2, macports-test3 but just as like I sent you
> > the log file last time, it actually reads hardcoded “macports-test”
> > somewhere which I am unable to understand.
>
> You'll have to re-run ./configure wi

Re: Speed up trace mode (GSoC Project)

2019-05-05 Thread Mihir Luthra
> From what I understand from the stackoverflow post you're right that
> cmpxchg16b will not give a consistent view of the 16 bytes of memory
> across multiple NUMA nodes. However, maybe two 4 byte values right next
> to each other would be sufficient for your use case and could then be
> casted to a 8 byte values for CAS?
>

That's a great idea. The offset should never go out of the 4-byte range,
but I will still add a check to make sure. I will update you when I am
done testing on the main base code.

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-05-01 Thread Mihir Luthra
Hi Clemens,

To make my shared-memory data structure implementation more space
efficient, I was trying to implement a stack that stores offsets to
unused locations in the shared memory file. But since the stack is
shared, it also needs to be edited in a lock-free way. While editing the
stack I need to atomically CAS both the top of the stack and the element
on it, for which I found double compare-and-swap (DCAS). My data
structure also needs DCAS at one point to ensure correct editing.
To implement DCAS I came across the "cmpxchg16b" instruction, but I
think it still does not fit my need. [1]

Do you know of any alternative that lets me DCAS atomically, or anything
that atomically checks two old values before replacing the value at an
address?

[1]
https://stackoverflow.com/questions/7646018/sse-instructions-which-cpus-can-do-atomic-16b-memory-operations

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-04-28 Thread Mihir Luthra
Hi Clemens,


> What's your current progress? Do you have some code already?
>


I made a complete offset-based ctrie implementation in which any process
can insert and search on the basis of a shared memory region. Kindly
provide me with your views on it :)

All main code is in [1].
The header file contains all definitions.[2]

Shell script [3] runs the makefile and executes [4].

[4] is a basic test file which takes strings as arguments and all those
strings are attempted to be inserted into ctrie by different threads.

initSharedMemory() does all the initialising of the shared memory and
the mmaping, and returns a struct of type SharedMemoryManager.
insert() and search() take that struct as an argument from the calling
process, along with the string to be inserted and the permission.

I have tested it with a few threads working concurrently and inserting a
few strings.
It works as intended so far.
The implementation still needs to be made space efficient, and it needs
to grow the memory size if we run out of it.



[1]
https://github.com/MihirLuthra/offsetBasedCtrie/blob/master/libpathSearch.c
[2] https://github.com/MihirLuthra/offsetBasedCtrie/blob/master/PathSearch.h
[3] https://github.com/MihirLuthra/offsetBasedCtrie/blob/master/make.sh
[4]
https://github.com/MihirLuthra/offsetBasedCtrie/blob/master/pathSearchTest2.c

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-04-24 Thread Mihir Luthra
Hi,

Sorry for the late response.

What's your current progress? Do you have some code already?
>

Currently I am done with a basic Ctrie implementation, as suggested in
the arXiv paper, that is capable of inserting and searching paths. The
implementation still uses pointers, which I will soon convert to
offsets. [1]
I have only tested it manually so far, and only with a single thread.
I will soon write a better test file to exercise the ctrie when it is
used by multiple threads.

[1] https://github.com/MihirLuthra/basicPathSearch

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-04-21 Thread Mihir Luthra
Hi,

Thanks for the tips. ^_^
This is almost the same approach as in the arXiv paper you shared with me.
Since we are dealing with paths, I thought implementing the hash
function above would make the search faster, because it happens quite
often that files with the same name have the same path length from the
root. So to get information about a file "abc.txt", we just need the
length of the path to this file from the root. Suppose the path string
from the root has length 16. Then the hash function takes the input
"abc.txt" and turns it into "16-abc.txt". Now in the tree, the first
level categorises entries by path length, which leaves only the file
name to check; in the worst case, where files with the same name have
the same path length from the root, they are chained at the end of the
tree.

Although I haven't measured the time complexities yet, even if this
structure is not adopted it won't make much difference: all we would
need is to add the hash function and search on the path length instead
of the complete path, which seems like an easy replacement.
I will stick with the basic structure, which is less complex and already
tested, and only try switching to and testing this one when everything
else is done; that is probably a good idea.

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-04-16 Thread Mihir Luthra
Hi Clemens,

Kindly provide your suggestions for this.
In the path-search Ctrie data structure, I categorised the paths with a
hash function that works like this:

If I input a path /test/files/abc.h for a check (here we want to open
abc.h), the hash function simply turns it into "12:abc.h", where 12 is
the length of the path "/test/files/" leading to abc.h.
The first level of the trie categorises the paths by the length of their
real path.
After this the search proceeds by the file name, i.e. abc.h, and at the
final node the key-value pair(s) are matched.

Having the first level categorised by the real path length to the file
reduces the search time in most cases.
After that the file name is checked character by character.
Two files with the same name and the same path length from the root seem
rare, but if any two collide, chaining is done (which seems appropriate
here because of the small number of such cases).


https://drive.google.com/file/d/16HCVUpljPSz1Wn4UEx_gYQ3qDg3JyfH_/view?usp=sharing

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-04-12 Thread Mihir Luthra
Hi,

I needed some advise regarding ctrie implementation.

I was constructing the trie data structure, which has to be operated on
from the address space mapped by mmap(2).
In place of an "array of next nodes", I am using an "array of offsets".
I would cast the void * returned from mmap(2) to my struct type.
When needing to write data, I go to the node at *(base + offset) and CAS.

I have doubts regarding the "type" of the offset: simply off_t, or int?
As we are mapping file memory into the process address space, I am not
certain what type would be safe to use for offsets.


Regards,
Mihir


Re: Feedback Request for final GSoC Proposal [Speed Up Trace Mode]

2019-04-09 Thread Mihir Luthra
Hi,

Sorry for re-mailing. I know it's almost the last moment.
But if possible, please provide any possible feedback.

https://docs.google.com/document/d/14eSXwZ6N1vRcBudaJWlwO5G17dY5B1nHgfhh52tSnv0/edit#

Regards,
Mihir


Feedback Request for final GSoC Proposal [Speed Up Trace Mode]

2019-04-08 Thread Mihir Luthra
Hi everyone,

Kindly check my final proposal and provide any possible feedback.

https://docs.google.com/document/d/14eSXwZ6N1vRcBudaJWlwO5G17dY5B1nHgfhh52tSnv0/edit#


Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-04-08 Thread Mihir Luthra
On Tue, Apr 9, 2019 at 1:59 AM Clemens Lang  wrote:

> Hi,
>
> On Sun, Apr 07, 2019 at 01:03:12AM +0530, Mihir Luthra wrote:
> > > Can you pastebin a main.log of a build that fails like this?
> >
> > https://pastebin.com/FVdp4WTw
>
> The problematic lines are
>
> :debug:archivefetch failed verification with key
> /opt/local/share/macports/macports-pubkey.pem
> :debug:archivefetch openssl output: Verified OK
>
> which is very weird, because it says the verification failed, but it
> also says the OpenSSL output is "Verified OK", which says exactly the
> opposite.
>
> The command it runs is
>
>   openssl dgst \
> -ripemd160 \
> -verify \
> -pubkey /opt/local/share/macports/macports-pubkey.pem \
> -signature ncurses-6.1_0.darwin_18.x86_64.tbz2.rmd160 \
> ncurses-6.1_0.darwin_18.x86_64.tbz2
>
> Maybe you can reproduce this command on your system manually and see if
> it works? It seems this command will return a non-zero exit code for
> reasons I don't know.
>

I will surely check it tomorrow; today is the last date of application
submission, so I am improving the proposal as much as I can :)

Also can you please give a look to the proposal:

https://docs.google.com/document/d/14eSXwZ6N1vRcBudaJWlwO5G17dY5B1nHgfhh52tSnv0/edit?usp=sharing

I was editing the schedule content to fit it into a table, so that part
is not currently done.
The rest, i.e. the methodology and phases, which contain all the main
information, is done. Please consider giving some feedback. :)

Regards,
Mihir


Re: GSoC 2019 - trace mode improvements

2019-04-07 Thread Mihir Luthra
>
>
> read [1] and related thread and probably I am too late to contribute
> through
> GSoC. So I give up.
>
>
> /davide
>
> [1]
> https://lists.macports.org/pipermail/macports-dev/2019-April/040495.html


Hi ,

https://docs.google.com/document/d/15cVbH6f6hBr9HryJEHUZEbRsToN1BAjY8My-oRstO9A/edit#heading=h.ng3lyvrem2d1

This doc is honestly really clumsy, more like rough notes I made for
myself. Still, the links to learning resources attached in the document
under the heading "surfing through the code flow" are quite useful;
that's all that is needed to understand the code. And this video by
Clemens [ https://www.youtube.com/watch?v=46qshiDskrM ] is useful for
understanding how and when the trace mode code gets activated.

With 2 days left, maybe you can spend today learning from those links
and understanding the code, and the next day on the project proposal.

Worth a try!
We could work together, helping each other: I on speeding up trace mode
(if I get selected) and you on auto-detection of build dependencies. As
you have contributed previously, I guess you stand a better chance of
getting your proposal accepted than I do.


Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-04-07 Thread Mihir Luthra
> That's unfortunately not the same, since while you're swapping the
> second value, a different thread could swap the first for a different
> value again.
>

I get your point.
What generally comes to mind in such circumstances is blocking the other
thread somehow, usually with spin locks.
But if I use spin locks here, there is no point in using compare and
swap, so I need to figure out a better solution that stays with CAS.

>
> That should be possible using atomic fetch-and-add [1]. This may end up
> increasing the storage by more than we need, but that's not a problem.
>
> The problem I see is that we actually need to do the truncation and
> remapping at some point. We could end up in a situation where the size
> is actually larger than the file/memory block.
>

I understand. Many checks could be added to avoid this, but in the end
they would defeat the main purpose of this code, which is optimisation.


> Unfortunately ftruncate(2) is all that we have, and without writing
> kernel code I don't see a way to do this.
>

Actually, what I had in mind at the time was interposing something like
__dt_ftruncate() and __dt_mmap(). I should have been more precise; sorry
for that. These would let us make the checks beforehand, but it's the
same problem again: doing that much work would just erase the
optimisation benefit we are planning for.
That said, I am not sure about that, because memory expansion would be
needed only rarely if we predict the required memory reasonably well.


>
> Can you pastebin a main.log of a build that fails like this?
>

https://pastebin.com/FVdp4WTw

I guess this was for a similar reason, such as the path or the
co-existence of installations.


>
>
> [1] https://en.wikipedia.org/wiki/Fetch-and-add
>
> Thanks.


Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-04-07 Thread Mihir Luthra
Hi,

I was trying to test the code by making changes, but I am stuck on one
issue.
If I install from git and set it up, I always receive this error
message. [1]

1) I tried `make` on the base code taken as-is from git, without
checking out the latest release; it showed this error.
2) I tried `make` on v2.5.4; it still shows the same error.
3) I tried a source install; here the installation completes
successfully and mostly everything works fine except trace mode (I
didn't make any changes to the source code).
I installed 7-8 ports in this case, and whenever I used trace mode it
said it was unable to fetch the archive.


In the case of the git install, it seems to be some error around the
macro HAVE_DECL_RL_USERNAME_COMPLETION_FUNCTION: only if it is false
should the code move on to the next macro and choose to use
username_completion_function. But the make errors say
rl_username_completion_function is already declared somewhere, so I
guess HAVE_DECL_RL_USERNAME_COMPLETION_FUNCTION should have been
defined. [2]


Regards,
Mihir


[1]
https://drive.google.com/file/d/0Bz1h_hHcxZVEbmdBa2Y3dzdNUVltaUV5b3ZURFJrT3VWQXZR/view?usp=sharing
[2]
https://drive.google.com/file/d/0Bz1h_hHcxZVEZEdNWVdlVjd6djhVOWljYVVHSlE4azVjVTRZ/view?usp=sharing


Re: Speed up trace mode (GSoC Project)

2019-04-05 Thread Mihir Luthra
Hi Clemens,

I see you're getting the hang of the difficulties of the project now :)
>

That’s really encouraging for me to know that you think so ^_^


> You have the right ideas to solve the problem. Do keep in mind though
> that CAS will only work up to a word size or a double word size at most,
> i.e. swapping more than 64 bit with CAS atomically is probably not going
> to work.
>

Got it. I may swap the fields one by one instead of swapping the
complete structure; it's just 4 variables.


>
> I'm not sure whether we will actually need a separate process to
> increase the allocation size. Consider this proposal:
>
>   struct shared_block_mgmt {
> size_t last_known_size;
> size_t refcnt;
> int blockfd;
> struct shared_block* block;
>   };
>
>   struct shared_block_mgmt block_mgmt;
>
>   struct shared_block {
> size_t size;
> size_t used;
> char memory[];
>   };
>
> This struct would mean that the first `used` bytes of `memory` are
> in-use by our data structure and everything after that up to `size`
> bytes is free[1].
>
> Any allocation request where used + request_size <= size would succeed
> and we could change `used` using CAS to ensure that this allocation
> operation is atomic.
>
> Any allocation request where used + request_size > size would trigger
> growing the memory area. Growing would calculate the size we want to
> grow to, let's call this `target_size`. Then, it would attempt to grow
> atomically using:
>
>   size_t local_size;
>   size_t local_used;
>   size_t target_used;
>   size_t target_size;
>   bool needs_resize;
>   do {
> local_used = block_mgmt.block->used;
> local_size = block_mgmt.block->size;
> target_used = local_used + request_size;
> needs_resize = target_used > local_size;
> if (needs_resize) {
>   // growing required
>   target_size = local_size + BLOCK_GROWTH;
>   ftruncate(block_mgmt.blockfd, target_size);
>

What I was doing was to create a file beforehand and append it to the
existing file whenever needed.
ftruncate() is a much better and cleaner approach, I guess.


> }
>   } while (
>   (needs_resize &&
> !CAS(&block_mgmt.block->size, local_size, target_size) &&
> !CAS(&block_mgmt.block->used, local_used, target_used)) ||
>   (!needs_resize &&
> !CAS(&block_mgmt.block->used, local_used, target_used)));
>
>   // At this point, either resizing was not needed and block->used is
>   // what we expect it to be (i.e. allocation succeeded), or resizing
>   // was needed, and we did successfully resize, and after resizing we
>   // did update block->used (i.e. allocation also succeeded).
>
> This will opportunistically call ftruncate with the bigger size on the
> mmap'd file descriptor, but calling ftruncate in multiple processes at
> the same time should not be a problem unless the truncation would ever
> shrink the file (which we can not completely avoid, but make very
> unlikely by making BLOCK_GROWTH


Maybe we can call ftruncate outside the loop and just set the size
first.
I don't know if it is possible, but if it is, we could make the
compare-and-swap behave in the following way: if a process detects that
used + request_size > size and the area needs to be grown, then instead
of swapping size with its new value, we just increment the old value by
request_size, without caring what the old value is.
With this kind of update we don't need to know the old value at all; we
only add the requested size to it. So if 2 or more processes do this
simultaneously, they still end up growing the memory correctly, and to
keep a margin we can add some extra growth each time.
Although I am not sure atomic operations allow this, a better approach
may be to share a variable called maxFileMemory which can only be
swapped if the new value being provided is greater. Before mmaping for
write or read, we would just make sure the current file size matches
maxFileMemory. That would prevent shrinking.

Or maybe we can somehow block shrinking by implementing our own
ftruncate, or in some other way.

I will need to think through the cases more thoroughly to get this into
the right words, I guess.


> sufficiently large so that any pending
> extension requests would be processed before another extension is
> required).


>
> You correctly identified that we'd have to re-mmap(2) the file after
> this operation, which is why I've included last_known_size in struct
> shared_block_mgmt. If last_known_size differs from the size stored in
> struct shared_memory, that would trigger creation of a new struct
> shared_block_mgmt with a fresh mmap with the larger size. We cannot
> unmap the old mapping until all threads are no longer using it, so we'd
> have to keep a reference count (field refcnt). And finally, we'd
> probably have to check whether any offsets in the data structure are
> larger than last_known_size, and if so, retry the current lookup
> operation with a fresh mmap(2) of the shared memory area, 

Re: Speed up trace mode (GSoC Project)

2019-04-04 Thread Mihir Luthra
Hi,

I have made the memory expansion idea precise and filtered out bugs from
the one I sent previously.
Please check the pdf in the attachment.

Regards,
Mihir


MemoryExpansion.pdf
Description: Adobe PDF document


Re: Speed up trace mode (GSoC Project)

2019-04-04 Thread Mihir Luthra
Hi Clemens,

I had some points in mind about mapping more memory when we run out of
memory, which I wanted to discuss with you.

Also, you are right, this project is not at all short. I realise now
that many conditions need to be taken care of while doing these tasks.

Each process shares 2 types of memory. One is for the path-checking data
structure.

The other shared memory region, which is mapped for read and write
before the first one, holds these variables:

bool isExpanding; // (i)
int memoryAvailable; // (ii)

pid_t expanderPid; // (iii)
pthread_id_np_t expanderTid; // (iv)

// (iii) and (iv) are only required for Approach A.

bool isReading; // (v)
pid_t readerPid; // (vi)
pthread_id_np_t readerTid; // (vii)


The memoryAvailable(ii) variable here is used to specify the size of the
cache data struct memory.



*Approach A:*

There are many conditions which need to be taken care of while expanding
the shared memory knowing other threads and processes may also access the
same memory for read or write.

The shared memory, which holds the Ctrie (including an offset variable
in it to get the location of the next node), gets accessed for:
1) Read
By processes that check this memory to see whether they find the path
here and thus don't need to make a server connection.
This type of process opens the mapped region for read.
In the other memory region, it sets readerTid and readerPid by compare
and swap, and also sets isReading to true.
When done reading, the thread tries to reset readerTid and readerPid by
CAS; if the old values are found to be as expected, isReading is set to
false. (This ensures isReading is only true while some process is
reading.)

2) Write
This type of process didn't find the desired path in the shared memory,
which is why it had to make a socket connection.
  a) Whoever made a server connection and checked a new path tries to
insert that new path into the shared memory.
  b) Also, before sending path data to the server for a check, this type
of process should check whether the amount of shared memory left is
below a certain limit, because there is a high chance that once a server
connection is made, the newly checked path will be written to the shared
memory, and at that point the shared memory shouldn't lack space. So
before the path data is sent, more shared memory is mapped.

Now one or more processes may be in the same situation, all wanting to
map more memory.
So here, the region responsible for mapping more memory is our critical
section, and we need to ensure it is entered by only a single thread.

The entry section of the critical section comprises 3 variables:
expanderPid, expanderTid and isExpanding (these are first created
temporarily in the respective processes).
This entry section is very similar to Dekker's solution to the critical
section problem, where isExpanding acts like the turn variable and
expanderTid and expanderPid act as the flag for a particular process.
Before entering the critical section for mapping memory, the same three
variables at the beginning of the shared memory are compared and swapped
atomically (atomic CAS is already implemented in the file map function).
The process tries to swap them in the shared memory, and if even one of
the old values doesn't match the expected value, that process abandons
its attempt to map more memory, because a mismatch means some other
process was ahead of it in taking the lead towards the critical section.
The tid and pid variables at the start ensure that the memory is being
expanded by the right thread.

When isExpanding is changed, threads trying to read also stop right
there and loop until isExpanding is set back to false.

The process that enters the critical section first sets isExpanding (i)
to true. The isExpanding variable is for processes reading from memory,
so that they loop until this value is set back to false.
Then the process has to create a new file with mkstemp again and map
more shared memory (all the tasks of mapping more memory are done here).
It then repeatedly checks the isReading variable of the shared memory,
and whenever it is found false, it changes memoryAvailable (ii)
accordingly.
memoryAvailable is checked by processes each time they use the memory,
so that if it changes they map the newly available memory correctly.
This ensures memory management in a lock-free manner.
The only downside here is that the thread responsible for creating all
this memory isn't under our control; it might die at any moment.
This could be handled with some more variables informing the other
processes that they need to continue the mapping instead, but a better
approach may be the next one:

*Approach B:*

The process responsible for mapping the memory can be made by us.
We don’t have any control on the processes that our 

Enabled commenting on GSoC Proposal

2019-04-02 Thread Mihir Luthra
>
>
> Mihir,
>
> The proposal is view-only. You might need to give comment-access for
> us to leave feedback on the draft.
>
> Done


Re: macports-dev Digest, Vol 152, Issue 1

2019-04-01 Thread Mihir Luthra
>
>
>
> Note that kevent(2) is a multi-purpose API. The call you refer to as (1)
> is not actually the mainloop of tracelib run. Rather, the call a little
> further down is:
>
>   keventstatus = kevent(kq, NULL, 0, res_kevents, MAX_SOCKETS, NULL))
>
> This will also return for new connections to the socket, which
> darwintrace.dylib creates. Depending on why kevent returned, the code
> will either process a line from a socket, or accept a new socket and
> then eventually call kevent again.
>
>
Actually my question is about the purpose of call (1), because as far as
I understand, kevent(2) only lets the program proceed further when it
returns, and it only returns when it detects an event.
Call (1) is supposed to detect a read event. I am asking where exactly
that read event happens that makes that kevent call return.


> I actually think it would be enough to just implement a Ctrie, but if
> you can find a suitable BSD-licensed (or compatible) implementation that
> you can reuse instead of re-implement you could also do the demo you
> suggested. I don't mind either way.
>
>
Ok, I will do that.

>
> From what I can see, the proposal looks more detailed to me now.
> However, you are not yet addressing the inherent race conditions with
> the suggested approach in the shared memory area and the special
> conditions required due to use of shared memory, such as not being able
> to use normal pointers.
>

I will update the draft to add those.


>
> Please also look into the paper on arXiv I sent you a while ago, because
> that did seem the data structure proposed there would already solve some
> of our problems.
>

Sure, I will make it a priority.


> Don't underestimate the effort required to get the data structure
> implemented given the constraints of the environment. To be certain that
> your data structure works as it should, you will also have to write an
> extensive test suite for all functionality of the data structure,
> possibly using code coverage measurements.


I didn't mean to underestimate it; I already see it's really complex to
deal with so many things at a time.
I asked only to get more clarity. I will put in all my effort and do the
best I can.


>
>
> Additionally, if you really think you'd be done with all of that early,
> you can include optional "stretch goals" that we can take up if you're
> done early, such as writing a test suite for trace mode functionality,
> writing documentation for the on-wire format between darwintrace.dylib
> and the server, etc.
>

Sure, I will analyse all points again and add accordingly.


Mihir


Re: Speed up trace mode (GSoC Project)

2019-03-31 Thread Mihir Luthra
Hi Clemens,

I was wondering whether the trace mode optimisation project would be
sufficient for the entire summer.
The maximum effort is in understanding the code, which needs to be done
before GSoC starts.
After that, the shared memory concept can maybe be achieved within a
month. Testing it may be a big deal, because bugs are possible.
I don't know if I am right.
Should I include the "auto detection of build dependencies" project for
the last 1.5 months, since it says it can be combined with the current
project?

I am not sure whether this much work is legitimate for the summer.
Looking forward to your reply. ^_^

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-03-31 Thread Mihir Luthra
Hi there,

@Mojca Miklavec  As I told you before, I will be documenting trace mode
alongside understanding it; I have been working on that:

https://docs.google.com/document/d/15cVbH6f6hBr9HryJEHUZEbRsToN1BAjY8My-oRstO9A/edit#heading=h.oby3p7ljhsu

Sorry, I told you I would get it done by the end of the week, but I am
still stuck understanding some internals of the server side of trace
mode. I have mailed my doubts to the mailing list and I guess Clemens
will help me with those soon.
Right now I will complete the documentation of the client side and
arrange the documents above in a more understandable way within a few
hours.
Maybe after all that you can tell me the right place to put these, or if
something is not right, please let me know.


Also, I think I can make a sort of demo for this project, but I don't
know if it would be good enough to prove the skills.
@Clemens Lang
The demo would replicate the functionality of the client side of trace
mode in a 2-process program, each with 2 threads, which call functions
like open and close (which I will replace with my own implementations
from a library injected with DYLD_INSERT_LIBRARIES). For their sandbox
bounds I will pass my own arguments, and I will try mapping memory
backed by a file using a Ctrie, which is checked before the
sandbox-bounds function.
Most probably I will deliver this demo in 3-4 days.

And please consider giving a look to the draft:
https://docs.google.com/document/d/1qH5VMtrQ3tvd5gFPf51lmJtd6dYfUuEmO1AvXmU_4qM/edit#heading=h.9cfgiuh00i0k

I have made some changes in the technical details ("problem" &
"solutions") and I have specified the right deliverables for the first
evaluation.
If this way is correct, I may proceed the same way for the second and
final evaluations.
Currently, a ctrie seems the best implementation for checking a path,
although, if I get more time before the submission date, I will try my
best to look for an even better alternative if possible.

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-03-31 Thread Mihir Luthra
Hi,

I need help understanding static int TracelibRunCmd(Tcl_Interp *in); it
has blown my mind, as I am totally new to these kevents.

I have been trying to understand it for a while now.
As far as I understand, the thread responsible for creating the server
eventually calls tracelib run, and then here:

if (1 != kevent(kq, ..., 1, ..., 1, NULL))  —— (1)

it waits until the kqueue detects that someone is trying to read via the
socket and returns a value.

In porttrace.tcl, the library gets injected and the sandbox gets set,
and then in portutil.tcl, tracelib setdeps sets the dependencies.

Now, (1) only returns when someone tries to read data from the socket.
When does that happen? Is it when darwintrace calls frecv? That seems to
be the only place where data is read from that socket, and to get past
that kevent(), a read seems necessary. But for that, accept() must be
called, which happens later.

Sorry for making it so cluttered.
Kindly help me understand this :)

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-03-28 Thread Mihir Luthra
Hi there,

I had a question.
Before the build, dependencies are checked.
Enabling trace mode hides incompatible versions of the software
currently being installed, versions installed by other package managers,
and so on.
The injected darwintrace.dylib will replace file operations when
something needs to be hidden.
But I don't exactly understand which "processes" call those file
operations, and exactly when. What are some examples of such processes?
And how is the lookup done that creates the need for trace mode? Is it
just because of GNU autoconf? And if yes, couldn't that autoconf be
changed? I am not very sure about this.

I feel this question doesn't concern the project I am working on much,
because my working area is darwintrace and tracelib, where I just act on
the basis of the data provided about the process (pid) or thread (tid)
variables.
But I am still not very sure.

Regards,
Mihir


Re: Speed up trace mode (GSoC Project)

2019-03-27 Thread Mihir Luthra
On Wed, Mar 27, 2019 at 10:22 PM Mojca Miklavec  wrote:

> Dear Mihir,
>
> On Wed, 27 Mar 2019 at 17:25, Mihir Luthra wrote:
> >
> > Hi,
> >
> > I have shared my draft application from the GSoC dashboard.
> > Please provide me with feedbacks. :)
>
> Please note again that I'm not familiar with the contents at all, so
> I'm providing just some general feedback.
>
> What I miss a bit is some clear definition of deliverables, what
> pieces of code would be suitable enough for merging them into base and
> when.
>
> Background: One of the problems with many earlier projects with
> macports base was that a student was working all summer in his own
> branch, waiting for the code to be "complete and well-tested", and the
> code was subsequently never merged into the trunk / master, or maybe
> it's still considered useful, just not 100% polished, and some members
> still plan to clean up and merge the code as old as 10 years. On the
> other hand, if the code ends up in the master branch early and some
> issues are discovered, they would still be fixed. If the code lies in
> its own branch, it doesn't get nearly as much testing and might get
> forgotten.
>
> Of course you cannot always merge immediately, as some pieces might
> need a bigger pile of code at once to produce something useful without
> breaking stuff. But it would be very helpful if some code could be
> merged into master at least once per week. If you could split it into
> smaller reasonable chunks, it would be even better to do this on an
> almost daily basis when possible (it's still ok to have two weeks of
> some bigger feature rework in the meantime).
>
>

Thanks for the feedback Mojca. ^_^.

That definitely makes sense.
Clemens told me to work on tries, Ctries, and related data structures in his
last mail.
Most likely by the weekend I will understand how to relate the main code to
these data structures and plan where to place the desired functions.
I will also try to break the work into the smallest chunks possible. ^_^



> And please try to find a way to try to contribute some patches, docs,
> bugfixes ... to the MacPorts base in the near future. Maybe you could
> write some unit tests for the base related to the trace mode? Talk to
> Marcus or Clemens about some challenges if you fail to find some
> yourself.
>
>

I made a contribution, not a big one though, the day before yesterday.
Here is the link:
https://github.com/macports/macports-base/pull/117

And I am working on docs already; I will share them too by the weekend.
I hope those docs prove helpful :)

I will look into the unit test task and see what I can do right now. ^_^

From what I have seen so far, the code in each file works in conjunction
with many other files, so while trying to fix bugs or add patches right now
I may cause more bugs, but I will try to contribute patches as soon as I
can. :)



> > Also, should I share the link to document here as well?
>
> Yes, that would definitely make sense.
>

Here is the link to my draft proposal.

https://docs.google.com/document/d/1qH5VMtrQ3tvd5gFPf51lmJtd6dYfUuEmO1AvXmU_4qM/edit#heading=h.tal46x1pbsaj

Please provide any further feedback wherever possible. ^_^

Regards,

Mihir


>
> Mojca
>


Re: Speed up trace mode (GSoC Project)

2019-03-27 Thread Mihir Luthra
Hi,

I have shared my draft application from the GSoC dashboard.
Please provide me with feedback. :)

Also, should I share the link to document here as well?

Regards,
Mihir


Re: Speed up trace mode Project GSoC

2019-03-27 Thread Mihir Luthra
Hi,

Thanks for the helpful response ^_^.

I have been through the code files of porttrace.tcl, tracelib, and
darwintrace, and I understand how they work at a high level.
I will go through the function __darwintrace_get_filemap() to understand
more about compare-and-swap, and will look for more lock-free primitives. ^_^

Regards,
Mihir


On Tue, Mar 26, 2019 at 1:19 AM Clemens Lang  wrote:

> Hi,
>
> On Wed, Mar 20, 2019 at 09:06:11PM +0530, Mihir Luthra wrote:
> > I am not a master at dealing with low-level system stuff, but as I
> > have worked quite a bit with Unix shell scripting and the C language,
> > I can connect the dots.
> >
> > I wanted a few tips from you regarding the project:
> >  1) What way should I proceed?
> >  Should I start with understanding the trace code?
> >  If yes, then can you give me a some idea on which files I
> >  have to work with and in what order should I start
> >  understanding them?
>
> That's definitely required for this task, yes. The most relevant parts
> are the client-side in src/darwintracelib1.0, most notably darwintrace.c
> [1]. Check __darwintrace_setup() and __darwintrace_get_filemap() (most
> notably the use of compare-and-swap in that function) and
> dependency_check(). You should understand why the compare-and-swap in
> __darwintrace_get_filemap() is required and look at the different
> scenarios that can happen when two threads call this function at the
> same time. The same principle of non-blocking synchronization using
> compare-and-swap would need to be applied for this project.
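If I understand that correctly, the publish-once pattern looks roughly like
this in C11 atomics (purely a sketch with made-up names; the real code may
use other atomic primitives):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative only: publish a shared map exactly once without locks. */
static _Atomic(void *) shared_map = NULL;

/* Each thread builds a candidate map, then tries to install it. If a
 * racing thread won, we discard ours and adopt theirs, so every thread
 * ends up using the same map. */
void *publish_map(void *candidate) {
    void *expected = NULL;
    if (atomic_compare_exchange_strong(&shared_map, &expected, candidate))
        return candidate;  /* we installed it first */
    return expected;       /* another thread beat us; use its map */
}
```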
>
> Server-side, look at pextlib1.0/tracelib.c [2]. The main server is
> handled in TracelibRunCmd which accepts new connections and processes
> pending requests using kqueue(2)/kevent(2) to ensure non-blocking
> processing. dep_check() does the query of the registry using SQLite.
>
> Additionally, port1.0/porttrace.tcl is the somewhat higher-level setup
> of the tracelib.c server component from Tcl which eventually calls
> 'tracelib run' in a separate thread to start the C code.
>
> >  2) Are there any books or papers or anything which I can refer
> >  to?
>
> I'm not in a position to recommend current papers on this, but a good
> starting point might be research on lock-free map-like data structures,
> e.g. a Ctrie [4] or related data structures [5].
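As a warm-up for that reading, a plain (non-concurrent) character trie has
this basic shape; a Ctrie adds lock-free updates on top of it (everything
below is illustrative, not MacPorts code):

```c
#include <stdlib.h>

/* One node per character; `terminal` marks the end of a stored path. */
typedef struct trie_node {
    struct trie_node *child[128];  /* ASCII fan-out, for simplicity */
    int terminal;
} trie_node;

static trie_node *trie_new(void) {
    return calloc(1, sizeof(trie_node));  /* NULL check omitted in sketch */
}

static void trie_insert(trie_node *t, const char *s) {
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s % 128;
        if (!t->child[c])
            t->child[c] = trie_new();
        t = t->child[c];
    }
    t->terminal = 1;
}

static int trie_contains(const trie_node *t, const char *s) {
    for (; *s; s++) {
        t = t->child[(unsigned char)*s % 128];
        if (!t)
            return 0;
    }
    return t->terminal;
}
```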
>
> >  3) Shall I continue contacting you on mail or should I use some
> >  other way?
>
> The usual way is sending me an email but Cc'ing the macports-dev list,
> so that others can also contribute. I've added the list to Cc now.
>
>
> >
> ***
> > A short Report on what I understood and worked upon
> >
> ***
> >
> > [...]
> >
> > I got some high level understanding on the way code is working
> >
> > 1) DYLD_INSERT_LIBRARIES dynamically injects the tracelib during the
> > build so that all file-related operations are under our control.
> > 2) Now at run time (most probably the destroot phase), when the check
> > is being made, in the 3rd possibility, as data is always fetched from
> > the server and never cached, we need to implement an efficient cache
> > storage which, as you said, uses mmap(2) to form a shared memory
> > region for all processes, acting as a single point of contact for
> > server interaction. Some data structure needs to be implemented so all
> > processes can reach it fast, but we can’t use raw pointers into this
> > shared memory as we only get offset info from mmap(2). (This point may
> > be a bit ambiguous; I need to research more to understand it
> > completely.)
>
> This does not only apply to the destroot phase - we need to limit file
> accesses during both the configure and build phases, too, so that the
> build systems do not inadvertently find things on the filesystem they
> shouldn't see.
>
> As for the offset portion, the problem is as follows:
>
>  - You mmap() a file descriptor into memory (where that file descriptor
>comes from remains to be seen, but it can also just be a file on disk
>for all intents and purposes for now). That file descriptor will
>point to the same file in all processes.
>  - We do not control the memory layout of the processes our library is
>injected into – hence we cannot assume a fixed address for our cache
>data structure, but must let mmap(2) pick a free address. This means
>that different processes will have the same cache data structure at
>different addresses.

Re: Speed up trace mode (GSoC Project)

2019-03-25 Thread Mihir Luthra
Thanks for the helpful information ^_^.

The wiki seems to be the right place to put all this information, and later
(if possible) maybe some quick links from the main website to these wiki
pages would be helpful.

For now I guess you are right, will put anything I document on wiki.

Regards,

Mihir


Re: Speed up trace mode (GSoC Project)

2019-03-24 Thread Mihir Luthra
Hi,

I had a few questions regarding the darwintrace library.

With the darwintrace library injected, most I/O operations get
reimplemented. A single process working on files will more or less call
many of these functions again and again; for example, a particular process
may call open, rename, rmdir, etc.
So let’s say a process was allowed to open file “X” (and closed it too).
The same process now has to do rmdir, so it will again call
__darwintrace_is_in_sandbox().
If a file has already been checked once, its path could be stored somewhere
so that these checks don’t need to be re-performed; the next time the same
process calls any of these functions, it wouldn’t need to call the
time-consuming __darwintrace_is_in_sandbox() and could check that list of
already-checked files first.
This should optimise trace mode to some extent.
Is that right, or am I mistaken somewhere?
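For example, a tiny fixed-size cache of already-approved paths inside each
traced process might look like this (purely a sketch; the sizes and names
are invented, and a real design would also need invalidation and thread
safety):

```c
#include <string.h>

#define PATH_CACHE_SLOTS 64
#define PATH_MAX_LEN     256

static char path_cache[PATH_CACHE_SLOTS][PATH_MAX_LEN];
static int  path_cache_len;

/* Returns 1 if `path` was already approved earlier in this process. */
static int cache_hit(const char *path) {
    for (int i = 0; i < path_cache_len; i++)
        if (strcmp(path_cache[i], path) == 0)
            return 1;
    return 0;
}

/* Remember an approved path (silently drops entries once full). */
static void cache_add(const char *path) {
    if (path_cache_len < PATH_CACHE_SLOTS && strlen(path) < PATH_MAX_LEN)
        strcpy(path_cache[path_cache_len++], path);
}
```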


On Sun, Mar 24, 2019 at 12:22 AM Mihir Luthra <1999mihir.lut...@gmail.com>
wrote:

> Hi,
>
> I wanted to contribute to the MacPorts documentation.
> The guide says that it currently is in docbook format, with work in
> progress to change it to adoc.
> Actually I got a bit confused with this.
> Where exactly should I add a file in order to create a new section?
> Like currently there are 7 sections in MacPorts documentation.
>
> I was thinking to add an 8th section which gives a small high level tour
> of the codebase to a newcomer joining the community.
> I personally would only be able to give tour of trace mode, but that
> section can be populated with other topics as well later.
>
> Mojca, I will stick to this email thread for any further communications
> Apologies for previously creating many!
>
> Regards
>
> -Mihir
>
>


Speed up trace mode (GSoC Project)

2019-03-23 Thread Mihir Luthra
Hi,

I wanted to contribute to the MacPorts documentation.
The guide says that it currently is in docbook format, with work in
progress to change it to adoc.
Actually I got a bit confused with this.
Where exactly should I add a file in order to create a new section?
Like currently there are 7 sections in MacPorts documentation.

I was thinking to add an 8th section which gives a small high level tour of
the codebase to a newcomer joining the community.
I personally would only be able to give a tour of trace mode, but that
section can be populated with other topics as well later.

Mojca, I will stick to this email thread for any further communications
Apologies for previously creating many!

Regards

-Mihir


made a pull request(GSoC)

2019-03-22 Thread Mihir Luthra
hey everyone,

I was checking porttrace.tcl code.

In the code tracelib is being used both as a command and a variable name.

While loading the darwintrace library through DYLD_INSERT_LIBRARIES,
tracelib is also used as a variable storing the darwintrace library path.

I renamed that tracelib variable to darwintracepath.

Mihir


Regarding GSoC

2019-03-22 Thread Mihir Luthra
Thanks for the helpful response ^_^

As for directly altering the code in base, maybe I am not quite ready for
that yet, I guess.

I think making a small introduction in the documentation for newcomers to
the MacPorts community, giving a really easy high-level understanding of how
the MacPorts code flows to make trace mode work, could be a good idea, maybe
because I went through the same phase over the last few days. @Mojca Miklavec


@mcalh...@macports.org   I have been surfing through the code; I can
actually explain about 50% of how it works, I guess.

I am still halfway through the Tcl tutorial, but because I was starting to
understand how it switches from Tcl code to C code, I started looking at
that first.

Here is a small flow of my understanding:
ports.tcl (if -t, then global_options(ports_trace) = yes)
—> macports.tcl (set porttrace)
—> porttrace.tcl (calling tracelib, and loading the darwintrace lib via
DYLD_INSERT_LIBRARIES)
—> tracelib is defined in C code as a function (TracelibCmd), which is
registered as a Tcl command in Pextlib.c via the Tcl C API
Tcl_CreateObjCommand.

Although the C code is a relief to read, I am still new to sockets. I read
the basics to get fairly familiar with them.
As per the plan to provide cache storage for the data received from the
server component, I still need a better understanding of this.

It would be great if you could suggest an efficient way to keep learning
from here…
Also, should I focus on Tcl, or is the darwintrace C library code my working
area?


Regards,

Mihir


Regarding GSoC

2019-03-22 Thread Mihir Luthra
Hi there,

@Mojca Miklavec  I have been working on getting a good
understanding of MacPorts base.

My project would be trace mode optimisation.
I feel I have come quite far in understanding how trace mode works in the
code.

I have made plans for what I will do for the project and have started to
understand the code base (still working on it).

This project has been labelled medium to hard.
I think that by the application period I will be able to give a good
explanation of how trace mode works and how I am planning to optimise it.

If a demo is needed for this, what are your expectations for the demo?
If not, will it be enough to give a detailed explanation of my
understanding and my plans in the application?

Earlier you said we would have the period from 25 March to 9 April for
improving our applications?
Will Google let us re-upload our application, or do I have to submit a
rough application here first so I can finalise it with everyone?


Re: GSoC Application

2019-03-21 Thread Mihir Luthra
Thanks for the tips,

I have tried using MacPorts and tried installing some packages as well.
The project I am interested in is optimisation of trace mode.
I discussed it with the potential mentor mentioned.
Before, I didn’t know how to use the mailing list correctly. Now I will
message here as asked.

I have 4 years of experience with the C language, and I am working on Tcl
now; I found the guidance YouTube video linked on the website, which I will
go through.
I have been using a Mac for iOS app development for 2 years. I have read a
lot about Unix shell scripting in the past, so I have some knowledge of Unix
systems.

Further, as Clemens mentioned, this project requires knowledge of low-level
systems, so which books or papers should I refer to in order to get a better
understanding?

I planned it for myself this way:
learn Tcl -> watch the MacPorts code base video -> go through the code files

Any improvements or suggestions?


On Thu, Mar 21, 2019 at 2:09 PM Mojca Miklavec  wrote:

> Dear Mihir,
>
> (CC-ing another student with a similar question and no particular
> project proposal yet.)
>
> Welcome to the MacPorts community!
>
> On Thu, 21 Mar 2019 at 08:20, Mihir Luthra wrote:
> >
> > Hi everyone,
> >
> > I had a few questions
> >
> > What all should I work upon before applying
>
> - Decide on and discuss the idea for the project you want to work on.
> - Get a decent understanding about how MacPorts works (get it up and
> running, install a few ports etc.) and get a good understanding about
> what you need to do to finish the project.
> - Prove your skills by either creating some demo or submitting some
> pull requests (you may ask for guidance about what you could do, but
> it makes sense to first pick a project, so that the tasks can be more
> related to the project).
> - Make sure that you submit your first draft proposal *as early as
> possible*, so that you can still have sufficient time (2 weeks) to
> make significant improvements based on the feedback you get from us.
> - Read this mailing list or archives where there will be plenty of
> GSOC-related discussion going on, optionally follow us on IRC.
>
> You may keep submitting patches also after submitting the application,
> but discussing the idea is absolutely essential for success. You
> should allow at least 10 days for proposal review & improvements,
> ideally even more.
>
> > and
> > Will a mentor be assigned to me or do I need to discuss with the mentor
> and then submit proposal with mentor name mentioned?
>
> You don't need to find a mentor yourself. The mentor would be assigned
> to you based on the project idea (but yes, you definitely want to
> discuss the idea before submitting the proposal, else you might be
> wasting a lot of time going in the wrong direction instead of using
> that time productively with some guidance).
>
> You should not contact the mentor(s) directly, the ideas are best
> discussed on this mailing list where other experienced developers can
> also provide feedback, not just the potential mentor.
>
> If you want some guidance, you might want to tell us a bit more about
> yourself and your interests, and tell us which project ideas sound
> interesting to you. Ideally you would do at least a tiny bit of
> research into some ideas yourself (or ask if idea description doesn't
> sound clear enough) and then come up with additional questions and
> suggestions.
>
> You could pick your idea in one of the following areas:
> (a) working on new packages or improving existing ones (but that
> requires taking on a bit more than just a single package, more like a
> whole group of packages that need extra care; this could be done for
> almost any given software in existence :)
> (b) working on python modules for automatic generation of packages
> from any "upstream package manager" to MacPorts, like conversion of
> ruby gems / python pypi / perl cpan / haskell cabal / javascript npm /
> ... (no need to work on all of them, just some subset)
> (c) working on any of the plenty projects that improves the package
> manager itself (C/C++ and Tcl)
> (d) standalone web application and improvements to our build infrastructure
>
> Projects in (c) are relatively important and you may pick almost
> anything, even if it's not on the project list, but any other area is
> suitable as well. So far there was probably most interest in (d), so
> you might want to pick from others?
>
> Mojca
>


GSoC Application

2019-03-21 Thread Mihir Luthra
Hi everyone,

I had a few questions

What all should I work on before applying?
and
Will a mentor be assigned to me, or do I need to discuss with a mentor and
then submit the proposal with the mentor's name mentioned?