Re: Please welcome the four GSOC projects this year

2019-05-07 Thread Arjun Salyan via macports-dev
Thank You Mojca!

Congrats Satryaji, Karan and Rajdeep.

The community has shown amazing support for all of us during the
application period itself. I can't thank you all enough for the help I
received in tackling all the problems and doubts.

Looking forward to a really productive summer ahead while being able to
contribute something valuable.

Thank you everyone


Re: GSoC 2019 [Collect build statistics]

2019-04-15 Thread Arjun Salyan via macports-dev
Hi Mojca,

On Mon, Apr 15, 2019 at 9:46 PM Mojca Miklavec  wrote:

> Given the current state of the app with sufficient complexity, I
> believe that it would be wise to introduce some unit tests to be able
> to extensively test what happens with data you import, and to prevent
> / detect any breakages in the future.
>

Thank you. Since I am currently working on parsing maintainers, I began
testing with the maintainers code only. It helped me make significant
improvements to the code that extracts the maintainers (added to the pull
request: https://github.com/macports-gsoc/macports-gsoc-2019-webapp/pull/1).
[update: this file has changed further since I updated the pull request;
the logic remains the same, only the JSON object structure has changed]

I ran the tests and got the desired results. I will show the final code and
results in around 24 hours, after I am done with my viva voce and extra
classes, but below I discuss the approach. Sorry if this is not the
right way or if the presentation is not clear.

I created five ports:

   1. portA maintainers {@github gmail.com:test1}
   2. portB maintainers {@github gmail.com:test2} (SAME GITHUB,
   DIFFERENT EMAIL)
   3. portC maintainers {@newgithub gmail.com:test2} (SAME EMAIL,
   DIFFERENT GITHUB)
   4. portD maintainers {gmail.com:test2} (EMAIL REPEATED WITHOUT
   GITHUB)
   5. portE maintainers {@github} (GITHUB REPEATED WITHOUT EMAIL)

I received 3 unique GitHub and email pairs (according to the logic [1]) and
I am treating each as a different maintainer.
[
{
"github": "github",
"name": "test1",
"domain": "gmail.com"
},
{
"github": "github",
"name": "test2",
"domain": "gmail.com"
},
{
"name": "test2",
"domain": "gmail.com",
"github": "newgithub"
}
]

Now, to each maintainer I added all the ports whose GitHub handle, email,
or both match those of that unique maintainer.

[
{
"model": "ports.Maintainer",
"pk": 0,
"fields": {
"github": "github",
"name": "test1",
"domain": "gmail.com",
"ports": [
"portA",
"portB",
"portD"
]
}
},
{
"model": "ports.Maintainer",
"pk": 1,
"fields": {
"github": "github",
"name": "test2",
"domain": "gmail.com",
"ports": [
"portA",
"portB",
"portD",
"portC",
"portE"
]
}
},
{
"model": "ports.Maintainer",
"pk": 2,
"fields": {
"name": "test2",
"domain": "gmail.com",
"github": "newgithub",
"ports": [
"portE",
"portB",
"portC"
   ]
}
}
]


For querying, we can now use the email or GitHub handle and show all the
ports for all the maintainers returned.

This should not break because of any inconsistency in the maintainer
details. But there is one disadvantage: on the port-detail page, we will
now show x maintainers if the same maintainer provided x different pairs
of GitHub handle and email. However, this disadvantage might prove helpful
in getting rid of the inconsistencies.

Thank You

[1]
Currently I am using the following Logic for adding maintainers (comparing
with already parsed maintainers) :

   - If neither the email nor GitHub is repeated: CREATE NEW
   - If the email and GitHub both are repeated: SKIP
   - If the email is repeated and not the GitHub handle (provided) : CREATE
   NEW with inconsistency flag
   - If the GitHub handle is repeated and not the email address (provided)
   : CREATE NEW with inconsistency flag
   - If the GitHub handle is repeated and email is not provided: SKIP
   - If the email address is repeated and GitHub is not provided: SKIP
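The rules above can be sketched roughly as follows. This is only an illustration, not the code from the pull request: the function names and the dictionary layout are made up for the example, and "repeated" is checked separately for the email and for the GitHub handle, which is one possible reading of the rules.

```python
def classify(email, github, known):
    """Decide what to do with one parsed maintainer entry.

    `known` is the list of maintainers created so far; None means the
    identifier was not provided in the Portfile.
    """
    email_seen = email is not None and any(m["email"] == email for m in known)
    github_seen = github is not None and any(m["github"] == github for m in known)
    if email is None or github is None:
        # Only one identifier given: skip if it is already known.
        return "skip" if (email_seen or github_seen) else "new"
    if email_seen and github_seen:
        return "skip"
    if email_seen or github_seen:
        return "new-inconsistent"
    return "new"


def unique_maintainers(entries):
    """Run classify() over (email, github) pairs in parsing order."""
    known = []
    for email, github in entries:
        decision = classify(email, github, known)
        if decision != "skip":
            known.append({
                "email": email,
                "github": github,
                "inconsistent": decision == "new-inconsistent",
            })
    return known


# The five test ports from the message above, in order portA..portE.
entries = [
    ("gmail.com:test1", "github"),
    ("gmail.com:test2", "github"),
    ("gmail.com:test2", "newgithub"),
    ("gmail.com:test2", None),
    (None, "github"),
]
maintainers = unique_maintainers(entries)
print(len(maintainers))  # 3 maintainers, two of them flagged as inconsistent
```

With these five entries the sketch ends up with the same three pairs as listed in the message, and portD and portE do not create new maintainers.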


Re: GSoC 2019 [Collect build statistics]

2019-04-12 Thread Arjun Salyan via macports-dev
Hi Mojca,

On Fri, Apr 12, 2019 at 3:03 AM Mojca Miklavec  wrote:

> Awesome!


Thank You.

> One thing that "urgently" needs to be done is to obfuscate the email
> (if written at all). I'm not even sure whether we actually want the
> email being displayed there. We need it to send automated emails from
> the buildbot in case some failure happens, or to occasionally contact
> the maintainer directly. But exposing that info on the website might
> be too much.
>

I fixed it as soon as I saw the email, but couldn't reply at that time.

> As far as accessing the information is concerned: at the moment you
> use email as unique identifier. I would probably use github handle by
> default / as the main entry point. I think that '@' is an allowed
> character in URL. If so, we could use
> /maintainer/@ryandesign
> to access the same page


There was a big error in my parsing script due to which the GitHub handles
of a lot of maintainers were not being parsed. And hence, I went with
email. But still, some 300 maintainers have not provided GitHub handles.


> Alternatively we could allow macports handles (for those with
> @macports.org email addresses), so
> /maintainer/ryandesign
> could work just as well.
>

Yes, this would be good.

> For maintainers with a super long list of ports, or, for
> non-maintained ports in particular, we might eventually need a way to
> shorten that list (have multiple pages) or provide a similar search
> functionality as for global ports, except that here it would be
> limited to the ports by that particular maintainer.
>
> I would put that list of ports in a table and add version & short
> description.
>

Yes, I will add pagination and show more details for the lists of ports on
both the maintainer page and the category page. The filter would also be
great to have. I will do this.


> I still need to check the code: what's your current strategy for
> showing links to the tickets?
> At some point we could differentiate different types of tickets (for
> example mark bugs separately).
>

Sorry to disappoint you here, but right now I am using web scraping to do
this. I am looking for a plugin that could add a public API for tracking
tickets, but maybe it doesn't exist right now. We can make our own for sure.


> One minor suggestion. I really like the "search for port" field. Could
> this be added to every page? There is "ports" in the top right corner,
> but that one is a lot less useful in itself (not saying that it should
> go, just suggesting search on each page).
>

I have added it to maintainer-detail and port-detail. But it still needs
some work on its position on the page.


> Here's what I would do, but feel free to propose an alternative and/or
> discuss further. (Actually, I have two slightly different ideas for
> implementation in my mind, I'll describe one of them first.)
>
> For the maintainers you could declare a unique keyword over the
> combination of github handle + email. Every maintainer of every port
> has a uniquely specified pair (github + email) when you import it to
> the database (neither github handles nor emails would be unique on its
> own). Note that you still have index specified on both columns
> (separately on each, but the uniqueness is only defined on combination
> of the two).
>
> When you read the port, you check the pair (@github, email). If the
> pair already exists in the database, you enter the (port, author) pair
> into the database of maintainerships. If it doesn't exist, you create
> it first, and then assign the maintainership to the port. (Note that
> whenever you are updating the port, you also need to check if you need
> to remove some maintainers from that port.)
>
> When you display a particular maintainer, say @somerandomgithubhandle,
> you run a query and if you hit more than one entry with that github
> handle in the database:
>

This would solve the problem of multiple emails with the same GitHub handle.
But there are cases where the maintainer has provided both GitHub and email
for one port and just the email for another port. Sorry if I am missing
something.

Example:
for libsmf {ryandesign openmaintainer}
for penal-soft {{ryandesign @ryandesign} openmaintainer}

And while many haven't provided GitHub handles, some haven't provided
emails.
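To make the concern concrete, here is a small sketch of the pair-based uniqueness described in the quoted text, using Python's built-in sqlite3 module instead of the app's Django models. The table and column names are invented for the example, and a missing GitHub handle or email is stored as an empty string, which is just one possible convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maintainer (
    id     INTEGER PRIMARY KEY,
    github TEXT NOT NULL DEFAULT '',
    email  TEXT NOT NULL DEFAULT '',
    UNIQUE (github, email)          -- uniqueness only on the combination
);
CREATE INDEX maintainer_github ON maintainer (github);
CREATE INDEX maintainer_email  ON maintainer (email);
CREATE TABLE maintainership (
    port          TEXT NOT NULL,
    maintainer_id INTEGER NOT NULL REFERENCES maintainer (id),
    UNIQUE (port, maintainer_id)
);
""")


def add_maintainership(port, github="", email=""):
    """Create the (github, email) pair if needed, then link it to the port."""
    conn.execute(
        "INSERT OR IGNORE INTO maintainer (github, email) VALUES (?, ?)",
        (github, email),
    )
    (maintainer_id,) = conn.execute(
        "SELECT id FROM maintainer WHERE github = ? AND email = ?",
        (github, email),
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO maintainership (port, maintainer_id) VALUES (?, ?)",
        (port, maintainer_id),
    )
    return maintainer_id


# The two ports from the example above: one with the email only,
# one with both the email and the GitHub handle.
first = add_maintainership("libsmf", email="ryandesign")
second = add_maintainership("penal-soft", github="ryandesign", email="ryandesign")
pair_count = conn.execute("SELECT COUNT(*) FROM maintainer").fetchone()[0]
print(first != second, pair_count)  # the same person ends up as two pairs
```

The same person is stored as two different pairs here, so a query by GitHub handle alone would miss libsmf unless the email is also used to link the rows.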

> You could also have a separate page which runs different queries and
> looks for all maintainers with inconsistencies (that's for later). It
> would generally be helpful to have a collection of such pages for
> different things, like: all broken builds on buildbot, all outdated
> ports, ...
>

This for sure would be very good to add into the app once everything is
ready.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-04-11 Thread Arjun Salyan via macports-dev
Hi,

I have added maintainer views and tables to the demo app.

   - List of maintainers is clickable on the port-detail page.
   - A maintainer-detail view that displays info and the list of maintained
   ports

Examples:
maintainer-detail:
https://frozen-falls-98471.herokuapp.com/maintainer/ryandesign__macports.org/

port-detail: https://frozen-falls-98471.herokuapp.com/ports/faust/


But while extracting 'maintainers' from the portindex, maintaining
uniqueness was very difficult. There are a lot of inconsistencies:
- The same maintainer has provided GitHub details for one port and not for
another.
- The same maintainer has provided different emails for different ports.

I understand that it should be the web-app's job to detect this, and for now
the problem is mostly solved. But in the future, one odd case could break
things. What is the best way to handle this?

On Tue, Apr 9, 2019 at 3:21 AM Mojca Miklavec  wrote:

> A general suggestion from me would be to study in depth some good and
> exhaustive book on relational database design to fill in the holes.
> (There might also be some online courses.)
>

Thanks Mojca. I did some research for a detailed book and I found "An
Introduction to Database Systems" by C.J. Date (I also found it in the
library). I will let you know how it goes.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-04-08 Thread Arjun Salyan via macports-dev
Dear MacPorts Community,

I have submitted my Final Proposal. I do understand that during these last
hours it might not be possible to give feedback on the proposals. But if I
am lucky enough to get more of them, I will try to get the job done (around
23 hours still remaining).

Google Doc:
https://docs.google.com/document/d/198Ivygxb2NJQz_sqzDrbDPVEYZ5Ye5Yw0LV6Bt2QmG4/edit?usp=sharing

Thanks for being so helpful!


Re: GSoC 2019 [Collect build statistics]

2019-04-07 Thread Arjun Salyan via macports-dev
Hi,

On Fri, Apr 5, 2019 at 8:58 AM Umesh Singla wrote:

> It’s always good to show your work and get feedback. It’s difficult to
> comment on the quality otherwise. Please do not forget to make a PR.
>
I have submitted the PR for adding an option to portindex (which would
generate a separate file with Changed Ports)

https://github.com/macports/macports-base/pull/121

I am still working on handling deletions.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-04-05 Thread Arjun Salyan via macports-dev
On Fri, Apr 5, 2019 at 8:58 AM Umesh Singla wrote:

>
> It’s okay to share all the project related updates on the list. In fact,
> it’s preferred that way.
>

Thanks, I was just being hesitant to write that on the list.


> Does this include additions and/or deletions?
>

Portindex uses the modified time to detect changes, so it does not seem
able to detect deletions, as Mojca said. But I just realised
that we can make a further change: if a port is listed in the old
portindex but the port directory does not exist, then we can mark the port
as deleted.
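A rough sketch of that check in Python (the real portindex is written in Tcl, so this only illustrates the idea). The function name and the shape of `old_index`, a mapping from port name to portdir, are assumptions made for the example, and "gone-port" is a made-up port name.

```python
import os
import tempfile


def find_deleted_ports(old_index, ports_tree):
    """Return the ports listed in the old index whose directory no longer exists.

    old_index maps a port name to its portdir, e.g. "audio/faust".
    """
    return sorted(
        name
        for name, portdir in old_index.items()
        if not os.path.isdir(os.path.join(ports_tree, portdir))
    )


# Tiny demonstration with a throwaway ports tree.
with tempfile.TemporaryDirectory() as tree:
    os.makedirs(os.path.join(tree, "audio", "faust"))
    old_index = {"faust": "audio/faust", "gone-port": "devel/gone-port"}
    deleted = find_deleted_ports(old_index, tree)

print(deleted)
```

Note that subports share the portdir of their parent port, so a directory that still exists does not prove that every subport listed in the old index is still there; those would need a separate check.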


> It’s always good to show your work and get feedback. It’s difficult to
> comment on the quality otherwise. Please do not forget to make a PR.
>

Thank you, I will do this. I was just needlessly worried that this command
might not be very useful in general, but it is important to get the code
reviewed.

Thank you


Re: GSoC 2019 [Collect build statistics]

2019-04-02 Thread Arjun Salyan via macports-dev
Hi Mojca,

On Tue, Apr 2, 2019 at 3:14 AM Mojca Miklavec  wrote:

> The drawbacks may include:
> - some ports will be skipped on the builder, for various reasons (port
> is known not to build on a particular builder, it may not be
> distributable, ...)
> - the buildbot master may be down or experience problems, so data
> might go missing
>

Thanks. I will consider these factors when improving upon this.


> A strange observation from your source code: you synced portindex and
> ran the conversion, but then loaded the data from another json file?
> Am I missing something?
>

No, the conversion "tclsh portindex2json.tcl portindex" is writing to the
file "syncedportindex.json". And I am reading from the same file. I am
really sorry that I did not submit a PR and it was difficult for you to
review the code.


> There are various ways to achieve the goal. Note that if you run
> portindex yourself, it will detect which files have been updated and
> only ever touch data of those ports. The portindex command could be
> modified to only output the file with changes (when you pass some
> options to it). This will still miss deletes, but it would be an
> efficient way with almost no dependencies.
>

Does this imply that we will keep a clone of macports-contrib locally and
run a modified 'portindex' command to generate a file with only the updated
ports?


> One way would be to generate portindex yourself and always remember
> what git shasum has been used, and store that shasum to the database.
> Next time when you update, check and store the latest shasum, then ask
> git which paths have changed between the two commits, and only update
> ports whose paths match the paths reported by git as changed.
>
> It could also help if you stored a "complete" git history to the
> database (shasum, which ports changed at that point, timestamp,
> parents). Not sure if that's really so helpful, just as an option.
>
> What might be an interesting approach would be to try to squeeze the
> git shasum to the PortIndex. This could also help when submitting
> statistics as it would be easier to determine how old the database is
> / when the user last synced. (It would not work for people with their
> own modifications of the tree.) If you had the shasum in portindex,
> you could still run git independently to check for the difference.
>

These methods are not very clear to me yet, as I haven't dealt with shasums
before. I will discuss them after doing some research.
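For reference, the "shasum" here is the git commit hash, and the first suggestion in the quoted text could look roughly like the sketch below. `git diff --name-only <old> <new>` lists the paths changed between two commits; the function names, and the assumption that ports live at `category/port/...` while top-level directories starting with `_` (such as `_resources`) are not ports, are only for illustration.

```python
import subprocess


def changed_portdirs(diff_output):
    """Reduce `git diff --name-only` output to a set of 'category/port' dirs."""
    portdirs = set()
    for line in diff_output.splitlines():
        parts = line.strip().split("/")
        # Expect at least category/port/file; skip helper trees like _resources.
        if len(parts) >= 3 and not parts[0].startswith("_"):
            portdirs.add("/".join(parts[:2]))
    return portdirs


def changed_since(repo, old_sha, new_sha="HEAD"):
    """Ask git which port directories changed between two commits."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", old_sha, new_sha],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return changed_portdirs(out)


sample = (
    "audio/faust/Portfile\n"
    "audio/faust/files/patch.diff\n"
    "_resources/port1.0/group/example.tcl\n"
    "README.md\n"
)
print(changed_portdirs(sample))
```

The web app would store the last commit hash it processed, call something like `changed_since()` on the next update, and refresh only the ports under the returned directories.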


> Just some random ideas.


Thank you so much.

> Regarding updates of builds: just ask the database about which build
> you synced last, and then sync any builds newer than that, up to the
> last one. You may need to check whether a build was complete when you
> last enquired.
>

Thanks, I am already using the same method.

Arjun


Re: GSoC 2019 [Collect build statistics]

2019-04-01 Thread Arjun Salyan via macports-dev
Hi,
I was working on keeping the PortIndex updated, and was able to achieve
this:

   - Sync PortIndex from
   'rsync://rsync.macports.org/macports//trunk/dports/PortIndex_darwin_16_i386/PortIndex'
   - Update or add ports that were recently built on 10.14_x86_64 (using
   the time frame 'last 24 hours' for now).
   - New ports (SoapyAirspy, SoapyAirspyHF, etc.) were successfully added
   and can now be seen on the demo app.

This is exactly the approach I described in the proposal, and I wanted to
show a working demo so that I can get feedback on how efficient this method
is. The script I used: update_portindex.py
(note: the code might not be very well written, as I was just looking to get
things working. Also, I am only updating ports built on '10.14_x86_64')

I am also wary of running a for-loop over the entire portindex to update or
add the recently built ports. Could you please take a look and comment on
how good this approach is?

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-31 Thread Arjun Salyan via macports-dev
On Sun, Mar 31, 2019 at 2:05 PM Mojca Miklavec  wrote:

> There seem to be some issues with subports. (For example any p5.28-*
> under perl would give an error page.)
>
Yes, it was happening for the ports that contain '.' in their names. I have
solved it using regex now.
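To illustrate the kind of problem (not the exact pattern used in the app): a slug-style pattern that allows only letters, digits, hyphens and underscores rejects a name containing '.', while a slightly wider character class accepts it. Both patterns and the example port name below are only assumptions for the sketch.

```python
import re

# Slug-like pattern: no '.', so names such as 'p5.28-xml-parser' do not match.
SLUG_LIKE = re.compile(r"[-a-zA-Z0-9_]+")
# Hypothetical wider pattern that also allows '.' and '+'.
PORT_NAME = re.compile(r"[-a-zA-Z0-9_.+]+")


def matches(pattern, name):
    """True if the whole name is accepted by the pattern."""
    return pattern.fullmatch(name) is not None


for name in ("faust", "p5.28-xml-parser"):
    print(name, matches(SLUG_LIKE, name), matches(PORT_NAME, name))
```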

> Just curious: what's the order of magnitude of the time it took?
>
When the database was on the same machine, it took about 30 seconds. With
my AWS Free Tier database, it took around 40-45 minutes.


> I now created:
> https://github.com/macports-gsoc/macports-gsoc-2019-webapp
>
> You have full commit rights there, but the idea would be to first
> populate the repository with a basic README (else you probably cannot
> clone the repository at all), then clone it, and finally create a pull
> request with the relevant changes, and not commit directly before the
> code gets reviewed.
>

Sorry that I messed it up. I have created the pull request now.

> Thank you, but I don't see it in our dashboard [yet?].
>
Oh, sorry! I thought the first step was enough. I will quickly finish the
remaining steps.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-30 Thread Arjun Salyan via macports-dev
Thanks Mojca, I will update the Demo App with the suggested changes very
shortly.

On Sat, Mar 30, 2019 at 8:30 PM Mojca Miklavec  wrote:

> Anyway, I'm just curious: what's the current situation with
> django/database hosting, network and other limitations etc.?
> Now that you have the code which can both import the full database, as
> well as parse and show the builds, it's actually a pity that there's
> only a small fraction of ports available, and no idea which ports
> actually show some useful build info.
>
All ports are now available. It took quite some time, but all ports are now
there on the web app.
That leaves the build history: how many logs would you suggest I fetch? I
have been careful about this, as you advised.

> What would be really cool though is to start some actual review
> process for the Django code, as that's where there would be a lot more
> work, and probably more substantial comments.
>

That will be exciting.


> Personally I don't have permissions to create a new repository under
> macports organisation, but as Umesh suggested, we can create a
> temporary org somewhere, create an empty repository, and then submit a
> pull request to that one.
>

Shall I create the org? Or how do we proceed?

> And since you already have a draft proposal ready, it would also make
> sense to submit it. (The final version gets submitted later.)
>

I have submitted it, thanks.


Re: GSoC 2019 [Collect build statistics]

2019-03-30 Thread Arjun Salyan via macports-dev
Hi,
I have added an Ajax-based search box to the demo app:
https://frozen-falls-98471.herokuapp.com

Also, just wanted to give a polite reminder about my open PR:
https://github.com/macports/macports-contrib/pull/3

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-28 Thread Arjun Salyan via macports-dev
On Thu, Mar 28, 2019 at 12:05 PM Mojca Miklavec  wrote:

> What if there's a server outage?
>

Then the best approach is to use HttpStatusPush to deliver instant updates
and, so that no build is missed because of a server failure, also run our
fetching script once per day. The script can easily check whether any
build number present in the logs is absent from the database.
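The daily check could be as simple as a set difference. In this sketch the function name is made up and the build numbers are arbitrary examples; in practice one list would come from the buildbot logs and the other from the database.

```python
def missing_builds(builds_on_buildbot, builds_in_database):
    """Return build numbers seen on the buildbot but absent from the database."""
    return sorted(set(builds_on_buildbot) - set(builds_in_database))


to_fetch = missing_builds([87299, 87300, 87301], [87299, 87301])
print(to_fetch)  # [87300]
```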


> (3) The database needs to be designed in such a way (and the software
> needs to be written in such a way) that frequent updates of the full
> portindex2json:
> (a) works correctly (ports missing from PortIndex are marked as
> gone, no duplicate entries of ports, all info up-to-date)
> (b) works super efficiently
> (c) works with minimal overhead
> If network speed is the bottleneck, make sure that you feed / update
> the database from the same machine where the database is running.
> Updating via git is super fast, you want to avoid transferring the
> full 20MB file over network over and over again. Even if the testing
> system is running at strange configurations, suggest the architecture
> of how it would ideally be implemented if you can design the system
> and architecture yourself.
>
For keeping an updated copy of portindex.json, this seems a reasonable path:

   - Generate the portindex.json file along with PortIndex, i.e. run
   portindex2json.tcl ourselves. [ this would also help in our discussion
   with repology ]
   - portindex.json can be stored in the same directory as PortIndex, and if
   we run our web-app on a different machine [ which is the most probable case
   ] then we could keep the web-app's copy of portindex.json updated using
   rsync [ repology is doing the same, not sure though ].
   - Then, using os.stat on the web-app's copy of portindex.json, we can
   continuously check the file's 'last modified' time and hence detect
   whether there are any changes.
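The os.stat check from the last step could be sketched like this. The class name is invented for the example, and the demonstration uses a temporary file and os.utime to simulate rsync replacing the file.

```python
import os
import tempfile


class FileWatcher:
    """Remember a file's modification time and report when it changes."""

    def __init__(self, path):
        self.path = path
        self.last_mtime = os.stat(path).st_mtime_ns

    def changed(self):
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.last_mtime:
            self.last_mtime = mtime
            return True
        return False


# Demonstration with a temporary stand-in for portindex.json.
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as handle:
    handle.write(b"[]")

watcher = FileWatcher(handle.name)
before_update = watcher.changed()       # nothing happened yet
os.utime(handle.name, ns=(1_000_000_000, 1_000_000_000))  # simulate an update
after_update = watcher.changed()        # modification time differs now
checked_again = watcher.changed()       # no further change since last check
os.remove(handle.name)
print(before_update, after_update, checked_again)
```

A changed modification time only says that the file was rewritten, not which ports changed, so the actual comparison with the database would still have to follow.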

Now that we have an updated copy of portindex.json, we go back to our build
history, which is constantly receiving updates from the server [ without
delay if everything is fine, and with some delay in case of a server outage ],
and detect which ports have been recently built. For those ports we
would then update the database using portindex.json.
To make sure things stay consistent, we can schedule a weekly 'complete
syncing of database and portindex.json'.



> (4) Suggest a way to minimize the data transfer, so that it will only
> include the changes rather than the full data set. How to get such
> data? What would need to be changed / improved?


rsync would do exactly this.

> (5) You won't be getting port renames. What you do get is
> "replaced_by" information at best (say, perl5.26 could be replaced_by
> perl5.28). When a port is renamed, treat it as a different port, but
> the old port could be marked as "inactive" and "replaced_by <port>"
> (if it's not deleted yet). This information is probably not in
> PortIndex, either portindex would need to be improved, or you need to
> find a different way.
>

Okay! So the name change problem can be handled. We can have a
"replaced_by" column in our table: as long as it is empty/NULL, the port is
active; otherwise it is inactive and has been replaced by a new port.
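As a small illustration of that rule, here is a plain Python stand-in for the database row (the class and attribute names are made up; in the app this would be a nullable column on the port table). The perl5.26/perl5.28 pair is the example from the quoted text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Port:
    name: str
    replaced_by: Optional[str] = None  # None plays the role of NULL

    @property
    def active(self):
        """A port is active as long as nothing has replaced it."""
        return self.replaced_by is None


old_port = Port("perl5.26", replaced_by="perl5.28")
new_port = Port("perl5.28")
print(old_port.active, new_port.active)  # False True
```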

Please let me know if these approaches look fine.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-27 Thread Arjun Salyan via macports-dev
>
> - A more elaborate plan about how you plan to handle updates / keep
> the database up-to-date. Sure, we can trigger certain actions from the
> buildbot, but those various "actions" need to be implemented. Keeping
> the app up to date in a safe and reliable way is a very important part
> of the project, and requires collecting data from various sources.
> "Look for the most efficient ways to keep the PortIndex and Build
> History up-to-date" should be already attempted now.
>

We can keep the build history up-to-date by using HttpStatusPush, which I
read about in the buildbot documentation. It sends a
JSON object containing build data. This would even remove the need for a
parsing script on the web-app's end that fetches the logs from buildbot.

But I am having trouble finding a good method to keep PortIndex
updated. PortIndex does not give any id to each port. Suppose I assign
them ids in the database: if a port is then renamed, it would be impossible
to identify which port was renamed, because PortIndex has no idea about the
ids in the database.

Another problem is the size of the file: running portindex2json.tcl over
the generated portindex every time and then looking for changes does not
appear to be very efficient. Nor does the build page seem to
provide any relevant info about the changes.

Any suggestions on tackling these would be very helpful.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-27 Thread Arjun Salyan via macports-dev
Thank You Mojca!
I read your inline comments as well, and I have already started working on
the suggestions.
I will include these very shortly!

> Does your new semester already start at end of July?


Yes, it does. But it won’t affect the project. I have done projects in a
semester as tightly scheduled as the current one, and the upcoming one is
hardly this packed!

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-27 Thread Arjun Salyan via macports-dev
Dear all,

Using the valuable information and suggestions by all of you here at
MacPorts, especially the potential mentors, I have come up with the first
draft of my proposal for GSoC 2019.
Link to Google Doc:
https://docs.google.com/document/d/198Ivygxb2NJQz_sqzDrbDPVEYZ5Ye5Yw0LV6Bt2QmG4/edit?usp=sharing

I am eager to improve it further with your input.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Arjun Salyan via macports-dev
On Sun, Mar 24, 2019 at 10:02 PM Mojca Miklavec  wrote:

> Here are some examples of why I don't see a single correct answer to
> your initial question. Let's assume that you know absolutely
> everything about all MacPorts installation (exact timestamp of when
> each port was installed or uninstalled, exact timestamp of MacPorts
> installations / upgrades / removals ...) and you want to know the
> answer to
> "How many users have port Foo installed on each OS version in March
> 2019?"
>

With the current setup, mpstats submits data weekly, and hence, to
make the reporting as precise as possible, we would need to present reports
on a per-week basis, as Craig also suggested.
I have tried something here:
https://docs.google.com/document/d/1VReRyPYKifZ1ub77oXXP7ZCqi20nq2jPrKzNxQJ7hxk/edit?usp=sharing
Please take a look when you get time.

Thanks


Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Arjun Salyan via macports-dev
Hi,

I have prepared a Google Doc on the implementation of installation
statistics. I do not know if this is the right way to get suggestions. But
it would be great if I could get feedback and suggestions on this:
https://docs.google.com/document/d/1VReRyPYKifZ1ub77oXXP7ZCqi20nq2jPrKzNxQJ7hxk/edit?usp=sharing

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-23 Thread Arjun Salyan via macports-dev
On Sat, Mar 23, 2019 at 7:58 PM Craig Treleaven wrote:

> See:
>
> http://stats.macports.neverpanic.de/os_statistics#os_platform
>
> It says all 239 reported platforms are Darwin.  So this appears to be the
> conglomeration of all reporting over the past several years.  This explains
> why the charts for OS X Version and MacPorts Version contain so many old
> versions.  All versions ever reported are being added together--which is
> useless.
>

Does this imply that the current system is reporting every OS X/MacPorts
version a unique user has ever had?


> Note that the port ‘mpstats’ must be installed in order to report.  Thus,
> it MUST be the “top port” for the month, every month.  Not helpful
> information.
>

Also, since we will already be reporting the number of users who are
submitting reports, it does not make sense to include mpstats in the top
installations table.


> The top list includes items like libffi, gettext and expat.  Generally,
> these are installed as dependencies of other things that users have actually
> chosen to install.  However, we don’t capture whether a user “Requested” a
> port or not.  I would really be interested in a list of top Requested ports.
>
mpstats reports whether a port was requested or not, so it would be easy to
display stats for only requested ports.

> I think good installation stats could help us understand our users and how
> they are using MacPorts.  I can’t recall if we ever had a design document
> that identified the sorts of information we wanted to capture and report.
>

I will try to come up with an initial design of how and what can be
reported and then we could brainstorm to reach somewhere.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-23 Thread Arjun Salyan via macports-dev
On Sat, Mar 23, 2019 at 3:15 PM Mojca Miklavec  wrote:

> I would use the first definition: number of users currently having the
> port installed. It might be pretty common to have to reinstall the
> same port multiple times (maybe just for debugging / development
> reasons) and we don't want to count the port developer 20 times. If
> the user uninstalled the port, it's equivalent to me as never having
> it installed in the first place.
>

Thanks. But in that case, what would count as the number of
installations in a particular month? Suppose the first weekly submission
contains port P in active_ports, but by the second submission (in the same
month) the port has been uninstalled.

One way would be to count the number of users having it in
active ports on the last day of the month or on the 15th.
Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-23 Thread Arjun Salyan via macports-dev
Hi,
I am working on the design of tables for installation statistics. I have a
question here:

Suppose there is a port P. For the number of installations of P, I have
several definitions in mind:

   1. Number of users currently having P in active_ports/ inactive_ports.
   [ACTIVE INSTALLATIONS]
   2. Number of users for which P ever appeared in active/ inactive ports,
   no matter if it is there at this point of time or not. [TOTAL
   INSTALLATIONS- counted only once per user]
   3. If any particular user installs P two times, then count that as two
   different installations. [TOTAL INSTALLATIONS]

Which one would make more sense? Maybe we can have two fields- "Total
Installations" (definition 2) and "Active Installations" (definition 1)? Or
just one?
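To make definitions 1 and 2 concrete, here is a small sketch of how both numbers could be derived from weekly submissions. The data layout, a list of (user, week, set of installed ports), is an assumption for the example and not the real mpstats format.

```python
def installation_counts(submissions, port):
    """Count active installations (definition 1) and total installations
    counted once per user (definition 2) for one port."""
    latest = {}   # user -> ports in that user's most recent submission
    ever = set()  # users who reported the port at least once
    for user, week, ports in sorted(submissions, key=lambda s: s[1]):
        latest[user] = ports
        if port in ports:
            ever.add(user)
    active = sum(1 for ports in latest.values() if port in ports)
    return {"active": active, "total": len(ever)}


submissions = [
    ("u1", 1, {"P", "Q"}),
    ("u1", 2, {"Q"}),        # u1 uninstalled P in week 2
    ("u2", 1, {"P"}),
    ("u2", 2, {"P"}),
    ("u3", 2, {"Q"}),
]
counts = installation_counts(submissions, "P")
print(counts)  # {'active': 1, 'total': 2}
```

Definition 3 cannot be computed from such snapshots alone, since a reinstall between two submissions leaves no trace in the data.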

Thank You
Arjun


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
On Thu, Mar 21, 2019 at 8:35 PM Umesh Singla wrote:

> a) We have seen a quick demo of this already. However, the major part I
> think is missing is the search. We can brainstorm over the details like
> search-as-you-type, adding new ports etc according to the timeline. Not
> sure how much browsing by the first letter helps.
>

Search-as-you-type would be good, and can be further supported with: "New
Ports", "Related Ports" on port detail, "Popular Ports"- overall and for
each category.


> b) You cannot have a class Ports when it represents a Port:
> https://github.com/arjunsalyan/MacPorts-Demo-App/blob/master/ports/models.py#L6
> .
>

Yes, there are some major code improvements to be done. I will finish these
shortly.


> 2. build statistics:
>
> a) In Time Elapsed of builds, it would be incorrect to show time taken for
> only one of the build stages. Example, in the case of
> https://build.macports.org/builders/ports-10.12_x86_64-builder/builds/87301,
> total-time-taken (12 min 59 secs) is right to be shown and not "6 mins 45
> secs".
>
Thanks for picking this out.


> b) There are some things which seem hard-coded to me. I
> see '10.14_x86_64', '10.13_x86_64' at multiple places - in port detail
> view, build to database view and jinja templates. It's time to define some
> constants config file now. For build statuses as well. With a new release
> of macOS, we do not want to have to change multiple files in code. In this
> project, it is important that a part which works, it is accurate and
> complete.
>

What I have planned is to have a separate table of builders with relations
to the build history table. Any upcoming versions can then easily be added
to the table. Since, I wasn't fetching many logs from

> c) Also, as Mojca mentioned, errors like these:
> http://frozen-falls-98471.herokuapp.com/ports/database/ should not be
> exposed. What is it intended to do anyway?
>

Initially, I used this to parse build history into the database. But now I
am using a separate script and just forgot to remove this. Sorry. As for
errors, we will return a custom 404 page for DoesNotExist exceptions.

> 3. installation statistics:
>
Thank you, I will look into this.


> As Mojca said, I am not seeing any way to provide code review on Github
> when it's already merged. Since you have the base application ready, it's
> time to use PRs. I would also advise starting to follow at least some of
> the PEP8 style guide conventions, it's good to follow clean code practices
> from the beginning. You can either use coala lint or pylint before pushing
> the code, if familiar.
>

I can submit PRs to the temporary repository Mojca mentioned once it
is available. We can then have a very fresh start. I will make the initial
commit after improving the code based on the suggestions.

Thanks
Arjun


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
Hi, I have created the pull request. The new output is shown below:


{

   * "variants" : ["debug"],*

"depends_build" :
["path:bin/cmake:cmake","port:pkgconfig","path:share/ECM/cmake/ECMConfig.cmake:kde-extra-cmake-modules"],

"portdir" : "audio\/phonon-backend-vlc",

"depends_fetch" : "bin:git:git",

"description" : "VLC backend for Phonon",

"homepage" : "http:\/\/projects.kde.org\/projects\/kdesupport\/phonon\/phonon-vlc",

"epoch" : "0",

"platforms" : "darwin",

"name" : "phonon-backend-vlc-qt5",

*"depends_lib" :
["path:lib/libvlc.dylib:libVLC","port:phonon-qt5","path:lib/pkgconfig/Qt5Core.pc:qt5-qtbase"],*

   * "openmaintainer" : True,*

"license" : "{LGPL-2.1 LGPL-3}",

"long_description" : "A VLC backend for the Phonon4Qt5 multimedia
library.",

*"maintainers" : [{*

*"email" : {"domain":"gmail.com","name":"rjvbertin"},*

*"github" : "RJVB"*

*}],*

"categories" : ["audio","kde","kf5"],

"version" : "0.9.0.7",

"revision" : "0"

},

On Thu, Mar 21, 2019 at 6:18 PM Mojca Miklavec  wrote:

> That we have a bug. Please report all such instances that you find (or
> submit a PR to macports-ports, removing "nomaintainer").
>

I have skipped the "closedmaintainer" key until we fix this bug. Or shall I
implement it?

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
Thanks, it is clear now. I will do the changes and submit the PR.


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
On Thu, Mar 21, 2019 at 5:42 PM Mojca Miklavec  wrote:


> Just create an empty list of maintainers.
>
There are some ports which have *{ryandesign @ryandesign} nomaintainer}* as
the output of maintainers. What does "nomaintainer" mean here?


> We also need to add emails. Maybe something like
> "email" : { "name" : "ryandesign", "domain" : "macports.org" },
> "github": "ryandesign"
>
>
Suppose current output is this: {something @someotherthing}. So, here
'something' is the 'name'? And that name followed by @macports.org would
give the email?

And for some ports the maintainers output is like: {gmail.com:name @gname},
so there the email would be n...@gmail.com instead of n...@macports.org ?
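A minimal sketch of that reading of the format, assuming the guesses above are right: '@handle' is a GitHub handle, 'domain:name' is the email name@domain, and a bare 'name' is name@macports.org. The function name `parse_maintainer` is only illustrative and is not part of portindex2json:

```python
def parse_maintainer(entry):
    """Parse one maintainer entry such as 'gmail.com:name @gname'.

    Assumed token rules (to be confirmed): '@handle' is a GitHub handle,
    'domain:name' is the email name@domain, and a bare 'name' is
    name@macports.org.
    """
    maintainer = {}
    for token in entry.split():
        if token.startswith("@"):
            # GitHub handle, stored without the leading '@'.
            maintainer["github"] = token[1:]
        elif ":" in token:
            # Obfuscated email written as domain:name.
            domain, name = token.split(":", 1)
            maintainer["email"] = {"name": name, "domain": domain}
        else:
            # Bare name: assumed to be an @macports.org address.
            maintainer["email"] = {"name": token, "domain": "macports.org"}
    return maintainer
```

Under these assumptions, `parse_maintainer("ryandesign @ryandesign")` would give the "email" / "github" structure suggested in the quoted message.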


> Alternative would be to treat users with commit rights in a different
> way (domain is always macports), but I don't see any reason to do so.
>

How do I know that the user has commit rights?

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
On Thu, Mar 21, 2019 at 10:20 AM Mojca Miklavec  wrote:

> (2) You made a simple PR last time to fix portindex2json for a more
> reasonable output of categories. Would you be willing for a tiny bit
> more difficult task and try to improve the output for maintainers as
> well? We would want a list of all maintainers with two optional keys
> for each (email &  github handle) plus a boolean value to tell whether
> the port is under openmaintainer policy.
>

Hi, I was working on this. What do I do with "nomaintainer" ? For now I am
getting the following output.

{

"variants" : "clang33 clang34 clang37 clang38 clang39 clang40
clang50 clang60 clang70 mpich mpich_devel openmpi openmpi_devel python26
python27 python33 python34 python35 python36 python37 debug no_static
no_single regex_match_extra universal",

"subports" : "boost-numpy",

"portdir"  : "devel\/boost",

"description"  : "Collection of portable C++ source libraries",

"homepage" : "http:\/\/www.boost.org",

"epoch": "0",

"platforms": "darwin",

"name" : "boost",

"depends_lib"  : "port:zlib port:expat port:bzip2 port:libiconv
port:icu port:python27",

*"openmaintainer"   : True,*

"long_description" : "Boost provides free portable peer-reviewed C++
libraries. The emphasis is on portable libraries which work well with the
C++ Standard Library.",

"license"  : "Boost-1",

*"maintainers"  : [{*

*"github" : "https://www.github.com/ryandesign"*

*},{*

*"github" : "https://www.github.com/michaelld"*

*}],*

"categories"   : [devel],

"version"  : "1.66.0",

"revision" : "3"

}


Re: GSoC 2019 [Collect build statistics]

2019-03-20 Thread Arjun Salyan via macports-dev
Hi Mojca,
Thanks for the detailed reply.

Changes can be seen for this port:
http://frozen-falls-98471.herokuapp.com/ports/qt5-qtlocation/


On Wed, Mar 20, 2019 at 6:40 AM Mojca Miklavec  wrote:

> This is super useful. But I would probably link directly to the
> Portfile rather than the directory. Most ports don't have any patches,
> but if they do, one can easily browse one level higher once on the
> GitHub website.
> I would probably make those link (in particular the homepage link)
> open in a new window.
>

Thanks, I have made both the changes.

> - The entries are not unique as they should be. You seem to have two
> entries for the same build (26315) for example.
>
> - Sorting for 10.13 should be in reverse order (newest builds on top)
>

Fixed both.


> - I'm more interested in duration than end time. (Not sure if it's
> more useful to have start or stop time, but one is sufficient. The
> other one would be duration of the build.)
>

I have removed 'Stop Time' and added 'Time Elapsed'


> The missing table (no urgency) would then be more similar to this one:
> 10.13 || 10.14
> OK [link to 52248] || OK [link to 26315]
>

Implemented this.

> - In BuildHistory the port_name should hold a foreign key to the port
> id rather than just holding a string with port's name (I guess that's
> many-to-one relationship in Django?).
>

Yes, but right now I can achieve this only when I have all the ports in my
AWS database.


> - a useful addition would be information about commit's shasum which
> triggered this change (but that might be tricky to extract in a proper
> way)
>
Thanks, I shall give it a try.

I have also changed the script (parse build history) to detect new builds,
by comparing the last build id in the database with that on the buildbot.
It then receives only the new builds from the buildbot. I am not sure how
efficient this method is, or even if this is the right way of doing it. Now
we can either run this script at some fixed interval or modify buildbot
to trigger the script.
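The comparison step could look roughly like the sketch below. It assumes build ids on a builder increase by one per build; `fetch_build` and `store_build` are hypothetical callables standing in for the buildbot request and the database insert, and `first_id=0` is only a guess at where numbering starts:

```python
def new_build_ids(last_stored_id, latest_remote_id, first_id=0):
    """Return the build ids that are on the buildbot but not yet stored.

    Assumes ids increase by one per build, so everything above the last
    stored id is new. Pass None as last_stored_id when nothing is stored;
    first_id (an assumption) is then used as the starting point.
    """
    start = first_id if last_stored_id is None else last_stored_id + 1
    return list(range(start, latest_remote_id + 1))


def sync_builds(last_stored_id, latest_remote_id, fetch_build, store_build):
    """Fetch and store only the missing builds; return how many were added."""
    added = 0
    for build_id in new_build_ids(last_stored_id, latest_remote_id):
        store_build(fetch_build(build_id))
        added += 1
    return added
```

Running `sync_builds` from a scheduled job would correspond to the "fixed interval" option; the same function could also be called from a buildbot hook.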

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-19 Thread Arjun Salyan via macports-dev
I have some more improvements to demo app:

   - *Build History is now Dynamic:* By making some minor tweaks to the
   Python script sent by Mojca, I was able to load build history from buildbot
   into the database. I loaded only a few recent logs for "*10.14_x86_64*" and
   "*10.13_x86_64*". Since the build history of all ports is not yet in the
   database, it will not appear on the port-detail page for every port. To see
   it working, gmsh would be a fine example:
   https://frozen-falls-98471.herokuapp.com/ports/gmsh/ . It is not very
   neat yet, the OS filter is 'just working'. But now we have a good starting
   point to improve upon.
   - *Link to Github.*

I am not very sure if the representation of build history is on the right
track.

Thank You



Re: GSoC 2019 [Collect build statistics]

2019-03-18 Thread Arjun Salyan via macports-dev
On Mon, 18 Mar 2019 at 10:49 PM, Mojca Miklavec  wrote:

> And in fact I'm unable to find any indices in your DB model.


Thanks, I shall add this. I am dealing with this huge data set for the
first time.

> Also, TextField might be suitable for description etc, but for short
> entries like port name, this probably offers suboptimal performance
> and CharField would make more sense. I did not time it though, and
> this is not the bottleneck in your code, but the indices are
> definitely critical for performance.


Yes, I have finalised the ports table now and hence, I shall change the
field types accordingly as to which one is the most suitable for the data
type in that column.
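To illustrate why the index matters for lookups by port name, here is a small sketch using Python's sqlite3 instead of the app's real database. The table name `port` and the index name `idx_port_name` are made up for the example; the sample rows reuse names and descriptions from the PortIndex output shown elsewhere in this thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Short values get a bounded column, long free text stays TEXT. SQLite does
# not enforce the VARCHAR length; it is here only to mirror a CharField.
conn.execute(
    "CREATE TABLE port ("
    "id INTEGER PRIMARY KEY, name VARCHAR(100), description TEXT)"
)
# The index on name is what lets lookups avoid a full table scan.
conn.execute("CREATE INDEX idx_port_name ON port (name)")
conn.executemany(
    "INSERT INTO port (name, description) VALUES (?, ?)",
    [
        ("boost", "Collection of portable C++ source libraries"),
        ("AppKiDo", "AppKiDo is an API documentation browser for Cocoa programmers"),
        ("phonon-backend-vlc-qt5", "VLC backend for Phonon"),
    ],
)


def lookup_plan(name):
    """Return SQLite's query plan text for a lookup by port name."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM port WHERE name = ?", (name,)
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[-1] for row in rows)


print(lookup_plan("boost"))
```

In a Django model the corresponding change would be `db_index=True` on the name field, together with switching it to a CharField.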

Also, I have stopped the process of populating the database: both my
internet connection today and the free tier are making it really difficult.
I was able to load the entire database within seconds locally.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-18 Thread Arjun Salyan via macports-dev
Some improvements to the Demo App: https://frozen-falls-98471.herokuapp.com

   - All Ports and All Categories are now available (Although not all ports
   have populated yet, I am on AWS Free Tier and the process is really slow.
   At the time of drafting this email: around 500 have populated).
   - On the Port-Detail page, the categories are now clickable and lead to
   the list of ports under that category.

I was able to parse the entire PortIndex.json using a python script and
successfully converted it to Django fixtures which could then be populated
to the database. (I used the portindex.json outputted by current version of
portindex2json.tcl and fixed the issues with categories using same python
script)

Parse.py :
https://github.com/arjunsalyan/MacPorts-Demo-App/blob/master/MacPorts/parse.py
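The conversion step is roughly of the following shape. This is not the linked parse.py, only a sketch: the model label "ports.port" is an assumption about the app and model names, and how a list-valued field such as categories is stored depends on the model (it may need a relation instead of a plain field):

```python
import json


def to_fixtures(ports, model="ports.port"):
    """Convert portindex2json-style dicts into Django fixture entries.

    The model label is an assumption. Categories given as a
    space-separated string are split into a list.
    """
    fixtures = []
    for pk, port in enumerate(ports, start=1):
        fields = dict(port)
        categories = fields.get("categories", [])
        if isinstance(categories, str):
            fields["categories"] = categories.split()
        # Django fixtures are a list of {"model", "pk", "fields"} objects.
        fixtures.append({"model": model, "pk": pk, "fields": fields})
    return fixtures


sample = [{"name": "AppKiDo", "version": "0.997", "categories": "aqua devel"}]
print(json.dumps(to_fixtures(sample), indent=2))
```

The resulting JSON can be written to a file and loaded with Django's `loaddata` management command, provided the fields match the model.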



Re: GSoC 2019 [Collect build statistics]

2019-03-17 Thread Arjun Salyan via macports-dev
On Sun, Mar 17, 2019 at 1:31 AM Joshua Root  wrote:

> It would be a good idea to check if they have any changes on their end
> that we're missing, too.
>

They have made one change: "Make portindex2json.tcl always work with utf-8,
insensitive to local settings".

Should we incorporate this change also? How do I indicate that this is a
new version, as suggested by Craig?
I shall then proceed with the pull request.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-16 Thread Arjun Salyan via macports-dev
I have tried to make some changes in portindex2json.tcl so that the value
of categories is outputted as a list and not just a string.

Can someone please review if it seems fine:
https://github.com/arjunsalyan/Test/blob/master/portindex2json.tcl

Sample Output (new):

{

"variants" : "universal",

"portdir"  : "aqua\/AppKiDo",

"description"  : "AppKiDo is an API documentation browser for Cocoa
programmers",

"homepage" : "http:\/\/appkido.com\/",

"epoch": "0",

"platforms": "darwin",

"name" : "AppKiDo",

"license"  : "MIT",

"maintainers"  : "nomaintainer",

"long_description" : "AppKiDo is a free reference tool for Cocoa
Objective-C programmers. It parses the header files and HTML documentation
files provided by Developer Tools and presents the results in a powerful
interface.",

"version"  : "0.997",

"categories"   : [aqua,devel],

"revision" : "0"

},



Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-16 Thread Arjun Salyan via macports-dev
On Sat, 16 Mar 2019 at 7:51 AM, Mojca Miklavec  wrote:

> JFYI:
>
> It might theoretically be a valid situation to have two entries with the
> same name (software gets deleted, then two years later an unrelated
> software with the same name gets added; but only one entry would have the
> status "active" and the others not). A lot more common case would be that a
> port gets renamed, even if just by changing the capitalisation. Or the
> version or -devel suffix gets attached to the end, then removed again ...
>

We will always have the option to identify ports with primary keys in the
database. But since only one entry will have active status at any given
time, we can use two filters: first the port name obtained from the
URL, and then whether it is active or not.
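As a sketch of the two-filter lookup, with a list of dicts standing in for database rows (the 'name' and 'active' keys and the function name are assumptions, not the real model):

```python
def find_active_port(ports, name):
    """Return the single active entry with this name, or None.

    `ports` is a list of dicts with 'name' and 'active' keys, standing in
    for database rows; both key names are assumptions.
    """
    matches = [p for p in ports if p["name"] == name and p["active"]]
    if len(matches) > 1:
        # Should not happen if only one entry per name is ever active.
        raise ValueError(f"more than one active port named {name!r}")
    return matches[0] if matches else None
```

In Django this would presumably become a single `filter(name=..., active=True)` call on the port model, with the primary key still available when an old, inactive entry has to be addressed.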

The parse-builbot-logs.py script worked perfectly fine. Thank you so much.
From last year’s email archive I understood that we will have
to store these logs in the database and not fetch them from buildbot
every time, and also devise a method to fetch logs at regular intervals and
load them into the database.


Now that I have an idea (just a starting point) of these three things:
1. Using portindex: getting info of all the ports.
2. Using mpstats: submitting stats to the new Django app.
3. From buildbot: getting history of builds.


I am framing the tasks I need to take up in the upcoming week, before I
actually frame my first proposal. (Please correct me if I am not heading in
the right direction)

- Improving existing functionalities of the demo app and making it more and
more dynamic.
- Trying to modify the tools or methods used to obtain the data
(portindex2json etc.)
- Working on more functionalities and getting a starting point for them just
like the three mentioned above (like build reproducibility)

Thank You



Re: GSoC 2019 [Collect build statistics]

2019-03-13 Thread Arjun Salyan via macports-dev
As suggested, I have made an attempt at a basic demo app:
https://frozen-falls-98471.herokuapp.com

Please review it and let me know if this seems fine. After applying any
further inputs, I shall proceed with the documentation for setting it up.

It is not completely static. The port information is fetched from database
and is not hard-coded. What I did was to clone macports-ports, and used
portindex2json.tcl to populate the database. However, the format of the
json outputted by portindex2json could not be directly loaded into the
database. I had to manually make it in the format of fixtures accepted by
Django (and that is why only few ports are available), which means that
portindex2json would also require some modification.

The build history is hard-coded. Also I need suggestions here: what info is
most important to display on the port info page? For now, I could only
figure out how to show build history for different OS versions. Am I doing
it correctly?

The installation statistics, which currently run independently would also
be integrated with port-info in the new app.

Also, I have used Bootstrap for this demo app. Is that fine, or do we need
to go with something more powerful like Angular or React?

Thank You.


Re: GSoC 2019 [Collect build statistics]

2019-03-08 Thread Arjun Salyan via macports-dev
Thank you Mojca.

The provided references have cleared a lot of my doubts and I am really
interested to do this project: 'Collect build statistics'

Here is what I have understood so far:
*1.* dynamic page for each port displaying basic information (description,
version etc.), installation stats, build history etc.
*2.* From suggested ideas, I found the following to be added to each page:

   - whether the current version of port built on each particular OS/arch
   - when was the last time the port built on that OS/arch
   - links to all builds
   - list of installed files, differences in installed files on different
   OS versions
   - perhaps include some basic functionality to allow checking for build
   reproducibility
   - what is the latest version of port (in case it's already outdated)

I do not understand : "perhaps include some basic functionality to allow
checking for build reproducibility".

*3.* I would further want to take up the task of migrating a redesigned
website (or some components) into the same Django* app.

Please help me with that 'build reproducibility' point and also how do I
plan from here (I know Django, but I am still learning about MacPorts)?

*I haven't finalised Django yet, but it seems to be the most suitable one.

Thank You.



On Thu, Mar 7, 2019 at 2:22 AM Mojca Miklavec  wrote:

> Dear Arjun,
>
> Welcome to MacPorts!
>
> On Wed, 6 Mar 2019 at 21:05, Arjun Salyan via macports-dev wrote:
> >
> > Hello,
> > I am Arjun Salyan, a GSoC'19 aspirant. I have familiarised myself with
> MacPorts, but still there is a way to go on with the documentation, and
> learning tcl.
> >
> > I am quite experienced in web and app development- with multiple
> Python(Django) and PHP projects having worked for a company. I am looking
> to club my previous skillset with all what I am learning right now about
> MacPorts.
> >
> > In suggested ideas and from mails, I do see 'a Django App', 'Django App
> to collect statistics'. I can work on these ideas further-
>
> If you are interested in this idea, you should check
> https://github.com/macports/macports-webapp/tree/master/docs
> as well as look at the archives of this mailing list from the last
> summer (there might have been about a hundred emails about this
> particular topic).
>
> A student abandoned that GSOC project last year, but there is a lot of
> useful information and explanations available.
>
> > and in addition I can also take up tasks like improving/ redesigning the
> website, documentation and 'Available Ports' page can be made much more
> interactive and easy to find ports.
>
> Available ports would by design be part of the django app, and design
> is well related. It's of course more important to have a working app
> at the end of the summer with ugly design than perfect design and
> defunct app :), but doing a good design for the app could be the first
> step towards a better website. I leave it up to you to come up with
> suggestions about design if you would like to contribute in that area
> (I cannot be of any help there :).
>
> > Will planning a proposal on such ideas be good, or I shall work more
> towards the macports_base?
>
> You should work on whatever you are more passionate about. It makes a
> lot less sense to be forced to work on something you are not so
> interested in, since that would be less fun and lead to worse results.
> This applies to both selecting the most suitable org, as well as to
> selecting a suitable task within that org.
>
> (Our only limitation is that we don't want two students to work on the
> same project. It is still too early to know what others might be
> interested in, but you should be able to figure that out also by
> following the mailing list conversations.)
>
> > Any suggestions on how independent apps can be used to contribute to
> macports would be very helpful.
>
> What exactly is your question?
>
> In case of Django app, I would be co-mentoring, I just wanted to
> mention that I'll be travelling until the 24th, so I won't have access
> to internet connection (or rather: a comfortable keyboard to type on)
> on regular basis, but I believe that others should be able to answer
> your requests in the meantime.
>
> Mojca
>


GSoC 2019

2019-03-06 Thread Arjun Salyan via macports-dev
Hello,
I am Arjun Salyan, a GSoC'19 aspirant. I have familiarised myself with
MacPorts, but still there is a way to go on with the documentation, and
learning tcl.

I am quite experienced in web and app development- with multiple
Python(Django) and PHP projects having worked for a company. I am looking
to club my previous skillset with all what I am learning right now about
MacPorts.

In suggested ideas and from mails, I do see 'a Django App', 'Django App to
collect statistics'. I can work on these ideas further- and in addition
I can also take up tasks like improving/ redesigning the website,
documentation and 'Available Ports' page can be made much more interactive
and easy to find ports.

*Will planning a proposal on such ideas be good, or I shall work more
towards the macports_base?*
Any suggestions on how independent apps can be used to contribute to
macports would be very helpful.

Thank You.