Re: GSoC 2019 [Collect build statistics]

2019-05-17 Thread Umesh Singla
Nice. Makes testing a lot easier now. It would be great if you could push
the Dockerfile as well.

Also, I don't much like conversing across these multiple broken threads.
I'll try to set up Matrix (I have absolutely no idea about it yet) today
so that we can compare Gitter and Matrix and move ahead.

Umesh

On Sat, May 18, 2019 at 2:19 AM Mojca Miklavec  wrote:

> On Fri, 17 May 2019 at 20:00, Arjun Salyan wrote:
> >
> > Hi Mojca and Umesh,
> >
> > I have now spent some time with docker. I have also dockerised the web
> > app.
>
> Wonderful news!
>
> Let us know where we could fetch it from, so that we can test it and
> one of the infra team can put it on the server.
>
> I need to ship something by Monday. I'll be somewhat more relaxed
> after that. (PS: Clemens, do you happen to go to the automotive fair
> in Stuttgart next week?)
>
> Mojca
>


Re: GSoC 2019 [Collect build statistics]

2019-05-14 Thread Mojca Miklavec
Dear Arjun,

I'm CC-ing the list as that's potentially interesting to others as well.

On Tue, 14 May 2019 at 19:45, Arjun Salyan wrote:

> Summary of today's discussion:
>
> 1. *Repository: *The existing repository for the web app would be moved
> from the staging area to the official MacPorts organisation's
> repositories. And then we will set up issues and milestones (after making
> changes to the timeline, if needed). The previous macports-webapp
> repository (from last year) can be kept separate from the new one by making
> some name changes.
>
> 2. *Docker Containers: *The infra team has suggested that for initial
> deployment of the app we make a docker container so that our app would not
> pose any security issues to rest of the server.
>
> 3. *mpstats: *We can add a new port (mpstats 2.0 or some other name) to
> submit statistics to our app, so that when we start working on the stats
> part, we will have some data to test the setup. The newer version would drop
> submission of "inactive ports" and "gcc".
>
> 4. *Buildbot*: I will be learning buildbot more thoroughly.
>

Apart from
https://github.com/macports/macports-infrastructure/tree/master/buildbot
see also
https://github.com/rajdeepbharati/macports-buildbot/

Students with similar projects are certainly encouraged to follow (and
provide feedback on) the other similar projects.


> 5. *Database Design: *It would be better to store all weekly submissions
> instead of just the latest one in the base table (the one which is used to
> populate all other tables).
>
> 6. *Variants: *A low priority task. We could pick up the default variants
> for any port from our database of port info.
>
> I hope I haven't missed anything important.
>
> I have set up the app to receive submissions from mpstats at:
> https://frozen-falls-98471.herokuapp.com/statistics/submit/ . I just
> replaced the url in mpstats and it is working correctly.
>
> Thank You
>

I forgot to mention another super important point. You should use this time
to get familiar with writing:

   - *Unit tests*: You already created an empty file, but it's high
   time to populate it with some real tests, like preparing a very simple set
   of portfiles, generating JSON, entering it into the database and checking
   whether all the tables are exactly as we want them to be (in particular
   with respect to authors, categories, ...), and then some checks for
   subsequent port updates, ... Those tests could also run on Travis (or some
   other service).

Ad:

   - *Database design*: prepare the design (for statistics submissions in
   particular). My suggestion was for the port installation statistics to
   store something like (submission_id, user_uuid, timestamp, os_version, ...)
   in one table, and something like (submission_id, port, version,
   is_requested, ) to another table, make a view
   joining the two, and then be able to answer queries like "how many distinct
   users had port 'python27' installed between 2019-05-01 and 2019-05-31" with
   some advanced sql statements (I hope they are still supported by Django),
   for arbitrary dates etc. Results could of course later be cached for faster
   rendering. This might need some playing around.
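
A minimal sketch of this two-table design, written against Python's
sqlite3 module rather than Django models so that it runs on its own. The
table, view and function names and the sample rows are made up for
illustration; only the columns named above come from the suggestion.

```python
import sqlite3

# Hypothetical schema: one row per submission, one row per installed port.
SCHEMA = """
CREATE TABLE submission (
    submission_id INTEGER PRIMARY KEY,
    user_uuid     TEXT NOT NULL,
    timestamp     TEXT NOT NULL,  -- ISO 8601 date, so string comparison works
    os_version    TEXT
);
CREATE TABLE port_installation (
    submission_id INTEGER NOT NULL REFERENCES submission (submission_id),
    port          TEXT NOT NULL,
    version       TEXT,
    is_requested  INTEGER NOT NULL DEFAULT 0
);
CREATE VIEW installed_port AS
    SELECT s.submission_id, s.user_uuid, s.timestamp, s.os_version,
           p.port, p.version, p.is_requested
    FROM submission AS s
    JOIN port_installation AS p ON p.submission_id = s.submission_id;
"""


def distinct_users(conn, port, start, end):
    """Count distinct users who reported `port` between two ISO dates."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT user_uuid) FROM installed_port "
        "WHERE port = ? AND timestamp BETWEEN ? AND ?",
        (port, start, end),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO submission VALUES (?, ?, ?, ?)",
    [
        (1, "user-a", "2019-05-03", "10.14"),
        (2, "user-a", "2019-05-10", "10.14"),  # same user, second weekly submission
        (3, "user-b", "2019-05-20", "10.13"),
        (4, "user-c", "2019-06-02", "10.14"),  # outside the queried range
    ],
)
conn.executemany(
    "INSERT INTO port_installation VALUES (?, ?, ?, ?)",
    [
        (1, "python27", "2.7.16", 1),
        (2, "python27", "2.7.16", 1),
        (3, "python27", "2.7.16", 0),
        (3, "git", "2.21.0", 1),
        (4, "python27", "2.7.16", 1),
    ],
)

print(distinct_users(conn, "python27", "2019-05-01", "2019-05-31"))  # prints 2
```

Keeping every weekly submission means the same user shows up several
times in the range, which is why the count is over distinct user_uuid
values rather than over rows.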

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-04-16 Thread Mojca Miklavec
On Fri, 12 Apr 2019 at 19:50, Arjun Salyan wrote:
> On Fri, Apr 12, 2019 at 3:03 AM Mojca Miklavec wrote:
>
>> One thing that "urgently" needs to be done is to obfuscate the email
>>
>> (if written at all). I'm not even sure whether we actually want the
>> email being displayed there. We need it to send automated emails from
>> the buildbot in case some failure happens, or to occasionally contact
>> the maintainer directly. But exposing that info on the website might
>> be too much.
>
> I had fixed it as soon as I saw the email, but couldn't reply that time.

Thank you.

>> I still need to check the code: what's your current strategy for
>> showing links to the tickets?
>> At some point we could differentiate different types of tickets (for
>> example mark bugs separately).
>
> Sorry to disappoint you here, but right now I am using web scraping to do
> this. I am looking for a plugin that could add a public API feature to track
> tickets, but maybe it doesn't exist right now. We can make our own for sure.

OK, thank you.

Maybe something like
https://trac-hacks.org/wiki/XmlRpcPlugin
could help. But I wouldn't concentrate on this right now. The existing
implementation does the job. Not in an efficient way, maybe not super
reliably, but there are other things with higher priority to do first.
This could be addressed after the other important stuff has been
implemented.

> I have added it to maintainer-detail and port-detail. But it still needs some 
> work on its position on the page.

Awesome. The search inside maintainer and category is really helpful.

At some point (but not now) we could add search through descriptions
and long descriptions as well, and to some extent group subports (some
experiments would be needed to figure out how to present that data in
a nice and useful way).

Regarding maintainers I left some notes in the PR.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-04-15 Thread Arjun Salyan via macports-dev
Hi Mojca,

On Mon, Apr 15, 2019 at 9:46 PM Mojca Miklavec  wrote:

> Given the current state of the app with sufficient complexity, I
> believe that it would be wise to introduce some unit tests to be able
> to extensively test what happens with data you import, and to prevent
> / detect any breakages in the future.
>

Thank you. Since I am currently working on parsing maintainers, I began
testing with maintainers only. It helped me make significant improvements
to the code which extracts the maintainers (added to the pull request:
https://github.com/macports-gsoc/macports-gsoc-2019-webapp/pull/1 ).
[update: this file has changed further since I updated the pull request;
the logic remains the same, just the JSON object structure has changed]

I ran the tests and got desired results. I will show the final code and
results in around 24 hours after I get done with my viva voce and extra
classes, but below I am discussing the approach. Sorry, if this is not the
right way or the presentation is not fine.

I created five ports:

   1. portA maintainers {@github gmail.com:test1}
   2. portB maintainers {@github gmail.com:test2} (same GitHub,
   different email)
   3. portC maintainers {@newgithub gmail.com:test2} (same email,
   different GitHub)
   4. portD maintainers {gmail.com:test2} (email repeated without
   GitHub)
   5. portE maintainers {@github} (GitHub repeated without email)

I received 3 unique Github and Email pairs (according to the Logic[1] ) and
I am considering each as a different maintainer.
[
  {
    "github": "github",
    "name": "test1",
    "domain": "gmail.com"
  },
  {
    "github": "github",
    "name": "test2",
    "domain": "gmail.com"
  },
  {
    "name": "test2",
    "domain": "gmail.com",
    "github": "newgithub"
  }
]

Now to each maintainer I added all those ports which had GitHub or Email or
both same as that of the unique maintainer.

[
  {
    "model": "ports.Maintainer",
    "pk": 0,
    "fields": {
      "github": "github",
      "name": "test1",
      "domain": "gmail.com",
      "ports": [
        "portA",
        "portB",
        "portD"
      ]
    }
  },
  {
    "model": "ports.Maintainer",
    "pk": 1,
    "fields": {
      "github": "github",
      "name": "test2",
      "domain": "gmail.com",
      "ports": [
        "portA",
        "portB",
        "portD",
        "portC",
        "portE"
      ]
    }
  },
  {
    "model": "ports.Maintainer",
    "pk": 2,
    "fields": {
      "name": "test2",
      "domain": "gmail.com",
      "github": "newgithub",
      "ports": [
        "portE",
        "portB",
        "portC"
      ]
    }
  }
]


For querying we can now use the email or GitHub handle and show all the
ports for all the maintainers returned.

This should not break because of any inconsistency in the maintainer
details. But there is one disadvantage: on the port-detail page, we will
now show x maintainers if the same maintainer provided x different pairs
of GitHub and email. However, this disadvantage might prove helpful in
getting rid of the inconsistencies.

Thank You

[1]
Currently I am using the following Logic for adding maintainers (comparing
with already parsed maintainers) :

   - If neither the email nor GitHub is repeated: CREATE NEW
   - If the email and GitHub both are repeated: SKIP
   - If the email is repeated and not the GitHub handle (provided) : CREATE
   NEW with inconsistency flag
   - If the GitHub handle is repeated and not the email address (provided)
   : CREATE NEW with inconsistency flag
   - If the Github handle is repeated and email is not provided: SKIP
   - If the email address is repeated and GitHub is not provided: SKIP
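
The six rules above can be sketched as a single function. This is only an
illustration, not the code from the pull request: the Maintainer tuple, the
function name and the returned labels are hypothetical, and emails are
written as plain addresses (with example.org standing in for the real
domain) instead of the domain:name form used in the Portfiles.

```python
from typing import List, NamedTuple, Optional


class Maintainer(NamedTuple):
    github: Optional[str]
    email: Optional[str]
    inconsistent: bool = False


def add_maintainer(known: List[Maintainer],
                   github: Optional[str],
                   email: Optional[str]) -> str:
    """Apply the rules to one parsed entry and report what was done."""
    # "Repeated" is taken literally: the value was seen on any earlier entry.
    github_seen = github is not None and any(m.github == github for m in known)
    email_seen = email is not None and any(m.email == email for m in known)

    if not github_seen and not email_seen:
        known.append(Maintainer(github, email))
        return "CREATE"
    if github_seen and email_seen:
        return "SKIP"
    # Exactly one of the two is repeated from here on.
    other_provided = (github is not None) if email_seen else (email is not None)
    if not other_provided:
        return "SKIP"
    known.append(Maintainer(github, email, inconsistent=True))
    return "CREATE_FLAGGED"


# The five test ports described earlier in this message.
entries = [
    ("github", "test1@example.org"),     # portA
    ("github", "test2@example.org"),     # portB: same GitHub, different email
    ("newgithub", "test2@example.org"),  # portC: same email, different GitHub
    (None, "test2@example.org"),         # portD: email repeated without GitHub
    ("github", None),                    # portE: GitHub repeated without email
]
known: List[Maintainer] = []
results = [add_maintainer(known, g, e) for g, e in entries]
print(results)
print(len(known))  # prints 3
```

Running it over the five test ports gives the same three unique pairs as
listed above, with the second and third carrying the inconsistency flag.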


Re: GSoC 2019 [Collect build statistics]

2019-04-15 Thread Mojca Miklavec
Dear Arjun,

I'll answer individual questions & provide further suggestions separately.

Given the current state of the app with sufficient complexity, I
believe that it would be wise to introduce some unit tests to be able
to extensively test what happens with data you import, and to prevent
/ detect any breakages in the future.

An example of a unit test: prepare a small number (for example four)
super simple ports (just enough to make them valid, they could be
simply called portA, portB, portC, portD), add various made-up
maintainers; you may use one github handle with different or missing
email addresses in different ports, or one email address with various
or missing github handles ... Then run portindex, portindex2json,
store the json file, let the unit test import the data into an empty
database, and verify that the database contains exactly the entries
you wanted there. (OK, for the first unit test just one port with
simple valid data. Then something more complicated.)
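
A rough sketch of what the first such test could look like, using plain
unittest and an in-memory SQLite database so that it runs without the app.
The JSON layout, the two tables and import_ports() are invented for the
example; the real test would load the file produced by portindex2json and
call the app's own import code.

```python
import json
import sqlite3
import unittest

# Made-up stand-in for the stored JSON file: one port, one maintainer.
PORTS_JSON = json.dumps([
    {
        "name": "portA",
        "maintainers": [{"github": "github", "email": "test1@example.org"}],
    }
])


def import_ports(conn, text):
    """Hypothetical importer: load ports and their maintainers from JSON."""
    for port in json.loads(text):
        conn.execute("INSERT INTO port (name) VALUES (?)", (port["name"],))
        for m in port["maintainers"]:
            conn.execute(
                "INSERT INTO maintainer (port, github, email) VALUES (?, ?, ?)",
                (port["name"], m.get("github"), m.get("email")),
            )


class ImportTest(unittest.TestCase):
    def setUp(self):
        # Every test starts from an empty database.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(
            "CREATE TABLE port (name TEXT PRIMARY KEY);"
            "CREATE TABLE maintainer (port TEXT, github TEXT, email TEXT);"
        )

    def tearDown(self):
        self.conn.close()

    def test_single_port(self):
        import_ports(self.conn, PORTS_JSON)
        ports = self.conn.execute("SELECT name FROM port").fetchall()
        maintainers = self.conn.execute(
            "SELECT port, github, email FROM maintainer"
        ).fetchall()
        # The database must contain exactly the entries we put in.
        self.assertEqual(ports, [("portA",)])
        self.assertEqual(maintainers, [("portA", "github", "test1@example.org")])
```

Saved as a test module, this can be run with "python -m unittest". The more
complicated cases (shared handles, missing emails) would be further test
methods with their own JSON.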

You could also try to check if you can manage to "dockerize" the
solution (lower priority).

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-04-12 Thread Arjun Salyan via macports-dev
Hi Mojca,

On Fri, Apr 12, 2019 at 3:03 AM Mojca Miklavec  wrote:

> Awesome!


Thank You.

> One thing that "urgently" needs to be done is to obfuscate the email
> (if written at all). I'm not even sure whether we actually want the
> email being displayed there. We need it to send automated emails from
> the buildbot in case some failure happens, or to occasionally contact
> the maintainer directly. But exposing that info on the website might
> be too much.
>

I had fixed it as soon as I saw the email, but couldn't reply that time.

> As far as accessing the information is concerned: at the moment you
> use email as unique identifier. I would probably use github handle by
> default / as the main entry point. I think that '@' is an allowed
> character in URL. If so, we could use
> /maintainer/@ryandesign
> to access the same page


There was a big error in my parsing script due to which the GitHub handles
of a lot of maintainers were not being parsed. And hence, I went with
email. But still, some 300 maintainers have not provided GitHub handles.


> Alternatively we could allow macports handles (for those with
> @macports.org email addresses), so
> /maintainer/ryandesign
> could work just as well.
>

Yes, this would be good.

> For maintainers with a super long list of ports, or, for
> non-maintained ports in particular, we might eventually need a way to
> shorten that list (have multiple pages) or provide a similar search
> functionality as for global ports, except that here it would be
> limited to the ports by that particular maintainer.

> I would put that list of ports in a table and add version & short
> description.
>

Yes, I will add pagination and show more details for both list of ports on
maintainer's page and on category page. The filter would also be amazing. I
will do this.


> I still need to check the code: what's your current strategy for
> showing links to the tickets?
> At some point we could differentiate different types of tickets (for
> example mark bugs separately).
>

Sorry to disappoint you here, but right now I am using web scraping to do
this. I am looking for a plugin that could add a public API feature to track
tickets, but maybe it doesn't exist right now. We can make our own for sure.


> One minor suggestion. I really like the "search for port" field. Could
> this be added to every page? There is "ports" in the top right corner,
> but that one is a lot less useful in itself (not saying that it should
> go, just suggesting search on each page).
>

I have added it to maintainer-detail and port-detail. But it still needs
some work on its position on the page.


> Here's what I would do, but feel free to propose an alternative and/or
> discuss further. (Actually, I have two slightly different ideas for
> implementation in my mind, I'll describe one of them first.)
>
> For the maintainers you could declare a unique key over the
> combination of github handle + email. Every maintainer of every port
> has a uniquely specified pair (github + email) when you import it to
> the database (neither github handles nor emails would be unique on their
> own). Note that you still have an index specified on both columns
> (separately on each, but the uniqueness is only defined on the combination
> of the two).
>
> When you read the port, you check the pair (@github, email). If the
> pair already exists in the database, you enter the (port, author) pair
> into the database of maintainerships. If it doesn't exist, you create
> it first, and then assign the maintainership to the port. (Note that
> whenever you are updating the port, you also need to check if you need
> to remove some maintainers from that port.)
>
> When you display a particular maintainer, say @somerandomgithubhandle,
> you run a query and if you hit more than one entry with that github
> handle in the database:
>

This would solve the problem of multiple emails with same GitHub handle.
But there are cases when the maintainer has provided 'GitHub and email'
both for one port and just 'email' for other port. Sorry If I am missing
something.

Example:
for libsmf {ryandesign openmaintainer}
for penal-soft {{ryandesign @ryandesign} openmaintainer}

And while many haven't provided GitHub handles, some haven't provided
emails.

> You could also have a separate page which runs different queries and
> looks for all maintainers with inconsistencies (that's for later). It
> would generally be helpful to have a collection of such pages for
> different things, like: all broken builds on buildbot, all outdated
> ports, ...
>

This for sure would be very good to add into the app once everything is
ready.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-04-11 Thread Mojca Miklavec
Dear Arjun,

Thanks a lot for the update.

On Thu, 11 Apr 2019 at 16:15, Arjun Salyan wrote:
>
> I have added maintainer views and tables to the demo app.
>
> List of maintainers is clickable on the port-detail page.
> A maintainer-detail view that displays info and list of maintained ports

Awesome!

I just noticed that shortly before you wrote the email as I tried to
find some info on the page :)

> Examples:
> maintainer-detail: 
> https://frozen-falls-98471.herokuapp.com/maintainer/ryandesign__macports.org/

Thank you.

One thing that "urgently" needs to be done is to obfuscate the email
(if written at all). I'm not even sure whether we actually want the
email being displayed there. We need it to send automated emails from
the buildbot in case some failure happens, or to occasionally contact
the maintainer directly. But exposing that info on the website might
be too much.

As far as accessing the information is concerned: at the moment you
use email as unique identifier. I would probably use github handle by
default / as the main entry point. I think that '@' is an allowed
character in URL. If so, we could use
/maintainer/@ryandesign
to access the same page.

Alternatively we could allow macports handles (for those with
@macports.org email addresses), so
/maintainer/ryandesign
could work just as well.

Email access could of course be left as it is, but I would not make it default.

I would show the github handle prefixed with the '@' character.

For maintainers with a super long list of ports, or, for
non-maintained ports in particular, we might eventually need a way to
shorten that list (have multiple pages) or provide a similar search
functionality as for global ports, except that here it would be
limited to the ports by that particular maintainer.

I would put that list of ports in a table and add version & short description.

Nice, in any case.

> port-detail: https://frozen-falls-98471.herokuapp.com/ports/faust/

Thanks a lot for adding the links to Trac tickets.

I still need to check the code: what's your current strategy for
showing links to the tickets?
At some point we could differentiate different types of tickets (for
example mark bugs separately).


One minor suggestion. I really like the "search for port" field. Could
this be added to every page? There is "ports" in the top right corner,
but that one is a lot less useful in itself (not saying that it should
go, just suggesting search on each page).

> But, while extracting 'maintainers' from the portindex, maintaining 
> uniqueness was very difficult.

This was totally expected :)

> There are a lot of inconsistencies
> - Same maintainer has provided GitHub details for one port and not for the 
> other.
> - Same maintainer has provided different email for different ports.
>
> I understand that it should be web-app's job to detect this and for now the 
> problem is mostly solved. But in future, one odd case and things can break. 
> What best can be done about this?

Here's what I would do, but feel free to propose an alternative and/or
discuss further. (Actually, I have two slightly different ideas for
implementation in my mind, I'll describe one of them first.)

For the maintainers you could declare a unique key over the
combination of github handle + email. Every maintainer of every port
has a uniquely specified pair (github + email) when you import it to
the database (neither github handles nor emails would be unique on their
own). Note that you still have an index specified on both columns
(separately on each, but the uniqueness is only defined on the combination
of the two).

When you read the port, you check the pair (@github, email). If the
pair already exists in the database, you enter the (port, author) pair
into the database of maintainerships. If it doesn't exist, you create
it first, and then assign the maintainership to the port. (Note that
whenever you are updating the port, you also need to check if you need
to remove some maintainers from that port.)

When you display a particular maintainer, say @somerandomgithubhandle,
you run a query and if you hit more than one entry with that github
handle in the database:
- simply display ports from all the entries with that github handle
(you can potentially sort them according to email, or lack thereof, so
that it's easier to figure out which ports need a fix)
- display a red warning saying that there are inconsistencies /
problems with that maintainer which need to be fixed
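
A minimal sketch of this first idea, again with sqlite3 instead of Django
so it is self-contained. All table, index and function names are
assumptions made for the example. A missing handle or email is stored as an
empty string here, because SQLite treats NULLs as distinct inside a UNIQUE
constraint and would otherwise allow duplicate pairs.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE maintainer (
        id     INTEGER PRIMARY KEY,
        github TEXT NOT NULL DEFAULT '',
        email  TEXT NOT NULL DEFAULT '',
        UNIQUE (github, email)            -- uniqueness only on the pair
    );
    CREATE INDEX maintainer_github ON maintainer (github);
    CREATE INDEX maintainer_email  ON maintainer (email);
    CREATE TABLE maintainership (
        port          TEXT NOT NULL,
        maintainer_id INTEGER NOT NULL REFERENCES maintainer (id),
        UNIQUE (port, maintainer_id)
    );
    """)
    return conn


def assign(conn, port, github="", email=""):
    """Create the (github, email) pair if needed, then link it to the port."""
    conn.execute(
        "INSERT OR IGNORE INTO maintainer (github, email) VALUES (?, ?)",
        (github, email),
    )
    (maintainer_id,) = conn.execute(
        "SELECT id FROM maintainer WHERE github = ? AND email = ?",
        (github, email),
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO maintainership (port, maintainer_id) VALUES (?, ?)",
        (port, maintainer_id),
    )


def ports_for_github(conn, github):
    """Return (ports sorted by email, warning flag) for one github handle."""
    rows = conn.execute(
        "SELECT m.email, ms.port FROM maintainer AS m "
        "JOIN maintainership AS ms ON ms.maintainer_id = m.id "
        "WHERE m.github = ? ORDER BY m.email, ms.port",
        (github,),
    ).fetchall()
    emails = {email for email, _ in rows}
    # More than one entry for the handle means the red warning is shown.
    return [port for _, port in rows], len(emails) > 1


conn = make_db()
assign(conn, "portA", "github", "a@example.org")
assign(conn, "portB", "github", "b@example.org")
assign(conn, "portE", "github")
ports, warn = ports_for_github(conn, "github")
print(ports, warn)
```

Removing maintainers that disappeared from a port during an update is not
covered by this sketch.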

You could also have a separate page which runs different queries and
looks for all maintainers with inconsistencies (that's for later). It
would generally be helpful to have a collection of such pages for
different things, like: all broken builds on buildbot, all outdated
ports, ...

The other idea would be to keep just one entry per "maintainer", but
then add an additional column with a flag saying that there's
something wrong with that entry. When you would do daily or weekly
updates of the full database, 

Re: GSoC 2019 [Collect build statistics]

2019-04-11 Thread Arjun Salyan via macports-dev
Hi,

I have added maintainer views and tables to the demo app.

   - List of maintainers is clickable on the port-detail page.
   - A maintainer-detail view that displays info and list of maintained ports

Examples:
maintainer-detail:
https://frozen-falls-98471.herokuapp.com/maintainer/ryandesign__macports.org/

port-detail: https://frozen-falls-98471.herokuapp.com/ports/faust/


But while extracting 'maintainers' from the portindex, maintaining
uniqueness was very difficult. There are a lot of inconsistencies:
- The same maintainer has provided GitHub details for one port and not for
the other.
- The same maintainer has provided different emails for different ports.

I understand that it should be the web app's job to detect this, and for now
the problem is mostly solved. But in future, one odd case and things can
break. What is the best that can be done about this?

On Tue, Apr 9, 2019 at 3:21 AM Mojca Miklavec  wrote:

> A general suggestion from me would be to study in depth some good and
> exhaustive book on relational database design to fill in the holes.
> (There might also be some online courses.)
>

Thanks Mojca. I did some research for a detailed book and I found "An
Introduction to Database Systems" by C.J. Date (I also found it in the
library). I will let you know how it goes.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-04-08 Thread Mojca Miklavec
Dear Arjun,

On Mon, 8 Apr 2019 at 20:31, Arjun Salyan wrote:
>
> Dear MacPorts Community,
>
> I have submitted my Final Proposal. I do understand that during these last 
> hours it might not be possible to give feedback on the proposals. But if I am 
> lucky enough to get more of them, I will try to get the job done (around 23 
> hours still remaining).
>
> Google Doc: 
> https://docs.google.com/document/d/198Ivygxb2NJQz_sqzDrbDPVEYZ5Ye5Yw0LV6Bt2QmG4/edit?usp=sharing

No special feedback (we still need to review the pull requests though).

A general suggestion from me would be to study in depth some good and
exhaustive book on relational database design to fill in the holes.
(There might also be some online courses.)

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-04-08 Thread Arjun Salyan via macports-dev
Dear MacPorts Community,

I have submitted my Final Proposal. I do understand that during these last
hours it might not be possible to give feedback on the proposals. But if I
am lucky enough to get more of them, I will try to get the job done (around
23 hours still remaining).

Google Doc:
https://docs.google.com/document/d/198Ivygxb2NJQz_sqzDrbDPVEYZ5Ye5Yw0LV6Bt2QmG4/edit?usp=sharing

Thanks for being so helpful!


Re: GSoC 2019 [Collect build statistics]

2019-04-07 Thread Arjun Salyan via macports-dev
Hi,

On Fri, Apr 5, 2019 at 8:58 AM Umesh Singla wrote:

> It’s always good to show your work and get feedback. It’s difficult to
> comment on the quality otherwise. Please do not forget to make a PR.

I have submitted the PR for adding an option to portindex (which would
generate a separate file with the changed ports):

https://github.com/macports/macports-base/pull/121

I am still working on handling deletions.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-04-05 Thread Arjun Salyan via macports-dev
On Fri, Apr 5, 2019 at 8:58 AM Umesh Singla wrote:

>
> It’s okay to share all the project related updates on the list. In fact,
> it’s preferred that way.
>

Thanks, I was just being hesitant to write that on the list.


> Does this include additions and/or deletions?
>

Portindex uses the modification time to detect changes, so it does not seem
that it would be able to detect deletions, as Mojca said. But I just realised
that we can make a further change: if a port is listed in the old
portindex but its port directory no longer exists, then we can mark the port
as deleted.
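
A small sketch of that check in Python (the actual change would live in
portindex itself, which is Tcl). The old index is assumed to be available
as a mapping from port name to its directory relative to the ports tree;
that shape and the function name are made up for the example.

```python
from pathlib import Path
from typing import Dict, List


def find_deleted_ports(old_index: Dict[str, str], ports_root: str) -> List[str]:
    """Return names of ports listed in the old index whose directory is gone.

    old_index maps a port name to its directory relative to the ports tree,
    for example {"git": "devel/git"}.
    """
    root = Path(ports_root)
    return sorted(
        name
        for name, portdir in old_index.items()
        if not (root / portdir).is_dir()
    )
```

Subports share the directory of their parent port, so a subport that is
dropped while the Portfile stays would not be caught by this check.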


> It’s always good to show your work and get feedback. It’s difficult to
> comment on the quality otherwise. Please do not forget to make a PR.
>

Thank you, I will do this. I was just needlessly worried that this command
might not be very useful in general, but then it is important to get code
reviewed.

Thank you

>


Re: GSoC 2019 [Collect build statistics]

2019-04-02 Thread Arjun Salyan via macports-dev
Hi Mojca,

On Tue, Apr 2, 2019 at 3:14 AM Mojca Miklavec  wrote:

> The drawbacks may include:
> - some ports will be skipped on the builder, for various reasons (port
> is known not to build on a particular builder, it may not be
> distributable, ...)
> - the buildbot master may be down or experience problems, so data
> might go missing
>

Thanks. I will consider these factors when improving upon this.


> A strange observation from your source code: you synced portindex and
> ran the conversion, but then loaded the data from another json file?
> Am I missing something?
>

No, the conversion "tclsh portindex2json.tcl portindex" is writing to the
file "syncedportindex.json". And I am reading from the same file. I am
really sorry that I did not submit a PR and it was difficult for you to
review the code.


> There are various ways to achieve the goal. Note that if you run
> portindex yourself, it will detect which files have been updated and
> only ever touch data of those ports. The portindex command could be
> modified to only output the file with changes (when you pass some
> options to it). This will still miss deletes, but it would be an
> efficient way with almost no dependencies.
>

Does this imply that we will keep a clone of macports-contrib locally and
run a modified 'portindex' command to generate a file with only the updated
ports?


> One way would be to generate portindex yourself and always remember
> what git shasum has been used, and store that shasum to the database.
> Next time when you update, check and store the latest shasum, then ask
> git which paths have changed between the two commits, and only update
> ports whose paths match the paths reported by git as changed.
>
> It could also help if you stored a "complete" git history to the
> database (shasum, which ports changed at that point, timestamp,
> parents). Not sure if that's really so helpful, just as an option.
>
> What might be an interesting approach would be to try to squeeze the
> git shasum to the PortIndex. This could also help when submitting
> statistics as it would be easier to determine how old the database is
> / when the user last synced. (It would not work for people with their
> own modifications of the tree.) If you had the shasum in portindex,
> you could still run git independently to check for the difference.
>

These methods are not very clear to me; I haven't dealt with shasums yet. I
will discuss them after doing some research.


> Just some random ideas.


Thank you so much.

> Regarding updates of builds: just ask the database about which build
> you synced last, and then sync any builds newer than that, up to the
> last one. You may need to check whether a build was complete when you
> last enquired.
>

Thanks, I am already using the same method.

Arjun


Re: GSoC 2019 [Collect build statistics]

2019-04-01 Thread Mojca Miklavec
Dear Arjun,

On Mon, 1 Apr 2019 at 18:38, Arjun Salyan wrote:
>
> Hi,
> I was working on keeping the PortIndex updated, and was able to achieve this:
>
> Sync Portindex from 
> 'rsync://rsync.macports.org/macports//trunk/dports/PortIndex_darwin_16_i386/PortIndex'
> Update or Add ports that were recently built on 10.14_x86_64 (using time 
> frame 'last 24 hours' for now).
> New ports, (SoapyAirspy, SoapyAirspyHF etc) were successfully added, and can 
> now be seen on the demo app.
>
> This is exactly the approach I wrote in the proposal and I wanted to show a 
> working demo, so that I can get feedback about how efficient this method is.
> The script I used: update_portindex.py . ( note: the code might not be very 
> well written, I was just looking to get things working. Also, I am only 
> updating ports built on '10.14_x86_64')

(It might have been easier to comment on pull request, but I noticed
that those commits did not make it to the pull request.)

This is an interesting way which should mostly work, just not always
and not super reliably.

The drawbacks may include:
- some ports will be skipped on the builder, for various reasons (port
is known not to build on a particular builder, it may not be
distributable, ...)
- the buildbot master may be down or experience problems, so data
might go missing

A strange observation from your source code: you synced portindex and
ran the conversion, but then loaded the data from another json file?
Am I missing something?

There are various ways to achieve the goal. Note that if you run
portindex yourself, it will detect which files have been updated and
only ever touch data of those ports. The portindex command could be
modified to only output the file with changes (when you pass some
options to it). This will still miss deletes, but it would be an
efficient way with almost no dependencies.

One way would be to generate portindex yourself and always remember
what git shasum has been used, and store that shasum to the database.
Next time when you update, check and store the latest shasum, then ask
git which paths have changed between the two commits, and only update
ports whose paths match the paths reported by git as changed.
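
A sketch of how this could look from Python, assuming a local git clone of
the ports tree laid out as category/portname/... and a stored commit hash
from the previous run. The function names are made up; "git diff
--name-only" between two commits is the only git feature relied on, and
skipping top-level directories that start with an underscore (such as
_resources) is an assumption about the tree.

```python
import subprocess
from typing import Iterable, List


def changed_paths(repo: str, old_sha: str, new_sha: str) -> List[str]:
    """Ask git which files differ between two commits of the ports tree."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", old_sha, new_sha],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.splitlines()


def changed_portdirs(paths: Iterable[str]) -> List[str]:
    """Reduce changed file paths to the port directories they belong to."""
    dirs = set()
    for path in paths:
        parts = path.split("/")
        # A port file lives at least two levels deep: category/port/file.
        if len(parts) >= 3 and not parts[0].startswith("_"):
            dirs.add("/".join(parts[:2]))
    return sorted(dirs)


print(changed_portdirs([
    "devel/git/Portfile",
    "devel/git/files/patch-Makefile.diff",
    "README.md",
]))  # prints ['devel/git']
```

Only the ports in the returned directories would then need to be
re-parsed, and the new hash stored for the next run.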

It could also help if you stored a "complete" git history to the
database (shasum, which ports changed at that point, timestamp,
parents). Not sure if that's really so helpful, just as an option.

What might be an interesting approach would be to try to squeeze the
git shasum to the PortIndex. This could also help when submitting
statistics as it would be easier to determine how old the database is
/ when the user last synced. (It would not work for people with their
own modifications of the tree.) If you had the shasum in portindex,
you could still run git independently to check for the difference.

You could keep full portindex in git after you sync it and check the
diffs. (Not sure if it would be super trivial to figure out which
ports changed, probably not.)

Just some random ideas.


Regarding updates of builds: just ask the database about which build
you synced last, and then sync any builds newer than that, up to the
last one. You may need to check whether a build was complete when you
last enquired.
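
As a tiny illustration, assuming builds on one builder are numbered
consecutively and that the database remembers whether the last synced build
had finished (both the function and its parameters are hypothetical):

```python
from typing import List


def builds_to_fetch(last_synced: int, last_was_complete: bool,
                    latest: int) -> List[int]:
    """List the build numbers to sync for one builder.

    A build that was still running when it was last fetched is fetched
    again, so that its final status replaces the partial one.
    """
    start = last_synced + 1 if last_was_complete else last_synced
    return list(range(start, latest + 1))


print(builds_to_fetch(40, True, 43))   # prints [41, 42, 43]
print(builds_to_fetch(40, False, 43))  # prints [40, 41, 42, 43]
```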

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-04-01 Thread Arjun Salyan via macports-dev
Hi,
I was working on keeping the PortIndex updated, and was able to achieve
this:

   - Sync Portindex from
   'rsync://rsync.macports.org/macports//trunk/dports/PortIndex_darwin_16_i386/PortIndex'
   - Update or add ports that were recently built on 10.14_x86_64 (using
   time frame 'last 24 hours' for now).
   - New ports (SoapyAirspy, SoapyAirspyHF, etc.) were successfully added
   and can now be seen on the demo app.

This is exactly the approach I wrote in the proposal and I wanted to show a
working demo, so that I can get feedback about how efficient this method is.
The script I used: update_portindex.py.
(Note: the code might not be very well written, I was just looking to get
things working. Also, I am only updating ports built on '10.14_x86_64'.)

I am also wary of running a for-loop over the entire portindex to update or
add the recently built ports. Could you please take a look and comment on
how good this approach is?

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-31 Thread Arjun Salyan via macports-dev
On Sun, Mar 31, 2019 at 2:05 PM Mojca Miklavec  wrote:

> There seem to be some issues with subports. (For example any p5.28-*
> under perl would give an error page.)
>
Yes, it was happening for the ports that contain '.' in their names. I have
solved it using regex now.
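
For illustration only, a pattern of this kind could look as follows. This is
not necessarily the pattern used in the demo app; the "ports/<name>/" URL
layout and the allowed character set are assumptions.

```python
import re

# Accept letters, digits, "_", ".", "+" and "-" in a port name, so that
# names such as p5.28-dbd-mysql match as a single path segment.
PORT_URL = re.compile(r"^ports/(?P<name>[\w.+-]+)/$")


def port_name_from_path(path):
    """Return the port name encoded in the path, or None if it does not match."""
    match = PORT_URL.match(path)
    return match.group("name") if match else None


print(port_name_from_path("ports/p5.28-dbd-mysql/"))
```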

> Just curious: what's the order of magnitude of the time it took?
>
When the database was on the same machine, it took about 30 seconds. With
my AWS Free Tier database it took around 40-45 minutes.


> I now created:
> https://github.com/macports-gsoc/macports-gsoc-2019-webapp
>
> You have full commit rights there, but the idea would be to first
> populate the repository with a basic README (else you probably cannot
> clone the repository at all), then clone it, and finally create a pull
> request with the relevant changes, and not commit directly before the
> code gets reviewed.
>

Sorry that I messed it up. I have created the pull request now.

> Thank you, but I don't see it in our dashboard [yet?].
>
Oh, Sorry! I thought the first step was enough. I will quickly finish the
remaining.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-31 Thread Mojca Miklavec
On Sat, 30 Mar 2019 at 21:59, Arjun Salyan
 wrote:
>
> Thanks Mojca, I will update the Demo App with the suggested changes very 
> shortly.
>
> On Sat, Mar 30, 2019 at 8:30 PM Mojca Miklavec  wrote:
>>
>> Anyway, I'm just curious: what's the current situation with
>> django/database hosting, network and other limitations etc.?
>> Now that you have the code which can both import the full database, as
>> well as parse and show the builds, it's actually a pity that there's
>> only a small fraction of ports available, and no idea which ports
>> actually show some useful build info.
>
> All ports are now available.

Great.

There seem to be some issues with subports. (For example any p5.28-*
under perl would give an error page.)

> It took quite a time, but now all ports are there on the web app.

Just curious: what's the order of magnitude of the time it took?

> It is the build history now, how many logs would you suggest I fetch? I was 
> being careful in this as you told.

I would talk to Ryan about running the script locally at the location
of the build master for the first time. (It's not super important to
do it immediately, but since the code is there, it would be nice to
see it in action.)

>> What would be really cool though is to start some actual review
>> process for the Django code, as that's where there would be a lot more
>> work, and probably more substantial comments.
>>
>> Personally I don't have permissions to create a new repository under
>> macports organisation, but as Umesh suggested, we can create a
>> temporary org somewhere, create an empty repository, and then submit a
>> pull request to that one.
>
>  Shall I create the org? Or how do we proceed?

I now created:
https://github.com/macports-gsoc/macports-gsoc-2019-webapp

You have full commit rights there, but the idea would be to first
populate the repository with a basic README (else you probably cannot
clone the repository at all), then clone it, and finally create a pull
request with the relevant changes, and not commit directly before the
code gets reviewed.

>> And since you already have a draft proposal ready, it would also make
>> sense to submit it. (The final version gets submitted later.)
>
> I have submitted it, thanks.

Thank you, but I don't see it in our dashboard [yet?].

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-30 Thread Arjun Salyan via macports-dev
Thanks Mojca, I will update the Demo App with the suggested changes very
shortly.

On Sat, Mar 30, 2019 at 8:30 PM Mojca Miklavec  wrote:

> Anyway, I'm just curious: what's the current situation with
> django/database hosting, network and other limitations etc.?
> Now that you have the code which can both import the full database, as
> well as parse and show the builds, it's actually a pity that there's
> only a small fraction of ports available, and no idea which ports
> actually show some useful build info.
>
All ports are now available. It took quite a time, but now all ports are
there on the web app.
It is the build history now, how many logs would you suggest I fetch? I was
being careful in this as you told.

> What would be really cool though is to start some actual review
> process for the Django code, as that's where there would be a lot more
> work, and probably more substantial comments.
>

That will be exciting.


> Personally I don't have permissions to create a new repository under
> macports organisation, but as Umesh suggested, we can create a
> temporary org somewhere, create an empty repository, and then submit a
> pull request to that one.
>

 Shall I create the org? Or how do we proceed?

> And since you already have a draft proposal ready, it would also make
> sense to submit it. (The final version gets submitted later.)
>

I have submitted it, thanks.


Re: GSoC 2019 [Collect build statistics]

2019-03-30 Thread Mojca Miklavec
On Sat, 30 Mar 2019 at 12:54, Arjun Salyan wrote:
>
> Hi,
> I have installed an ajax based search box in the demo app:
> https://frozen-falls-98471.herokuapp.com

Cool, I like it, thanks!

The next "feature request" would probably be to allow search through
descriptions as well :)

Anyway, I'm just curious: what's the current situation with
django/database hosting, network and other limitations etc.?
Now that you have the code which can both import the full database, as
well as parse and show the builds, it's actually a pity that there's
only a small fraction of ports available, and no idea which ports
actually show some useful build info.

For the initial page it would also help if there was a number of ports
attached to each category name. Now one clicks on category like
https://frozen-falls-98471.herokuapp.com/ports/category/x11-font, just
to see zero ports listed. If there was a number of ports next to
category, it would be more fun to browse.

As for category page like
https://frozen-falls-98471.herokuapp.com/ports/category/amusements
it would probably be nice if the page also listed port version and
(short) description. Possibly more info later (like a checkbox if the
port is known to build), but for now the description would already
help.

> Also, just wanted to give a polite reminder about my open PR: 
> https://github.com/macports/macports-contrib/pull/3

I'm sorry. My Tcl is not something I would be bragging about, so I
need some more time to play with it. I provided some further feedback
now, but I would like to also make the maintainers' parsing a tiny bit
cleaner. Generally the code should be ready to be merged soon, it's
already producing the desired result that you need for parsing, any
further comments are just about "stupid nitpicking" to keep the code
as clean as possible, but nothing of any high priority.

What would be really cool though is to start some actual review
process for the Django code, as that's where there would be a lot more
work, and probably more substantial comments. Putting comments next to
commits in some random repository is a bit non-trivial: difficult to
find, difficult to get an overview, easy to forget what was fixed
and what not, difficult to see what was reviewed and what not etc.

Personally I don't have permissions to create a new repository under
macports organisation, but as Umesh suggested, we can create a
temporary org somewhere, create an empty repository, and then submit a
pull request to that one. (It would be nice to include some basic
instructions for anyone not familiar with Django at all, about how to
install the project, populate the database and run it.)

And since you already have a draft proposal ready, it would also make
sense to submit it. (The final version gets submitted later.)

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-30 Thread Arjun Salyan via macports-dev
Hi,
I have installed an ajax based search box in the demo app:
https://frozen-falls-98471.herokuapp.com

Also, just wanted to give a polite reminder about my open PR:
https://github.com/macports/macports-contrib/pull/3

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-28 Thread Arjun Salyan via macports-dev
On Thu, Mar 28, 2019 at 12:05 PM Mojca Miklavec  wrote:

> What if there's a server outage?
>

Then the best way is to use HttpStatusPush to deliver instant updates and,
so that no build is missed due to a server failure, to run our fetching
script once per day. The script can easily check whether any build number
present in the logs is absent from the database.
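
A minimal sketch of that daily check, assuming both sides can be reduced to
plain collections of build numbers (the function name is hypothetical):

```python
def missing_builds(numbers_in_logs, numbers_in_db):
    """Return, in ascending order, build numbers seen in the logs but not stored."""
    return sorted(set(numbers_in_logs) - set(numbers_in_db))


# Illustrative numbers: builds 102 and 104 were lost during an outage.
print(missing_builds([101, 102, 103, 104], [101, 103]))
```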


> (3) The database needs to be designed in such a way (and the software
> needs to be written in such a way) that frequent updates of the full
> portindex2json:
> (a) works correctly (ports missing from PortIndex are marked as
> gone, no duplicate entries of ports, all info up-to-date)
> (b) works super efficiently
> (c) works with minimal overhead
> If network speed is the bottleneck, make sure that you feed / update
> the database from the same machine where the database is running.
> Updating via git is super fast, you want to avoid transferring the
> full 20MB file over network over and over again. Even if the testing
> system is running at strange configurations, suggest the architecture
> of how it would ideally be implemented if you can design the system
> and architecture yourself.
>
For keeping an updated copy of portindex.json this seems a fine pathway:

   - Generate portindex.json file along with Portindex, i.e. run
   portindex2json.tcl on our own. [ this would also help in our discussion
   with repology ]
   - portindex.json can be stored in the same directory as PortIndex and if
   we run our web-app on a different machine [ which is the most probable case
   ] then we could keep web-app's version of portindex.json updated using
   rsync [ repology is doing the same, not sure though ].
   - Then, using os.stat on the web-app's version of portindex.json, we can
   periodically check the file's 'last modified' time and hence detect
   whether there are any changes.
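
The last step could be sketched like this. The helper name and the way the
last seen timestamp is stored are assumptions; only os.stat and its st_mtime
field come from the standard library. The demo uses a temporary file and
sets its modification time explicitly so that the result is deterministic.

```python
import os
import tempfile


def has_changed(path, last_seen_mtime):
    """Return (changed, current_mtime) for the file at path.

    A file counts as changed when it was never seen before or when its
    modification time is newer than the one recorded previously.
    """
    mtime = os.stat(path).st_mtime
    changed = last_seen_mtime is None or mtime > last_seen_mtime
    return changed, mtime


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "portindex.json")
    with open(path, "w") as handle:
        handle.write("[]")
    os.utime(path, (1000, 1000))           # pretend the file is old
    first, seen = has_changed(path, None)   # never seen: changed
    second, seen = has_changed(path, seen)  # same mtime: unchanged
    os.utime(path, (2000, 2000))           # simulate an rsync update
    third, seen = has_changed(path, seen)   # newer mtime: changed

print(first, second, third)
```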

Now as we have an updated copy of portindex.json, we go back to our build
history which is constantly receiving updates from the server [ without
delay, if everything is fine and with some delay in case of server outage ]
and detect which ports had been recently built, and for those ports we
would then update the database using portindex.json.
To ensure that everything stays consistent, we can schedule a weekly 'complete
syncing of database and portindex.json'.



> (4) Suggest a way to minimize the data transfer, so that it will only
> include the changes rather than the full data set. How to get such
> data? What would need to be changed / improved?


rsync would do exactly this.

> (5) You won't be getting port renames. What you do get is
> "replaced_by" information at best (say, perl5.26 could be replaced_by
> perl5.28). When a port is renamed, treat it as a different port, but
> the old port could be marked as "inactive" and "replaced_by <new
> port>" (if it's not deleted yet). This information is probably not in
> PortIndex, either portindex would need to be improved, or you need to
> find a different way.
>

Okay! So the name change problem can be handled. We can have a
"replaced_by" column in our table: as long as it is empty/NULL the port is
active; otherwise it is inactive and has been replaced by a new port.
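
In code, the rule could be sketched as below. This is a plain dataclass
rather than the actual Django model, and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Port:
    name: str
    # Name of the port that replaces this one, or None if not replaced.
    replaced_by: Optional[str] = None

    @property
    def active(self) -> bool:
        """A port is active as long as nothing has replaced it."""
        return self.replaced_by is None


old = Port("perl5.26", replaced_by="perl5.28")
new = Port("perl5.28")
print(old.active, new.active)
```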

Please let me know if these approaches look fine.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-28 Thread Mojca Miklavec
On Wed, 27 Mar 2019 at 17:13, Arjun Salyan wrote:
>>
>> - A more elaborate plan about how you plan to handle updates / keep
>> the database up-to-date. Sure, we can trigger certain actions from the
>> buildbot, but those various "actions" need to be implemented. Keeping
>> the app up to date in a safe and reliable way is a very important part
>> of the project, and requires collecting data from various sources.
>> "Look for the most efficient ways to keep the PortIndex and Build
>> History up-to-date" should be already attempted now.
>
>
> We can keep the build history up-to-date by using HttpStatusPush,
> I read about it in buildbot documentation. It sends a json object containing 
> build data.

OK, write that down (see below).

> This would even remove the need of a parsing script on web-app's end which
> fetches the logs from buildbot.

What if there's a server outage?

> But I am having a problem in reaching at a good method to keep PortIndex
> updated. PortIndex does not give any id to each port, and suppose I assign
> them ids in the database. Then if a port is renamed, it would be impossible
> to identify which port was renamed because PortIndex has no idea about the
> ids in the database.
>
> Another problem is the size of the file- every time running
> portindex2json.tcl over the generated portindex and then looking for
> changes does not appear to be very efficient. Neither does the build page
> seem to provide any relevant info about the changes.

Sure, the build page doesn't provide relevant info about the changes:
the future app should.

> Any suggestions on tackling these would be very helpful.

(1) Identify all the individual items that will need to be updated and
write them down. Fetching from PortIndex and builds are two items, but
not the only ones. We want to know which ports have been updated
upstream, which websites seem broken, and more. For each of the items
suggest how frequently it should be done (checking for updates
definitely requires less frequent updates than the build status etc.).

(2) Think about different scenarios:
- how to update as fast as the change arrives (immediately after new
commits happen or builds are done ...)
- how to properly handle cases when there was a server outage, or
there was an error while updating and "live data" went missing

(3) The database needs to be designed in such a way (and the software
needs to be written in such a way) that frequent updates of the full
portindex2json:
(a) works correctly (ports missing from PortIndex are marked as
gone, no duplicate entries of ports, all info up-to-date)
(b) works super efficiently
(c) works with minimal overhead
If network speed is the bottleneck, make sure that you feed / update
the database from the same machine where the database is running.
Updating via git is super fast, you want to avoid transferring the
full 20MB file over network over and over again. Even if the testing
system is running at strange configurations, suggest the architecture
of how it would ideally be implemented if you can design the system
and architecture yourself.

(4) Suggest a way to minimize the data transfer, so that it will only
include the changes rather than the full data set. How to get such
data? What would need to be changed / improved?

(5) You won't be getting port renames. What you do get is
"replaced_by" information at best (say, perl5.26 could be replaced_by
perl5.28). When a port is renamed, treat it as a different port, but
the old port could be marked as "inactive" and "replaced_by <new
port>" (if it's not deleted yet). This information is probably not in
PortIndex, either portindex would need to be improved, or you need to
find a different way.
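
Requirement (3a) above (no duplicates, missing ports marked as gone, all info
up to date) can be sketched with SQLite as follows. The table layout, the
column names and the two-key records are assumptions that stand in for the
real portindex2json output; the versions are illustrative. Flagging every row
first is the simplest correct approach, not necessarily the most efficient one.

```python
import sqlite3


def sync_ports(conn, index):
    """Make the port table reflect index, a list of {"name", "version"} dicts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS port ("
        "name TEXT PRIMARY KEY, version TEXT, gone INTEGER NOT NULL DEFAULT 0)"
    )
    with conn:  # one transaction: either the whole sync applies or none of it
        # Flag everything as gone, then clear the flag for ports still listed.
        conn.execute("UPDATE port SET gone = 1")
        # The primary key on name prevents duplicate entries.
        conn.executemany(
            "INSERT OR REPLACE INTO port (name, version, gone) "
            "VALUES (:name, :version, 0)",
            index,
        )


conn = sqlite3.connect(":memory:")
sync_ports(conn, [{"name": "wget", "version": "1.20.1"},
                  {"name": "perl5.26", "version": "5.26.3"}])
sync_ports(conn, [{"name": "wget", "version": "1.20.2"},
                  {"name": "perl5.28", "version": "5.28.1"}])
rows = conn.execute("SELECT name, version, gone FROM port ORDER BY name").fetchall()
print(rows)
```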

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-27 Thread Arjun Salyan via macports-dev
>
> - A more elaborate plan about how you plan to handle updates / keep
> the database up-to-date. Sure, we can trigger certain actions from the
> buildbot, but those various "actions" need to be implemented. Keeping
> the app up to date in a safe and reliable way is a very important part
> of the project, and requires collecting data from various sources.
> "Look for the most efficient ways to keep the PortIndex and Build
> History up-to-date" should be already attempted now.
>

We can keep the build history up-to-date by using HttpStatusPush; I read
about it in the buildbot documentation. It sends a JSON object containing
build data. This would even remove the need for a parsing script on the
web-app's end which fetches the logs from buildbot.

But I am having a problem in reaching at a good method to keep PortIndex
updated. PortIndex does not give any id to each port, and suppose I assign
them ids in the database. Then if a port is renamed, it would be impossible
to identify which port was renamed because PortIndex has no idea about the
ids in the database.

Another problem is the size of the file: running portindex2json.tcl over
the generated portindex every time and then looking for changes does not
appear to be very efficient. Neither does the build page seem to provide
any relevant info about the changes.

Any suggestions on tackling these would be very helpful.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-27 Thread Arjun Salyan via macports-dev
Thank You Mojca!
I read your inline comments as well; I have already started working on the
suggestions.
I will include these very shortly!

> Does your new semester already start at end of July?


Yes, it does. But it won’t affect the project. I have done projects in a
semester as tightly scheduled as the one going on- and the upcoming is
hardly this packed!

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-27 Thread Mojca Miklavec
Dear Arjun,

On Tue, 26 Mar 2019 at 07:34, Arjun Salyan wrote:
>
> Dear all,
>
> Using the valuable information and suggestions by all of you here at 
> MacPorts, especially the potential mentors, I have come up with the first 
> draft of my proposal for GSoC 2019.
> Link to Google Doc: 
> https://docs.google.com/document/d/198Ivygxb2NJQz_sqzDrbDPVEYZ5Ye5Yw0LV6Bt2QmG4/edit?usp=sharing

Thank you very much. I'll keep providing some smaller details inline
(I didn't finish reviewing yet), but here are some major points.

What I would like to see in the application is mainly the following:

- Database design (you have quite a bit of that already; details would
be discussed inline)

- A plan to add an external API, so that someone else could build an
equivalent UI, for example in React/Vue.js (no need to write that UI
yourself, unless you do that instead of using jinja templating) or use
the information from the port command ("port HowManyPeopleInstalled
wget", "port IsItBrokenOnMyOs qt5" :); again no need to implement
anything in the port command, just allow some easy future use of data

- Specification of that API (which calls would be supported, with examples)

- A more elaborate plan about how you plan to handle updates / keep
the database up-to-date. Sure, we can trigger certain actions from the
buildbot, but those various "actions" need to be implemented. Keeping
the app up to date in a safe and reliable way is a very important part
of the project, and requires collecting data from various sources.
"Look for the most efficient ways to keep the PortIndex and Build
History up-to-date" should be already attempted now.

- "Screenshots" of various (all?) pages of the planned app. I don't
literally mean screenshots; it could be hand-drawn sketches, it could
be a paint sketch, it could be static HTML, it could be added as
static content to your demo app, which you would later replace with
dynamic page, ... No carefully crafted visual piece of art, just
"boxes" or fake/sample content with titles. It would serve as a list
of templates that you would fill in during the summer. Some of that
info is included in sample charts, some in the "Reporting" paragraph,
two pages are already present in your demo app, but something like a
full picture that you would attach to the contract of your client
before starting a big project after brainstorming together, so that
you both know where exactly the project is heading :)

- If you want to collect some useful statistics from users, it's
really important to enable submissions as early as possible (I would
say before the first evaluation), so that you have some data to work
with when creating different views etc.

I'll provide more feedback about statistics later.

Other less important suggestions:
- Be more bold and put more stuff under extension goals.

Does your new semester already start at end of July?

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-27 Thread Arjun Salyan via macports-dev
Dear all,

Using the valuable information and suggestions by all of you here at
MacPorts, especially the potential mentors, I have come up with the first
draft of my proposal for GSoC 2019.
Link to Google Doc:
https://docs.google.com/document/d/198Ivygxb2NJQz_sqzDrbDPVEYZ5Ye5Yw0LV6Bt2QmG4/edit?usp=sharing

I am eager to improve it further with your input.

Thank You


Re: Dependencies on non-default variants (was: GSoC 2019 [Collect build statistics])

2019-03-25 Thread Craig Treleaven
Mojca:

I apologize for the length of this and for continuing to hammer on this issue 
but I think this is important.  I support the idea of a modern web app to bring 
together all the relevant information for a port that potential users of 
MacPorts need in order to assess if they want to install that software.  The 
results of the last buildbot runs are obviously valuable information. We would
expect that virtually all ports would report successful builds on all the OS 
versions that they support.  As a maintainer, I wouldn’t leave a port in a 
broken state if I could possibly help it.  

A port that requires non-default variants is not known to be broken.  It did 
not fail to build—the buildbot was unable to attempt a build.  The vast 
majority of the time, it will build and install just fine for the user.  

> On Mar 24, 2019, at 4:05 PM, Mojca Miklavec  wrote:
> 
> On Sun, 24 Mar 2019 at 19:55, Craig Treleaven wrote:
>>> On Mar 24, 2019, at 1:09 PM, Mojca Miklavec wrote:
>>> On Sun, 24 Mar 2019 at 01:06, Craig Treleaven wrote:
> […]
> I now checked the first two random MythTV ports, which basically boils down 
> to:
> 
> require_active_variants qt5-mysql-plugin mariadb55
> require_active_variants p${perl5.major}-dbd-mysql mariadb
> require_active_variants ${pymodver}-mysql mariadb55
> 
> out of which only the second one provides the wrong default variant,
> and that port doesn't have the maintainer. Neither do any of the mysql
> packages have any maintainer at the moment.
> 
> My first question is: why exactly does p5-dbd-mysql need a different
> default variant?
> 
Myth requires a database backend and I’ve chosen mariadb 5.5.  I was pushed to 
add variants for mysql and different db versions.  That would have been a 
nightmare to support.  The perl dbd-mysql modules have to know where to find 
the db socket (AIUI) and therefore need a different variant depending on which 
database/version they are connecting to.  I’ve documented the required variants 
in the cookbook instructions on the MythTV wiki.

If I change the default variant for p5-dbd-mysql to suit me, I just push the 
problem to someone else.  I have no idea how many others are using this port 
and depend on it connecting to MySQL 5.7 by default.

BTW, mythtv-core.28, for example, doesn’t support OS X 10.8 or earlier.  It 
“fails” to build on those buildbots although it actually aborts before 
attempting the build.  I’m fine with that being reported in the web app as a 
failure since users of those OS versions will be informed that they can’t 
expect it to build for them.  I don’t want users on supported OS versions to 
see the exact same failure message.  “Unable to attempt” or “Not attempted” is 
actually what happened. 

More generally, a relatively common issue on MacPorts is X11 versus Quartz. Say 
the support library (such as gtk2) defaults to one (like +x11) but your package 
really needs the other.  Changing the default messes up others.  BTW, this 
would be a compelling use of installation statistics.  If we determined that, 
say, 80% of gtk2 installs are +quartz, then it would be a no-brainer to change 
the default.
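
To make the statistics idea concrete, here is a small sketch of how such a
share could be computed. The input format (one set of active variants per
reported install) is an assumption, and the sample is made up to mirror the
hypothetical 80% above; it is not real data.

```python
def variant_share(installs, variant):
    """Return the fraction of installs that have the given variant active.

    installs is an iterable of variant sets, one per reported install.
    """
    installs = list(installs)
    if not installs:
        return 0.0
    hits = sum(1 for variants in installs if variant in variants)
    return hits / len(installs)


# Made-up sample: 8 installs with +quartz, 2 with +x11.
sample = [{"quartz"}] * 8 + [{"x11"}] * 2
print(variant_share(sample, "quartz"))
```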

> My second question / suggestion: please try to take over maintenance
> of the packages that you depend on and turn them into a shape that
> will make them more generally usable.
> 
It is outside my skill set and beyond the time I have available for MacPorts.  
Our database ports and related accessors need someone with more knowledge of 
database admin than I have or want to have.  They are big, complex pieces of 
software and maintaining our fleet of ports is a serious commitment.  (Bradley, 
I miss you!)


> On Mar 25, 2019, at 2:56 AM, Mojca Miklavec  wrote:
> 
> Dear Craig,
> 
>> As you said, people have looked at this problem and not found a workable 
>> solution.
> 
> Personally I never did spend any effort into fixing this, partially
> probably also because it doesn't affect any ports that I use or
> maintain.
> 
I believe there is a very, very old ticket on this issue but I can’t find it 
right now.

>> It may be a _long_ time before a “proper” solution is implemented.
> 
> You don't need a 100% perfect solution, but something that works.
> 
> What buildbot setup could do, is check which variants are required for
> a particular port. It could then perhaps install
>     p5.28-dbd-mysql +mariadb
> explicitly, which could at least work for direct dependencies; getting
> it to work correctly in a recursive way would be a bigger challenge.
> 
> Again, by far the easiest solution would be to fix ports in a way that
> no port requires a non-default variant to be active. You didn't yet
> answer my question: what prevents the dependencies of MythTV to change
> their default variants, so that your ports would work out of the box?
> 
See above.

>> In the meantime, reporting these as failed builds *actively misinforms* 
>> users.
> 
> This has been the case for years already.
> 
>> When you say there will be the same 

Re: GSoC 2019 [Collect build statistics]

2019-03-25 Thread Mojca Miklavec
Dear Fred,

(Resending due to the initial post from the wrong email account; sorry
for the duplicate.)

On Sun, 24 Mar 2019 at 22:57, Fred Wright wrote:
> On Sun, 24 Mar 2019, Mojca Miklavec wrote:
> > On Sat, 23 Mar 2019 at 17:49, Craig Treleaven wrote:
> >>
> >> I see no reason to report inactive ports.
> >
> > Neither do I. I would remove those as I already mentioned in an earlier 
> > email.
>
> But in the spirit of lossless collection, those should be included and
> flagged as inactive, so that what to do with them can be decided later.

Lossless collection is not about collecting our users' gender, age,
height, weight, shoe size, religion ... just in case that one might
want to study how MacPorts affects users' life, or what our
demographic is ... some time in the indefinite future.

I'm not saying that there is absolutely no use of inactive ports. What
I'm saying is that unless (or until) one can argue what we should use
that for, we should not submit that information at all.

There is a lot more other useful information that we are not
submitting at the moment, and could be useful. Like: how much time
passed since the last "port selfupdate / port sync / git pull &
portindex" run? When exactly did the user install or update the port
(to estimate the time passed between the actual commit and user
updating)?

Lossless collection is about not trying to discard data that you get
(with all the timing info etc.), and just storing some incomplete and
overly simplistic statistics that you have no way of fixing later.

> There are different reasons for inactive ports.  Sometimes they're just
> leftovers from upgrades without -u.

I never run upgrade with '-u' and only ever clean the leftover port
every few months, if at all. Literally everything in my inactive ports
is useless info.

> But sometimes the user is
> intentionally keeping inactive ports, to permit switching fairly quickly
> via activate/deactivate, either to keep multiple variants or to keep
> conflicting ports.

This would only be useful information if the inactive port info
actually came with the additional label about why the user kept that
other inactive port around.

> Speaking of return codes, it's not very helpful that "upgrade outdated"
> returns error status if nothing is outdated. :-)

Please file a ticket on Trac.

Mojca


Re: Dependencies on non-default variants (was: GSoC 2019 [Collect build statistics])

2019-03-25 Thread Mojca Miklavec
Dear Craig,

> As you said, people have looked at this problem and not found a workable 
> solution.

Personally I never did spend any effort into fixing this, partially
probably also because it doesn't affect any ports that I use or
maintain.

> It may be a _long_ time before a “proper” solution is implemented.

You don't need a 100% perfect solution, but something that works.

What buildbot setup could do, is check which variants are required for
a particular port. It could then perhaps install
p5.28-dbd-mysql +mariadb
explicitly, which could at least work for direct dependencies; getting
it to work correctly in a recursive way would be a bigger challenge.

Again, by far the easiest solution would be to fix ports in a way that
no port requires a non-default variant to be active. You didn't yet
answer my question: what prevents the dependencies of MythTV to change
their default variants, so that your ports would work out of the box?

> In the meantime, reporting these as failed builds *actively misinforms* users.

This has been the case for years already.

> When you say there will be the same problem “on a different page”, I don’t 
> know what you mean.

One of the project ideas is to make buildbot logs and summaries more
useful, directly from the buildbot views:
https://trac.macports.org/ticket/55978
While we could internally waste weeks doing ugly workarounds to make
some ports artificially look pretty, I'm not going to ask developers
of buildbot to implement workarounds to allow cheating and report the
same build as both broken and successful at the same time. If you
don't want your port to be reported as broken, fix either the port(s),
the base, or the buildbot configuration.

An example of a buildbot configuration fix (that didn't actually take
a lot of time to do) is that we no longer build obsolete ports (we
used to do that in the past and got zillions of errors, in particular
when modifying the graveyard ports).

> And if we can implement a workaround, why can’t we share it with this other 
> page?

Because I don't want to implement "build is both working and broken at
the same time" functionality in buildbot.

> Why would we have different pages reporting the same information?

It's not exactly the same. We are already reporting that "your" ports
are broken, in both buildbot and on the GitHub interface (when you
browse the commits). It's just that finding a particular port in the
buildbot is currently almost a mission impossible, so as a consequence
nobody knows which ports are broken and which ones work, except if you
check your own commits immediately a few hours after doing the commit,
or read the archives of build failures sent by buildbot. The
improvements on the buildbot site are meant to make the buildbot's
interface itself useful (which it currently isn't, except for checking
the last few builds), while the standalone app would provide a
plethora of other information (installation statistics, which ports
are outdated, which ones are broken, which websites are broken, ...),
just collected on one single place.

> > My argument is that we need to fix ports and base to avoid those
> > failures, not to explain them away.
>
> I don’t disagree.  I guess I’m not as optimistic that this will be done 
> quickly.

It needs someone to push for a fix, someone to come up with a decent
idea for the fix, and someone to actually implement it (it could be
the same person). If you take on the first two tasks, even if you
don't know how to code in the base yourself, there's a high chance
that you'll get help with the last part. If nobody is pushing nor
suggesting what to do, this will likely not be done as fast.

I'm just arguing that if we get a student working on the web app, I
would prefer if he or she would perhaps spend time adding novel
functionality, which could even be allowing user comments on each port
(which you could use to explain why the port appears to be broken)
rather than on something that should be fixed elsewhere. (I wouldn't
mind the student spending time fixing that particular bug in buildbot
configuration or in base if comfortable with the code, but I would
definitely not want to demand from the student to fix this issue as
part of the project. I'm pretty sure that we'll have plenty more
serious problems that nobody is even aware of yet. And if it would be
easy to fix in the web app, we'll gladly accept any patches.)

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Craig Treleaven
> On Mar 24, 2019, at 5:57 PM, Fred Wright  wrote:
> 
> On Sun, 24 Mar 2019, Mojca Miklavec wrote:
>> On Sat, 23 Mar 2019 at 17:49, Craig Treleaven wrote:
>>> 
>>> I see no reason to report inactive ports.
>> 
>> Neither do I. I would remove those as I already mentioned in an earlier 
>> email.
> 
> But in the spirit of lossless collection, those should be included and 
> flagged as inactive, so that what to do with them can be decided later.
> 
> There are different reasons for inactive ports.  Sometimes they're just 
> leftovers from upgrades without -u.  But sometimes the user is intentionally 
> keeping inactive ports, to permit switching fairly quickly via 
> activate/deactivate, either to keep multiple variants or to keep conflicting 
> ports.
> 
Two reasons we shouldn’t collect details of inactive ports:

1) It needlessly increases the volume of data transmitted and processed.  In my 
case, I have nearly 7 times as many inactive ports as active.  If we hope to 
get thousands of users participating, that would waste resources for users and 
the project.

2) We should only collect data from users that we need.  No one has said how 
details of inactive ports might be useful.  If it isn’t useful, how can we 
justify to users that we need to collect it?

Craig



Re: Dependencies on non-default variants (was: GSoC 2019 [Collect build statistics])

2019-03-24 Thread Craig Treleaven
> On Mar 24, 2019, at 4:05 PM, Mojca Miklavec  wrote:
> 
> On Sun, 24 Mar 2019 at 19:55, Craig Treleaven wrote:
>>> On Mar 24, 2019, at 1:09 PM, Mojca Miklavec wrote:
>>> On Sun, 24 Mar 2019 at 01:06, Craig Treleaven wrote:
 
>>>> There are a number of ports that require a dependency to be installed with 
>>>> a non-default variant in order to build successfully.  A short-coming of 
>>>> MacPorts is that this cannot be done programmatically
>>>> 
>>>> When this is rolled out, we don’t want to make users think that a port 
>>>> will fail to build on their system when it is just a case of needing a 
>>>> non-default variant.
>>>> 
>>>> However, I don’t know how to handle this cleanly.   Perhaps we could parse 
>>>> the build log looking for the message that informs the user how to install 
>>>> the required variant.  If found, instead of saying the build failed, we 
>>>> could indicate that the build was not attempted as the buildbot 
>>>> configuration could not support a successful install.
>>> 
>>> I totally agree with your request, but this is completely out of scope
>>> of the proposed app. This either needs a proper extension in the base,
>>> or a workaround in mpbb, preferably the former. I believe a much
>>> bigger general issue is reporting failure of port builds on OSes which
>>> are known not to be supported (like: attempting to build the latest Qt
>>> on 10.5). Again, this needs to be addressed elsewhere.
>>> 
>> This issue hits very close to home for me.  None of my MythTV ports, nor the 
>> hdhomerun-gui port, will build successfully on the buildbots.  They never 
>> have.  They *will* build successfully (on supported OS versions) if the 
>> proper variants are specified.
>> 
>> If an unknowing potential user came to page for any of these ports and found 
>> nothing but failure messages for all of the buildbots, why on earth would 
>> they want to proceed to install the port?
> 
> […]
>> If we won’t expand the scope to handle this relatively common issue, we 
>> should at very least add some static text to the web page explaining that 
>> buildbot failures don’t necessarily mean the port will fail for a 
>> particular user.  Even so, that is a very poor workaround.
> 
> You could equally argue that users will see that ports don't fail, but
> then they are baffled when they cannot install those ports on their
> own machine. We should really really fix the situation in MacPorts.
> 
Not the same at all.  Now, if a user doesn’t specify the right variants they 
get a message that tells them what to do.  Of course it would be better if that 
didn’t happen but this is the way things work, now.

> I would say: Don't kill the messenger. Just because one application
> exposes some issues with MacPorts, it doesn't mean that the
> application needs to be endlessly tweaked to hide those problems away.
> We should not have failing builds on the buildbot, end of story. We
> need to do everything to avoid failing builds, not to implement
> explanations and workarounds on the wrong level. Note that we'll
> probably have another student apply for a different project which
> would, if selected, expose the exact same problem on a different page.
> Should we implement those workarounds ten times?
> 
As you said, people have looked at this problem and not found a workable 
solution.  It may be a _long_ time before a “proper” solution is implemented.  
In the meantime, reporting these as failed builds *actively misinforms* users.  
When you say there will be the same problem “on a different page”, I don’t know 
what you mean.  And if we can implement a workaround, why can’t we share it 
with this other page?  Why would we have different pages reporting the same 
information?

As I see it, all we have to do is search the appropriate log for the message 
that the active_variants PortGroup spits out when it detects a dep with the 
wrong variants.  If found, we replace the text “Failed” with something like 
“Build not attempted”.  And probably add a link to a page that explains why 
not.  Clean and elegant?  No.  Hard to do?  Again, no.
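
A minimal sketch of that log check, in Python. The exact wording printed by the active_variants PortGroup is not quoted in this thread, so the pattern below is a placeholder and the function name is made up for illustration; the real message text would have to be substituted.

```python
import re

# Placeholder pattern (assumption): replace with the real message that the
# active_variants PortGroup prints when a dependency has the wrong variants.
VARIANT_MESSAGE = re.compile(r"to be installed with variant", re.IGNORECASE)


def build_label(build_succeeded: bool, log_text: str) -> str:
    """Return the status text to display for one buildbot build."""
    if build_succeeded:
        return "Success"
    # A failed build whose log contains the variant message was never
    # really attempted, so it gets a different label than a real failure.
    if VARIANT_MESSAGE.search(log_text):
        return "Build not attempted"
    return "Failed"
```

The label is derived at display time only; the stored build result is left untouched, so the page can still link to the original log.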

> My argument is that we need to fix ports and base to avoid those
> failures, not to explain them away.

I don’t disagree.  I guess I’m not as optimistic that this will be done quickly.

Craig



Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Fred Wright



On Sun, 24 Mar 2019, Mojca Miklavec wrote:
> On Sat, 23 Mar 2019 at 17:49, Craig Treleaven wrote:
>>
>> I see no reason to report inactive ports.
>
> Neither do I. I would remove those as I already mentioned in an earlier email.


But in the spirit of lossless collection, those should be included and 
flagged as inactive, so that what to do with them can be decided later.


There are different reasons for inactive ports.  Sometimes they're just 
leftovers from upgrades without -u.  But sometimes the user is 
intentionally keeping inactive ports, to permit switching fairly quickly 
via activate/deactivate, either to keep multiple variants or to keep 
conflicting ports.


On Sun, 24 Mar 2019, Craig Treleaven wrote:

> I’m not familiar with the 2015 work.  ‘port’ now returns zero for 
> successful completion.  Have we considered having port set a return code 
> that indicates the general class of an unsuccessful operation?  For 
> example, we have 8 port phases defined (fetch through destroot).  We 
> could return -1 through -8 to indicate failure in a particular phase. 
> More to the point, we could return a specific value for failures such as 
> when active_variants determines that a required variant is not 
> installed.  Similarly, if the port is not supported on a particular OS 
> configuration another specific code could be returned.  I don’t 
> contribute to base but that would seem to be a minimally invasive 
> modification.


Speaking of return codes, it's not very helpful that "upgrade outdated" 
returns error status if nothing is outdated. :-)


Fred Wright


Dependencies on non-default variants (was: GSoC 2019 [Collect build statistics])

2019-03-24 Thread Mojca Miklavec
On Sun, 24 Mar 2019 at 19:55, Craig Treleaven wrote:
> > On Mar 24, 2019, at 1:09 PM, Mojca Miklavec wrote:
> > On Sun, 24 Mar 2019 at 01:06, Craig Treleaven wrote:
>>>
>>> There are a number of ports that require a dependency to be installed with 
>>> a non-default variant in order to build successfully.  A short-coming of 
>>> MacPorts is that this cannot be done programmatically
>>>
>>> When this is rolled out, we don’t want to make users think that a port will 
>>> fail to build on their system when it is just a case of needing a 
>>> non-default variant.
>>>
>>> However, I don’t know how to handle this cleanly.   Perhaps we could parse 
>>> the build log looking for the message that informs the user how to install 
>>> the required variant.  If found, instead of saying the build failed, we 
>>> could indicate that the build was not attempted as the buildbot 
>>> configuration could not support a successful install.
>>
>> I totally agree with your request, but this is completely out of scope
>> of the proposed app. This either needs a proper extension in the base,
>> or a workaround in mpbb, preferably the former. I believe a much
>> bigger general issue is reporting failure of port builds on OSes which
>> are known not to be supported (like: attempting to build the latest Qt
>> on 10.5). Again, this needs to be addressed elsewhere.
>>
> This issue hits very close to home for me.  None of my MythTV ports, nor the 
> hdhomerun-gui port, will build successfully on the buildbots.  They never 
> have.  They *will* build successfully (on supported OS versions) if the 
> proper variants are specified.
>
> If an unknowing potential user came to page for any of these ports and found 
> nothing but failure messages for all of the buildbots, why on earth would 
> they want to proceed to install the port?

Please note that Homebrew dropped *ALL* variants from *ALL* of their
maintained packages, saying something like: "We may discuss which
features will be turned on, or which library will be linked, but we
won't discuss adding back any of those options."

I have now checked two random MythTV ports, which basically boil down to:

require_active_variants qt5-mysql-plugin mariadb55
require_active_variants p${perl5.major}-dbd-mysql mariadb
require_active_variants ${pymodver}-mysql mariadb55

out of which only the second one provides the wrong default variant,
and that port doesn't have a maintainer. Neither do any of the mysql
packages have any maintainer at the moment.

My first question is: why exactly does p5-dbd-mysql need a different
default variant?

My second question / suggestion: please try to take over maintenance
of the packages that you depend on and turn them into a shape that
will make them more generally usable.

> If we won’t expand the scope to handle this relatively common issue, we 
> should at very least add some static text to the web page explaining that 
> buildbot failures don’t necessarily mean the port will fail for a 
> particular user.  Even so, that is a very poor workaround.

You could equally argue that users will see that ports don't fail, but
then they are baffled when they cannot install those ports on their
own machine. We should really really fix the situation in MacPorts.

I would say: Don't kill the messenger. Just because one application
exposes some issues with MacPorts, it doesn't mean that the
application needs to be endlessly tweaked to hide those problems away.
We should not have failing builds on the buildbot, end of story. We
need to do everything to avoid failing builds, not to implement
explanations and workarounds on the wrong level. Note that we'll
probably have another student apply for a different project which
would, if selected, expose the exact same problem on a different page.
Should we implement those workarounds ten times?

My argument is that we need to fix ports and base to avoid those
failures, not to explain them away.

> I’m not familiar with the 2015 work.

It was about using a library to resolve dependencies in a
"mathematically correct way". If this was finished, "port install foo"
would automatically install the dependencies with the correct variant,
among others.

> ‘port’ now returns zero for successful completion. Have we considered having 
> port set a return code that indicates the general class of an unsuccessful 
> operation?

I'm not aware of that, but that discussion would call for a different
ticket or different topic on this mailing list.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Craig Treleaven
> 
> On Mar 24, 2019, at 1:09 PM, Mojca Miklavec  wrote:
> 
> On Sun, 24 Mar 2019 at 01:06, Craig Treleaven wrote:
>> 
>> please note that we can’t expect all ports to build successfully on the 
>> buildbots.
> 
> Nobody said that, but we cannot blame the student collecting the data
> from buildbot for MacPorts' internal deficiencies :)
> 
You thought I was blaming the student?  Merely pointing out something they 
might not know.

>> There are a number of ports that require a dependency to be installed with a 
>> non-default variant in order to build successfully.  A short-coming of 
>> MacPorts is that this cannot be done programmatically
> 
> This is a deficiency that could be fixed, but this is outside of scope
> of this project. The project should still report the port as broken,
> consistent with buildbot; it's then up to others to fix MacPorts. The
> work done by Jackson in 2015 was supposed to address that, but it
> turned out to be way more difficult than initially anticipated.
> 
>> When this is rolled out, we don’t want to make users think that a port will 
>> fail to build on their system when it is just a case of needing a 
>> non-default variant.
>> 
>> However, I don’t know how to handle this cleanly.   Perhaps we could parse 
>> the build log looking for the message that informs the user how to install 
>> the required variant.  If found, instead of saying the build failed, we 
>> could indicate that the build was not attempted as the buildbot 
>> configuration could not support a successful install.
> 
> I totally agree with your request, but this is completely out of scope
> of the proposed app. This either needs a proper extension in the base,
> or a workaround in mpbb, preferably the former. I believe a much
> bigger general issue is reporting failure of port builds on OSes which
> are known not to be supported (like: attempting to build the latest Qt
> on 10.5). Again, this needs to be addressed elsewhere.
> 
This issue hits very close to home for me.  None of my MythTV ports, nor the 
hdhomerun-gui port, will build successfully on the buildbots.  They never have. 
 They *will* build successfully (on supported OS versions) if the proper 
variants are specified.

If an unknowing potential user came to page for any of these ports and found 
nothing but failure messages for all of the buildbots, why on earth would they 
want to proceed to install the port?

If we won’t expand the scope to handle this relatively common issue, we should 
at very least add some static text to the web page explaining that buildbot 
failures don’t necessarily mean the port will fail for a particular user.  
Even so, that is a very poor workaround.

Failing that, I would modify the description of each of these ports to note the 
cause of the buildbot ‘failures’ and how it is irrelevant.

I’m not familiar with the 2015 work.  ‘port’ now returns zero for successful 
completion.  Have we considered having port set a return code that indicates 
the general class of an unsuccessful operation?  For example, we have 8 port 
phases defined (fetch through destroot).  We could return -1 through -8 to 
indicate failure in a particular phase.  More to the point, we could return a 
specific value for failures such as when active_variants determines that a 
required variant is not installed.  Similarly, if the port is not supported on 
a particular OS configuration another specific code could be returned.  I don’t 
contribute to base but that would seem to be a minimally invasive modification.
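
To make the idea concrete, here is a rough Python sketch of how a consumer such as the buildbot could interpret such codes. None of these codes exist in 'port' today; the numbers and names are invented for illustration. The sketch uses positive values because POSIX exit statuses are limited to 0 through 255, so -1 through -8 could not be returned as written. The phase list is the usual fetch-through-destroot order as I understand it.

```python
# Port phases from fetch through destroot (assumed order).
PHASES = ["fetch", "checksum", "extract", "patch",
          "configure", "build", "test", "destroot"]

# Hypothetical exit codes: 1..8 for a failure in the corresponding phase.
PHASE_FAILURE = {code: phase for code, phase in enumerate(PHASES, start=1)}

# Hypothetical codes for "not really a build failure" situations.
VARIANT_MISSING = 64   # active_variants found a dependency with wrong variants
OS_UNSUPPORTED = 65    # port is not supported on this OS configuration


def classify_exit(code: int) -> str:
    """Translate a (hypothetical) 'port' exit status into a failure class."""
    if code == 0:
        return "success"
    if code in PHASE_FAILURE:
        return f"failed in {PHASE_FAILURE[code]} phase"
    if code == VARIANT_MISSING:
        return "not attempted: required variant not active"
    if code == OS_UNSUPPORTED:
        return "not attempted: OS not supported"
    return "failed (unclassified)"
```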

Craig

Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Mojca Miklavec
On Sun, 24 Mar 2019 at 01:06, Craig Treleaven wrote:
>
> please note that we can’t expect all ports to build successfully on the 
> buildbots.

Nobody said that, but we cannot blame the student collecting the data
from buildbot for MacPorts' internal deficiencies :)

> There are a number of ports that require a dependency to be installed with a 
> non-default variant in order to build successfully.  A short-coming of 
> MacPorts is that this cannot be done programmatically

This is a deficiency that could be fixed, but this is outside of scope
of this project. The project should still report the port as broken,
consistent with buildbot; it's then up to others to fix MacPorts. The
work done by Jackson in 2015 was supposed to address that, but it
turned out to be way more difficult than initially anticipated.

> When this is rolled out, we don’t want to make users think that a port will 
> fail to build on their system when it is just a case of needing a non-default 
> variant.
>
> However, I don’t know how to handle this cleanly.   Perhaps we could parse 
> the build log looking for the message that informs the user how to install 
> the required variant.  If found, instead of saying the build failed, we could 
> indicate that the build was not attempted as the buildbot configuration could 
> not support a successful install.

I totally agree with your request, but this is completely out of scope
of the proposed app. This either needs a proper extension in the base,
or a workaround in mpbb, preferably the former. I believe a much
bigger general issue is reporting failure of port builds on OSes which
are known not to be supported (like: attempting to build the latest Qt
on 10.5). Again, this needs to be addressed elsewhere.


What *might* need to be addressed here and has never been discussed so
far, is the fact that some ports install a different version of the
port, or different default variants, or need different dependencies,
based on OS version. If port Foo installs version 1.0 on 10.8 and
older, and version 2.0 on 10.9 and newer, it would be somewhat wrong
to treat version 1.0 as outdated. But treating this correctly without
too much overhead is somewhat tricky, and probably not something for
the first iteration either.
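
A rough Python sketch of what an OS-aware "outdated" check could look like, using the Foo example above. The table layout and function names are invented for illustration; where the per-OS version data would actually come from is left open.

```python
from typing import Optional, Tuple

# Hypothetical data: for each port, (minimum OS version, port version),
# listed newest first. Mirrors the example: Foo 2.0 on 10.9+, 1.0 before.
LATEST_BY_OS = {
    "Foo": [((10, 9), "2.0"), ((10, 0), "1.0")],
}


def parse_os(os_version: str) -> Tuple[int, ...]:
    """Turn '10.10' into (10, 10) so that 10.10 sorts after 10.9."""
    return tuple(int(part) for part in os_version.split("."))


def expected_version(port: str, os_version: str) -> Optional[str]:
    """Return the version the port is expected to install on this OS."""
    for min_os, version in LATEST_BY_OS[port]:
        if parse_os(os_version) >= min_os:
            return version
    return None


def is_outdated(port: str, installed: str, os_version: str) -> bool:
    """A port is outdated only relative to what its own OS can get."""
    expected = expected_version(port, os_version)
    return expected is not None and installed != expected
```

Comparing OS versions as integer tuples rather than strings matters here, since "10.10" would otherwise sort before "10.9".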

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Arjun Salyan via macports-dev
On Sun, Mar 24, 2019 at 10:02 PM Mojca Miklavec  wrote:

> Here are some examples of why I don't see a single correct answer to
> your initial question. Let's assume that you know absolutely
> everything about all MacPorts installations (exact timestamp of when
> each port was installed or uninstalled, exact timestamp of MacPorts
> installations / upgrades / removals ...) and you want to know the
> answer to
> "How many users have port Foo installed on each OS version in March
> 2019?"
>

If we go with the current setup, mpstats submits data weekly, and hence to
make the reporting as precise as possible, we would need to present reports
on per-week basis, also as Craig suggested.
I have tried something here:
https://docs.google.com/document/d/1VReRyPYKifZ1ub77oXXP7ZCqi20nq2jPrKzNxQJ7hxk/edit?usp=sharing
.
Please take a look when you get time.

Thanks


Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Mojca Miklavec
On Sat, 23 Mar 2019 at 17:49, Craig Treleaven wrote:
>
> I see no reason to report inactive ports.

Neither do I. I would remove those as I already mentioned in an earlier email.

> The “OS” section for my system is:
>
>   "os": {
>     "macports_version": "2.5.4",
>     "osx_version": "10.10",
>     "os_arch": "i386",
>     "os_platform": "darwin",
>     "build_arch": "x86_64",
>     "gcc_version": "none",
>     "xcode_version": "7.2.1"
>   },
>
> The gcc-version is no longer relevant.

True. See (an ancient) https://trac.macports.org/wiki/StatisticsIdeas

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Mojca Miklavec
On Sat, 23 Mar 2019 at 15:28, Craig Treleaven wrote:
>
> Our existing installation stats are, to be kind, a mess.

Which is precisely why we suggested this project: to fix what we
learned from the past mistakes. The submission process is mostly still
ok, but the database design is flawed. I'm not blaming the original
project, it's part of evolution. The major problem is that those
problems were never addressed so far, and we hope they would be in a
rewrite-from-scratch.

> I can’t recall if we ever had a design document that identified the sorts of 
> information we wanted to capture and report.

The closest we came so far was probably last year's project,

https://github.com/macports/macports-webapp/blob/master/docs/Database_Design.md
but that is still very far from being a finished design and is still
completely lacking all the data views that we want to see.

We are now waiting for this year's proposal and evolution of the idea
based on discussion here, which would hopefully become the new "design
document".

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Mojca Miklavec
Hi,

(Sorry, this email got so long that I'll answer the others separately.)

On Sat, 23 Mar 2019 at 11:26, Arjun Salyan wrote:
> On Sat, Mar 23, 2019 at 3:15 PM Mojca Miklavec wrote:
>>
>> I would use the first definition: number of users currently having the
>> port installed. It might be pretty common to have to reinstall the
>> same port multiple times (maybe just for debugging / development
>> reasons) and we don't want to count the port developer 20 times. If
>> the user uninstalled the port, it's equivalent to me as never having
>> it installed in the first place.
>
>
> Thanks. But in that case what would be considered as number of installations 
> in a particular month? Suppose, the first weekly submission contains port P 
> in active_ports, but during second submission(in the same month), the port is 
> uninstalled.
>
> One way would be to have it consider the number of users having it in active 
> ports on the last day of the month or on 15th.

Short answer: I could consider the port as installed by a particular
user if it was reported as installed at least once in that month (if
it was installed during the first report, then uninstalled, count it
as installed; it will not be counted next month anyway if the user
just made a mistake / changed their mind).
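
As a minimal Python sketch of that counting rule: a user counts once for a port in a given month if any of their submissions in that month listed it. The tuple layout of a submission is an assumption made for the example, not the mpstats format.

```python
from collections import defaultdict
from datetime import date


def monthly_installs(submissions):
    """Count distinct users per (year, month, port).

    submissions: iterable of (uuid, submission_date, active_port_names).
    A port counts for a user if it appeared in at least one report that month.
    """
    users = defaultdict(set)  # (year, month, port) -> set of uuids
    for uuid, day, ports in submissions:
        for port in ports:
            users[(day.year, day.month, port)].add(uuid)
    return {key: len(uuids) for key, uuids in users.items()}


reports = [
    ("a", date(2019, 3, 5), ["P", "Q"]),   # P installed in the first report
    ("a", date(2019, 3, 12), ["Q"]),       # P uninstalled later in March
    ("b", date(2019, 3, 12), ["P"]),
    ("a", date(2019, 4, 2), ["Q"]),        # P no longer counted in April
]
counts = monthly_installs(reports)
```

Using a set of UUIDs per key is what keeps a developer who reinstalls the port 20 times from being counted 20 times.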


Long answer:

I would say that there is no single correct answer (I'll try to give a
few examples below), but I find it quite important not to do any
"lossy data import" at the time of importing the statistics. Non-lossy
import allows you to change the representation of data (what to show
and how) at any given point in the future.

The existing statistics page discards a lot of information at the time
of import. For example: it just counts the overall number of users of
each macOS version, which turned out to be a completely useless piece
of information if it's not correlated with time. We want to know how
many users of 10.8 we have today, not counting the users which have
migrated since.

A big mistake we made in the early days of GSOC is that we didn't try
to deploy the solutions early enough (this was properly deployed only
long after the GSOC was over), so the student only ever worked with
made-up data and nobody ever noticed that this would be a problem. But
even putting that late deployment aside ... if the data wasn't lost
during the statistics submission, we could still recalculate
historical data and change the representation to the exact form in
which we want it now (after months or years of experience and
feedback). If we still had raw data in the form of
(uuid, timestamp, os_version)
we could still experiment with various data representations and draw
the desired graphs. Now we only keep
(uuid, os_version)
in the database. Granted, from the second representation it's much
easier to draw the graph than from the first one, but the first one
bears a lot more information. With proper database indexing and some
non-trivial sql queries you could easily draw "any graph you want"
from the first table.
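
A small sketch of the difference, using Python's sqlite3 module. The table and column names are assumptions for illustration. With the timestamp kept, "how many users were on each OS version as of a given date" becomes a query over each UUID's latest report up to that date, which the (uuid, os_version) table cannot answer.

```python
import sqlite3


def os_distribution(conn, as_of):
    """Count users per os_version, using each uuid's latest report up to as_of."""
    rows = conn.execute(
        """
        SELECT s.os_version, COUNT(*)
        FROM submissions AS s
        JOIN (SELECT uuid, MAX(timestamp) AS ts
              FROM submissions
              WHERE timestamp <= ?
              GROUP BY uuid) AS latest
          ON latest.uuid = s.uuid AND latest.ts = s.timestamp
        GROUP BY s.os_version
        """,
        (as_of,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
# Raw data only; the primary key doubles as the (uuid, timestamp) index.
conn.execute(
    "CREATE TABLE submissions ("
    "uuid TEXT, timestamp TEXT, os_version TEXT, "
    "PRIMARY KEY (uuid, timestamp))"
)
conn.executemany(
    "INSERT INTO submissions VALUES (?, ?, ?)",
    [
        ("a", "2018-06-01", "10.8"),
        ("a", "2019-03-01", "10.14"),  # user a migrated away from 10.8
        ("b", "2019-03-02", "10.8"),
    ],
)
```

Timestamps are stored as ISO 8601 strings here so that plain string comparison orders them correctly.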

Ideally the database should contain only raw data, and then some views
to assist with further statistics. Certain pages could be cached, so
that the database would not need to recalculate the same data over and
over again even when the underlying data didn't change at all. Only if
we run into serious performance issues would I start doing some
pre-calculations and store them back to the database, maybe run
nightly, hourly or so.



Here are some examples of why I don't see a single correct answer to
your initial question. Let's assume that you know absolutely
everything about all MacPorts installations (exact timestamp of when
each port was installed or uninstalled, exact timestamp of MacPorts
installations / upgrades / removals ...) and you want to know the
answer to
"How many users have port Foo installed on each OS version in March 2019?"

1.) Assume I have it installed on a computer in the office, but I was on
vacation or a business trip all March, so the computer was not even
online to submit its monthly statistics. Does that computer count? It
won't count now as it would not submit the statistics, but it could
count if you knew everything about that computer. If you recorded the
event when I installed the port and didn't see any uninstallation
/deactivation events since, you could still count it as active
(maybe). Well, you could argue that I didn't use that computer for a
month anyway, so it has all the rights not to be counted, which is a
fair argument, but ...

2.) I also have that port on my laptop and I used it actively during
that time. But since I was travelling, I hardly ever had access to
internet from the laptop (as good as never), so there would be no
statistics sent either.

3.) I have that port on my old laptop which I haven't turned on for the
last few months (but the software is still there). Even if you knew
everything about the history of macports installations on that laptop:
would you count that port? Probably not, 

Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Craig Treleaven
> On Mar 23, 2019, at 6:26 AM, Arjun Salyan via macports-dev wrote:
> 
> On Sat, Mar 23, 2019 at 3:15 PM Mojca Miklavec wrote:
>> I would use the first definition: number of users currently having the
>> port installed. It might be pretty common to have to reinstall the
>> same port multiple times (maybe just for debugging / development
>> reasons) and we don't want to count the port developer 20 times. If
>> the user uninstalled the port, it's equivalent to me as never having
>> it installed in the first place.
> 
> Thanks. But in that case what would be considered as number of installations 
> in a particular month? Suppose, the first weekly submission contains port P 
> in active_ports, but during second submission(in the same month), the port is 
> uninstalled.
> 
> One way would be to have it consider the number of users having it in active 
> ports on the last day of the month or on 15th.

Actually, the current mpstats job submits data weekly.  The following is a 
portion of the ‘/Library/LaunchDaemons/org.macports.mpstats.plist’ on my system:

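    <key>StartCalendarInterval</key>
    <dict>
        <key>Weekday</key>
        <integer>2</integer>
        <key>Hour</key>
        <integer>06</integer>
        <key>Minute</key>
        <integer>52</integer>
    </dict>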

I read somewhere (lost now) that during a month, earlier submissions are 
discarded when another submission is received for the same UUID.  

As a maintainer, I am interested to know how the characteristics of the users 
of my ports change over time.  For example, if I make a new version available, 
how quickly do users upgrade?  How quickly do the users of this port migrate to 
newer versions of the Mac operating system?  Etc.  As mentioned earlier, I have 
no interest in inactive versions of ports.

As a project, I think we’d like to know how quickly our user base adopts new 
versions of the OS, Xcode and new versions of MacPorts base.  

I don’t think we need a tremendous amount of detail.  I would propose that we 
only need (or need to report on) snapshots as of:
a) week ago (i.e. current information)
b) month ago (4 weeks ago)
c) 3 months ago (13 weeks ago)
d) 6 months ago (26 weeks ago)
e) year ago (52 weeks ago)

This way, submissions over a year old could be purged while allowing fairly 
straightforward and understandable reporting criteria.
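
A minimal Python sketch of those snapshot points and the resulting purge cutoff. The week counts come from the list above; the function and label names are made up for the example.

```python
from datetime import date, timedelta

# Snapshot ages in weeks, as proposed in the list above.
SNAPSHOT_WEEKS = {
    "week": 1,
    "month": 4,
    "3 months": 13,
    "6 months": 26,
    "year": 52,
}


def snapshot_dates(today: date) -> dict:
    """Return the calendar date of each snapshot relative to today."""
    return {label: today - timedelta(weeks=weeks)
            for label, weeks in SNAPSHOT_WEEKS.items()}


def purge_cutoff(today: date) -> date:
    """Submissions older than the oldest snapshot are no longer needed."""
    return today - timedelta(weeks=max(SNAPSHOT_WEEKS.values()))
```

Working in whole weeks keeps every snapshot on the same weekday as the weekly submission, which a calendar-month offset would not.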

Craig




Re: GSoC 2019 [Collect build statistics]

2019-03-24 Thread Arjun Salyan via macports-dev
Hi,

I have prepared a Google Doc on the implementation of installation
statistics. I do not know if this is the right way to get suggestions. But
it would be great if I could get feedback and suggestions on this:
https://docs.google.com/document/d/1VReRyPYKifZ1ub77oXXP7ZCqi20nq2jPrKzNxQJ7hxk/edit?usp=sharing

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-23 Thread Arjun Salyan via macports-dev
On Sat, Mar 23, 2019 at 7:58 PM Craig Treleaven wrote:

> See:
>
> http://stats.macports.neverpanic.de/os_statistics#os_platform
>
> It says all 239 reported platforms are Darwin.  So this appears to be the
> conglomeration of all reporting over the past several years.  This explains
> why the charts for OS X Version and MacPorts Version contain so many old
> versions.  All versions ever reported are being added together--which is
> useless.
>

Does this imply that the current system is reporting every OS X / MacPorts
version a unique user ever had?


> Note that the port ‘mpstats’ must be installed in order to report.  Thus,
> it MUST be the “top port” for the month, every month.  Not helpful
> information.
>

Also, since we will be already reporting the number of users who are
submitting reports, it does not make sense to include mpstats in the top
installations table.


> The top list includes items like libffi, gettext and expat.  Generally,
> these are installed as dependencies of other things that users have actually
> chosen to install.  However, we don’t capture whether a user “Requested” a
> port or not.  I would really be interested in a list of top Requested ports.
>
mpstats reports whether a port was requested or not, so it would be easy to
display stats for only requested ports.
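
A rough Python sketch of such a "top requested ports" list. The key "active_ports" is the one used earlier in this thread; the per-port field names "name" and "requested" are assumptions and would need to be checked against the actual mpstats output.

```python
from collections import Counter


def top_requested(submissions, limit=10, exclude=("mpstats",)):
    """Rank ports by the number of users who explicitly requested them.

    submissions: one dict per user, each with an "active_ports" list.
    Field names "name" and "requested" are assumed, not confirmed.
    """
    counts = Counter()
    for submission in submissions:
        # A set, so that a port is counted at most once per user.
        names = {
            port["name"]
            for port in submission["active_ports"]
            if port.get("requested") and port["name"] not in exclude
        }
        counts.update(names)
    return counts.most_common(limit)


sample = [
    {"active_ports": [
        {"name": "mpstats", "requested": True},
        {"name": "libffi", "requested": False},
        {"name": "vim", "requested": True},
    ]},
    {"active_ports": [
        {"name": "vim", "requested": True},
        {"name": "git", "requested": True},
    ]},
]
ranking = top_requested(sample)
```

Dependencies such as libffi drop out because they are not marked as requested, and mpstats is excluded explicitly since every reporting user has it.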

> I think good installation stats could help us understand our users and how
> they are using MacPorts.  I can’t recall if we ever had a design document
> that identified the sorts of information we wanted to capture and report.
>

I will try to come up with an initial design of how and what can be
reported and then we could brainstorm to reach somewhere.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-23 Thread Craig Treleaven
> On Mar 23, 2019, at 5:45 AM, Mojca Miklavec  wrote:
> 
> On Sat, 23 Mar 2019 at 10:35, Arjun Salyan via macports-dev wrote:
>> 
>> Hi,
>> I am working on the design of tables for installation statistics. I have a 
>> doubt here:
>> 
>> Suppose there is a port P. Now for number of installations of P, there are 
>> many definitions I am having in my mind:
>> 
>> Number of users currently having P in active_ports/ inactive_ports. [ACTIVE 
>> INSTALLATIONS]
> 
> Btw: I would ignore inactive ports (= count them as not installed).  […]

Note that you can see what statistics are being reported for your system with 

/opt/local/libexec/mpstats show

You may want to direct the output to a file or text editor!

For my system, there are about 750 active ports and over 2,700 inactive ports.  
For example, all 13 versions of qt5x-qtbase are included in the inactive ports. 
 My submission totals about 3,500 lines or 203,000 bytes.  

I see no reason to report inactive ports.  Is there some value to this 
information that I am missing?  Presumably, eliminating the inactive ports 
would cut down the data traffic to about 43 kB.  If so, I think weekly 
reporting would be quite feasible.
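
The estimate can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes every reported port takes about one line of similar size, which the figures above suggest but do not guarantee.

```python
# Figures quoted in the message above.
total_bytes = 203_000
total_lines = 3_500
active_ports = 750

# Assumption: every reported port takes roughly one line of similar size.
bytes_per_line = total_bytes / total_lines
active_only_bytes = active_ports * bytes_per_line

print(f"{bytes_per_line:.0f} bytes per line")
print(f"{active_only_bytes / 1000:.1f} kB for active ports only")
```

Under that assumption the active-only submission comes to roughly 43.5 kB, in line with the figure of about 43 kB.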

The “OS” section for my system is:

  "os": {
    "macports_version": "2.5.4",
    "osx_version": "10.10",
    "os_arch": "i386",
    "os_platform": "darwin",
    "build_arch": "x86_64",
    "gcc_version": "none",
    "xcode_version": "7.2.1"
  },

The gcc_version field is no longer relevant.

Craig



Re: GSoC 2019 [Collect build statistics]

2019-03-23 Thread Craig Treleaven
> On Mar 23, 2019, at 6:26 AM, Arjun Salyan via macports-dev 
>  wrote:
> 
> On Sat, Mar 23, 2019 at 3:15 PM Mojca Miklavec wrote:
> I would use the first definition: number of users currently having the
> port installed. It might be pretty common to have to reinstall the
> same port multiple times (maybe just for debugging / development
> reasons) and we don't want to count the port developer 20 times. If
> the user uninstalled the port, it's equivalent to me as never having
> it installed in the first place.
> 
> Thanks. But in that case what would be considered as number of installations 
> in a particular month? Suppose, the first weekly submission contains port P 
> in active_ports, but during second submission(in the same month), the port is 
> uninstalled.
> 
> One way would be to have it consider the number of users having it in active 
> ports on the last day of the month or on 15th.
> 

Our existing installation stats are, to be kind, a mess.  

If you look at:

http://stats.macports.neverpanic.de 

It says: "Our statistics know about 239 users in total. Last month (February), 
49 users have submitted statistics.”

Since there are two submissions per month, does that mean there were 49 unique 
reporting systems (one user may have more than one system; I do) or half that 
amount?

What does “239 users in total” mean?  Does that mean 239 unique user 
identifiers over the past several years?  How is this a helpful statistic in 
any way?

See:

http://stats.macports.neverpanic.de/os_statistics#os_platform 


It says all 239 reported platforms are Darwin.  So this appears to be the 
conglomeration of all reporting over the past several years.  This explains why 
the charts for OS X Version and MacPorts Version contain so many old versions.  
All versions ever reported are being added together--which is useless.

Look at:

http://stats.macports.neverpanic.de/installed_ports 


It says "Most popular port this month (March) is mpstats with 57 installs.”   
Since we are late in March now, it appears that most systems have submitted a 
report and the number of reporting users has gone up from the 49 in February.  
I guess.

Note that the port ‘mpstats’ must be installed in order to report.  Thus, it 
MUST be the “top port” for the month, every month.  Not helpful information.

The top list includes items like libffi, gettext and expat.  Generally, these 
are installed as dependencies of other things that users have actually chosen to 
install.  However, we don’t capture whether a user “Requested” a port or not.  
I would really be interested in a list of top Requested ports.

Currently, I can’t access the installation statistics for individual ports.  As 
I recall, there are significant problems with the current reporting.

I think good installation stats could help us understand our users and how they 
are using MacPorts.  I can’t recall if we ever had a design document that 
identified the sorts of information we wanted to capture and report.  

Craig



Re: GSoC 2019 [Collect build statistics]

2019-03-23 Thread Arjun Salyan via macports-dev
On Sat, Mar 23, 2019 at 3:15 PM Mojca Miklavec  wrote:

> I would use the first definition: number of users currently having the
> port installed. It might be pretty common to have to reinstall the
> same port multiple times (maybe just for debugging / development
> reasons) and we don't want to count the port developer 20 times. If
> the user uninstalled the port, it's equivalent to me as never having
> it installed in the first place.
>

Thanks. But in that case what would be considered as number of
installations in a particular month? Suppose, the first weekly submission
contains port P in active_ports, but during second submission(in the same
month), the port is uninstalled.

One way would be to have it consider the number of users having it in
active ports on the last day of the month or on 15th.
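
A variant of this idea, sketched below, avoids picking a fixed day: for each user, only the last submission received within the month is considered. The tuple layout (user id, date, active ports) is a hypothetical structure chosen for illustration.

```python
from datetime import date


def monthly_active_users(submissions, port, year, month):
    """Count users whose last submission in the given month lists `port`."""
    latest = {}  # user id -> (submission date, set of active ports)
    for user, when, active_ports in submissions:
        if (when.year, when.month) != (year, month):
            continue
        if user not in latest or when > latest[user][0]:
            latest[user] = (when, set(active_ports))
    return sum(1 for _, ports in latest.values() if port in ports)


# Hypothetical submissions: (user id, date, active ports).
submissions = [
    ("u1", date(2019, 3, 4), ["P", "git"]),
    ("u1", date(2019, 3, 18), ["git"]),      # P uninstalled later in March
    ("u2", date(2019, 3, 11), ["P"]),
    ("u3", date(2019, 2, 25), ["P"]),        # outside March, ignored
]

print(monthly_active_users(submissions, "P", 2019, 3))
```

Here user u1 is not counted for March, because P is gone in that user's last March submission.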

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-23 Thread Mojca Miklavec
Dear Arjun,

On Sat, 23 Mar 2019 at 10:35, Arjun Salyan via macports-dev wrote:
>
> Hi,
> I am working on the design of tables for installation statistics. I have a 
> doubt here:
>
> Suppose there is a port P. Now for number of installations of P, there are 
> many definitions I am having in my mind:
>
> Number of users currently having P in active_ports/ inactive_ports. [ACTIVE 
> INSTALLATIONS]

Btw: I would ignore inactive ports (= count them as not installed).

> Number of users for which P ever appeared in active/ inactive ports, no 
> matter if it is there at this point of time or not. [TOTAL INSTALLATIONS- 
> counted only once per user]
> If any particular user installs P two times, then count that as two different 
> installations. [TOTAL INSTALLATIONS]

I would use the first definition: number of users currently having the
port installed. It might be pretty common to have to reinstall the
same port multiple times (maybe just for debugging / development
reasons) and we don't want to count the port developer 20 times. If
the user uninstalled the port, it's equivalent to me as never having
it installed in the first place.

This reminds me of the old days of webpage view counters, when one
could click "reload" twenty times and the page view counter would
simply keep increasing for each reload or any subpage visited.

There is one further caveat. The existing code of mpstats is only
executed once every two weeks. If you really wanted to count each
installation, you would need to modify port command (base code) to
report every single command to the server. We might eventually do
something like that to report broken builds etc., but that's currently
not ready.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-23 Thread Arjun Salyan via macports-dev
Hi,
I am working on the design of tables for installation statistics. I have a
doubt here:

Suppose there is a port P. Now for number of installations of P, there are
many definitions I am having in my mind:

   1. Number of users currently having P in active_ports/ inactive_ports.
   [ACTIVE INSTALLATIONS]
   2. Number of users for which P ever appeared in active/ inactive ports,
   no matter if it is there at this point of time or not. [TOTAL
   INSTALLATIONS- counted only once per user]
   3. If any particular user installs P two times, then count that as two
   different installations. [TOTAL INSTALLATIONS]

Which one would make more sense? Maybe we can have two fields- "Total
Installations" (definition 2) and "Active Installations" (definition 1)? Or
just one?
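
To make the difference between the three definitions concrete, here is a small sketch that computes all of them from the same data. It assumes a hypothetical structure in which each user has a chronological list of submissions, each being the set of installed ports. Definition 3 can only be approximated this way, since reinstalls that happen between two submissions are invisible.

```python
def count_installations(history, port):
    """Return (active, ever, installs) for `port`.

    `history` maps a user id to that user's submissions in chronological
    order, each submission being the set of installed ports.
    """
    active = ever = installs = 0
    for snapshots in history.values():
        present = [port in snapshot for snapshot in snapshots]
        if present and present[-1]:
            active += 1                      # definition 1
        if any(present):
            ever += 1                        # definition 2
        previous = False
        for now in present:
            if now and not previous:
                installs += 1                # definition 3 (approximation)
            previous = now
    return active, ever, installs


# Hypothetical data for three users.
history = {
    "u1": [{"P"}, set(), {"P"}],   # installed, removed, installed again
    "u2": [{"P"}, set()],          # installed, then removed
    "u3": [{"git"}],               # never had P
}

print(count_installations(history, "P"))
```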

Thank You
Arjun



Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
On Thu, Mar 21, 2019 at 8:35 PM Umesh Singla 
wrote:

> a) We have seen a quick demo of this already. However, the major part I
> think is missing is the search. We can brainstorm over the details like
> search-as-you-type, adding new ports etc according to the timeline. Not
> sure how much browsing by the first letter helps.
>

Search-as-you-type would be good, and can be further supported with: "New
Ports", "Related Ports" on port detail, "Popular Ports"- overall and for
each category.


> b) You cannot have a class Ports when it represents a Port:
> https://github.com/arjunsalyan/MacPorts-Demo-App/blob/master/ports/models.py#L6
> .
>

Yes, there are some major code improvements to be done. I will finish these
shortly.


> 2. build statistics:
>
> a) In Time Elapsed of builds, it would be incorrect to show time taken for
> only one of the build stages. Example, in the case of
> https://build.macports.org/builders/ports-10.12_x86_64-builder/builds/87301,
> total-time-taken (12 min 59 secs) is right to be shown and not "6 mins 45
> secs".
>
Thanks for picking this out.
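
A minimal sketch of the corrected calculation, summing all stages of a build instead of showing a single one. The stage names and the first two durations are hypothetical; only the 6 min 45 s stage and the 12 min 59 s total come from the example above.

```python
from datetime import timedelta


def total_build_time(stage_durations):
    """Sum the durations of all stages of one build."""
    return sum(stage_durations.values(), timedelta())


# Hypothetical per-stage durations (stage names are made up).
stages = {
    "sync": timedelta(minutes=2, seconds=10),
    "install-dependencies": timedelta(minutes=4, seconds=4),
    "install-port": timedelta(minutes=6, seconds=45),
}

print(total_build_time(stages))
```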


> b) There are some things which seem hard-coded to me. I
> see '10.14_x86_64', '10.13_x86_64' at multiple places - in port detail
> view, build to database view and jinja templates. It's time to define a
> constants config file now, for build statuses as well. With a new release
> of macOS, we do not want to have to change multiple files in code. In this
> project, it is important that any part which works is accurate and
> complete.
>

What I have planned is to have a separate table of builders with relations
to the build history table. Any upcoming versions can then easily be added
to the table. Since, I wasn't fetching many logs from

> c) Also, as Mojca mentioned, errors like these:
> http://frozen-falls-98471.herokuapp.com/ports/database/ should not be
> exposed. What is it intended to do anyway?
>

Initially, I used this to parse build history into the database. But now I
am using a separate script and just forgot to remove this. Sorry. As for
errors, we will be throwing a custom 404 for DoesNotExist exceptions.

> 3. installation statistics:
>
Thank you, I will look into this.


> As Mojca said, I am not seeing any way to provide code review on Github
> when it's already merged. Since you have the base application ready, it's
> time to use PRs. I would also advise starting to follow at least some of
> the PEP8 style guide conventions, it's good to follow clean code practices
> from the beginning. You can either use coala lint or pylint before pushing
> the code, if familiar.
>

I can submit PRs to the temporary repository Mojca mentioned once it
is available. We can then have a very fresh start. I will make the initial
commit after improving the code based on the suggestions.

Thanks
Arjun


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
Hi, I have created the pull request. The new output is shown below:

{
*"variants" : ["debug"],*
"depends_build" : ["path:bin/cmake:cmake","port:pkgconfig","path:share/ECM/cmake/ECMConfig.cmake:kde-extra-cmake-modules"],
"portdir" : "audio\/phonon-backend-vlc",
"depends_fetch" : "bin:git:git",
"description" : "VLC backend for Phonon",
"homepage" : "http:\/\/projects.kde.org\/projects\/kdesupport\/phonon\/phonon-vlc",
"epoch" : "0",
"platforms" : "darwin",
"name" : "phonon-backend-vlc-qt5",
*"depends_lib" : ["path:lib/libvlc.dylib:libVLC","port:phonon-qt5","path:lib/pkgconfig/Qt5Core.pc:qt5-qtbase"],*
*"openmaintainer" : True,*
"license" : "{LGPL-2.1 LGPL-3}",
"long_description" : "A VLC backend for the Phonon4Qt5 multimedia library.",
*"maintainers" : [{*
*"email" : {"domain":"gmail.com","name":"rjvbertin"},*
*"github" : "RJVB"*
*}],*
"categories" : ["audio","kde","kf5"],
"version" : "0.9.0.7",
"revision" : "0"
},

On Thu, Mar 21, 2019 at 6:18 PM Mojca Miklavec  wrote:

> That we have a bug. Please report all such instances that you find (or
> submit a PR to macports-ports, removing "nomaintainer").
>

I have skipped the "closedmaintainer" key until we fix this bug. Or shall I
implement it?

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Umesh Singla
Hi

For me, this project can be divided into three major parts:

1. ports page:

a) We have seen a quick demo of this already. However, the major part I
think is missing is the search. We can brainstorm over the details like
search-as-you-type, adding new ports etc according to the timeline. Not
sure how much browsing by the first letter helps.

b) You cannot have a class Ports when it represents a Port:
https://github.com/arjunsalyan/MacPorts-Demo-App/blob/master/ports/models.py#L6
.

2. build statistics:

a) In Time Elapsed of builds, it would be incorrect to show time taken for
only one of the build stages. Example, in the case of
https://build.macports.org/builders/ports-10.12_x86_64-builder/builds/87301,
total-time-taken (12 min 59 secs) is right to be shown and not "6 mins 45
secs".

b) There are some things which seem hard-coded to me. I
see '10.14_x86_64', '10.13_x86_64' at multiple places - in port detail
view, build to database view and jinja templates. It's time to define a
constants config file now, for build statuses as well. With a new release
of macOS, we do not want to have to change multiple files in code. In this
project, it is important that any part which works is accurate and
complete.

c) Also, as Mojca mentioned, errors like these:
http://frozen-falls-98471.herokuapp.com/ports/database/ should not be
exposed. What is it intended to do anyway?
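
As a rough illustration of point b), a single constants module could hold the builder names and build statuses, so that views and templates import them instead of repeating string literals. Everything below (module layout, names, status values) is hypothetical; only the builder names and the URL pattern are taken from this thread.

```python
# Hypothetical constants module; names and values are illustrative only.
from enum import IntEnum

# One place to add a builder when a new macOS version is released.
BUILDERS = ["10.14_x86_64", "10.13_x86_64"]


class BuildStatus(IntEnum):
    """Integer build statuses (values assumed for this sketch)."""
    SUCCESS = 0
    FAILURE = 1


def builder_url(builder, build_number):
    """Build the buildbot URL for one build of the given builder."""
    return (
        "https://build.macports.org/builders/"
        f"ports-{builder}-builder/builds/{build_number}"
    )


print(builder_url(BUILDERS[0], 26315))
```

With such a module, supporting a new macOS release would mean adding one entry to BUILDERS rather than editing several files.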

3. installation statistics:

a) Last year we had a quick attempt at this project (installation
statistics part). It's still lying in my account. Probably the only part
where we spent some time was models.py [1]. You could review it sometime
soon and come back with doubts/suggestions. It almost has the model
implementation on the installed ports statistics with a sample submission
data file present at the root of the project. Class Port, Category, etc can
help with the first part of the project as well.

The tricky part was to figure out (still needs to be figured out) unique
ports and what to keep as the primary key. Similarly, in the case of
maintainers (and look for more such cases).

There are several variations of how maintainers are mentioned in the
Portfiles, example GitHub handles, or emails, or personal websites at
times. You may want to look into the different formats (or ask on the list
separately), and come up with a solution to parse and store.

b) I would suggest keeping all the parsing scripts together at some place
in your project. Keep a document recording how you have changed the parsing
output (before and after) and what each script parses. Naming a script parse.py
doesn't tell much. I am having difficulty tracking down and reviewing all
the scripts.


==

As Mojca said, I am not seeing any way to provide code review on Github
when it's already merged. Since you have the base application ready, it's
time to use PRs. I would also advise starting to follow at least some of
the PEP8 style guide conventions, it's good to follow clean code practices
from the beginning. You can either use coala lint or pylint before pushing
the code, if familiar.

You recently added a .gitignore file but there are .pyc, __pycache__/, etc
files from before. Consider cleaning them up.

These are all the suggestions for later, but only a small portion of what a
proposal is expected to include. Right now, I'd suggest working on
parsing the maintainers as well as possible. It'd then be good to tackle
the project part by part.

[1]:
https://github.com/umeshksingla/macports-stats/blob/master/firstproject/firstapp/models.py

Best,
Umesh

On Thu, Mar 21, 2019 at 3:57 PM Arjun Salyan 
wrote:

> On Thu, Mar 21, 2019 at 10:20 AM Mojca Miklavec 
> wrote:
>
>> (2) You made a simple PR last time to fix portindex2json for a more
>> reasonable output of categories. Would you be willing for a tiny bit
>> more difficult task and try to improve the output for maintainers as
>> well? We would want a list of all maintainers with two optional keys
>> for each (email &  github handle) plus a boolean value to tell whether
>> the port is under openmaintainer policy.
>>
>
> Hi, I was working on this. What do I do with "nomaintainer" ? For now I am
> getting the following output.
>
> {
>
> "variants" : "clang33 clang34 clang37 clang38 clang39 clang40
> clang50 clang60 clang70 mpich mpich_devel openmpi openmpi_devel python26
> python27 python33 python34 python35 python36 python37 debug no_static
> no_single regex_match_extra universal",
>
> "subports" : "boost-numpy",
>
> "portdir"  : "devel\/boost",
>
> "description"  : "Collection of portable C++ source libraries",
>
> "homepage" : "http:\/\/www.boost.org",
>
> "epoch": "0",
>
> "platforms": "darwin",
>
> "name" : "boost",
>
> "depends_lib"  : "port:zlib port:expat port:bzip2 port:libiconv
> port:icu port:python27",
>
> *"openmaintainer"   : True,*
>
> "long_description" : "Boost provides free portable 

Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
Thanks, it is clear now. I will do the changes and submit the PR.


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Mojca Miklavec
On Thu, 21 Mar 2019 at 13:29, Arjun Salyan wrote:
> On Thu, Mar 21, 2019 at 5:42 PM Mojca Miklavec wrote:
>
>>
>> Just create an empty list of maintainers.
>
> There are some ports which have : {ryandesign @ryandesign} nomaintainer} as 
> the output of maintainers . What does "nomaintainer" mean here?

That we have a bug. Please report all such instances that you find (or
submit a PR to macports-ports, removing "nomaintainer").

>> We also need to add emails. Maybe something like
>> "email" : { "name" : "ryandesign", "domain" : "macports.org" },
>> "github": "ryandesign"
>>
>
> Suppose current output is this: {something @someotherthing}. So, here 
> 'something' is the 'name'? And that name followed by @macports.org would give 
> the email?

The @someotherthing is github username.
The something (if it doesn't contain a colon) is an email of the form
someth...@macports.org.
If that something has a colon, like gmail.com:something, then it's an
email of the form someth...@gmail.com.

Try "port info ".

> And for some ports the maintainers output is like: {gmail.com:name @gname}, 
> so there the email would be n...@gmail.com instead of n...@macports.org ?

Yes. And "@gname" is always github username.

> How do I know that the user has commit rights?

Usually any user without the colon in the email address (there might
be some mistakes of course).
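
The rules above can be sketched as a small parser. This is a hedged illustration in Python, not the actual portindex2json code: the function name and the output layout are assumptions, loosely following the "email"/"github" structure proposed earlier in this thread.

```python
import re


def parse_maintainers(raw):
    """Parse a maintainers string such as
    '{ryandesign @ryandesign} {gmail.com:name @gname} openmaintainer'.
    """
    maintainers = []
    openmaintainer = False
    # Each maintainer is either a {...} group or a single bare word.
    for group in re.findall(r"\{([^}]*)\}|(\S+)", raw):
        words = (group[0] or group[1]).split()
        entry = {}
        for word in words:
            if word == "openmaintainer":
                openmaintainer = True
            elif word == "nomaintainer":
                continue                      # leaves the list empty
            elif word.startswith("@"):
                entry["github"] = word[1:]    # always a GitHub username
            elif ":" in word:
                # 'gmail.com:name' means name@gmail.com
                domain, name = word.split(":", 1)
                entry["email"] = {"name": name, "domain": domain}
            else:
                # No colon: an address at macports.org
                entry["email"] = {"name": word, "domain": "macports.org"}
        if entry:
            maintainers.append(entry)
    return {"maintainers": maintainers, "openmaintainer": openmaintainer}


print(parse_maintainers("{gmail.com:name @gname} openmaintainer"))
```

Malformed entries (for example a stray closing brace) are not handled specially here and would need to be fixed in the Portfile itself.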

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
On Thu, Mar 21, 2019 at 5:42 PM Mojca Miklavec  wrote:


> Just create an empty list of maintainers.
>
There are some ports which have : *{ryandesign @ryandesign} nomaintainer} *as
the output of maintainers . What does "nomaintainer" mean here?


> We also need to add emails. Maybe something like
> "email" : { "name" : "ryandesign", "domain" : "macports.org" },
> "github": "ryandesign"
>
>
Suppose current output is this: {something @someotherthing}. So, here
'something' is the 'name'? And that name followed by @macports.org would
give the email?

And for some ports the maintainers output is like: {gmail.com:name @gname},
so there the email would be n...@gmail.com instead of n...@macports.org ?


> Alternative would be to treat users with commit rights in a different
> way (domain is always macports), but I don't see any reason to do so.
>

How do I know that the user has commit rights?

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Mojca Miklavec
On Thu, 21 Mar 2019 at 11:27, Arjun Salyan
 wrote:
> On Thu, Mar 21, 2019 at 10:20 AM Mojca Miklavec  wrote:
>>
>> (2) You made a simple PR last time to fix portindex2json for a more
>> reasonable output of categories. Would you be willing for a tiny bit
>> more difficult task and try to improve the output for maintainers as
>> well? We would want a list of all maintainers with two optional keys
>> for each (email &  github handle) plus a boolean value to tell whether
>> the port is under openmaintainer policy.
>
>
> Hi, I was working on this.

Awesome!

> What do I do with "nomaintainer" ?

Just create an empty list of maintainers.

> For now I am getting the following output.
>
> "variants" : "clang33 clang34 clang37 clang38 clang39 clang40 
> clang50 clang60 clang70 mpich mpich_devel openmpi openmpi_devel python26 
> python27 python33 python34 python35 python36 python37 debug no_static 
> no_single regex_match_extra universal",

We could make this into a list as well. We also need additional info
about whether or not a variant is default, but that can be added
later.

> "depends_lib"  : "port:zlib port:expat port:bzip2 port:libiconv 
> port:icu port:python27",

I would make this a list (as for categories) as well. We could split
the "port:" prefix, but maybe later.
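
A minimal sketch of both steps, turning the string into a list and separating the prefix from the port name. The function name and the output keys are hypothetical. For entries of the form "path:bin/cmake:cmake", the sketch assumes the port name is whatever follows the last colon.

```python
def split_dependencies(raw):
    """Turn 'port:zlib port:expat' into a list of {type, name} dicts."""
    result = []
    for item in raw.split():
        # 'port:zlib' -> type 'port', name 'zlib'; for 'path:bin/cmake:cmake'
        # the name is taken from the part after the last colon.
        kind, _, rest = item.partition(":")
        result.append({"type": kind, "name": rest.rpartition(":")[2]})
    return result


print(split_dependencies("port:zlib port:expat path:bin/cmake:cmake"))
```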

> "openmaintainer"   : True,

Thinking about this and "nomaintainer". Maybe go bold and use
"closedmaintainer" instead, then do the following:
- if port either has no maintainer, or is under openmaintainer policy,
set closedmaintainer to False
- else set it to True
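
Expressed as code, the rule would look roughly like this (a sketch only; the function and argument names are made up for illustration):

```python
def closed_maintainer(maintainers, openmaintainer):
    """Return True only for maintained ports not under openmaintainer."""
    if not maintainers or openmaintainer:
        return False
    return True


print(closed_maintainer([], False))
print(closed_maintainer([{"github": "ryandesign"}], True))
print(closed_maintainer([{"github": "ryandesign"}], False))
```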

If we keep the "openmaintainer" keyword, it's not clear to me what to
do with non-maintained ports.

> "maintainers"  : [{
> "github" : "https://www.github.com/ryandesign"
> },{
> "github" : "https://www.github.com/michaelld"
> }],

Just keep the name, no need for the full URL. So probably something like
"github_username" : "ryandesign"
instead of the full URL (not sure whether github or github_username is better).

We also need to add emails. Maybe something like
"email" : { "name" : "ryandesign", "domain" : "macports.org" },
"github": "ryandesign"

Alternative would be to treat users with commit rights in a different
way (domain is always macports), but I don't see any reason to do so.

Thank you,
Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-21 Thread Arjun Salyan via macports-dev
On Thu, Mar 21, 2019 at 10:20 AM Mojca Miklavec  wrote:

> (2) You made a simple PR last time to fix portindex2json for a more
> reasonable output of categories. Would you be willing for a tiny bit
> more difficult task and try to improve the output for maintainers as
> well? We would want a list of all maintainers with two optional keys
> for each (email &  github handle) plus a boolean value to tell whether
> the port is under openmaintainer policy.
>

Hi, I was working on this. What do I do with "nomaintainer"? For now I am
getting the following output.

{
"variants" : "clang33 clang34 clang37 clang38 clang39 clang40 clang50 clang60 clang70 mpich mpich_devel openmpi openmpi_devel python26 python27 python33 python34 python35 python36 python37 debug no_static no_single regex_match_extra universal",
"subports" : "boost-numpy",
"portdir"  : "devel\/boost",
"description"  : "Collection of portable C++ source libraries",
"homepage" : "http:\/\/www.boost.org",
"epoch": "0",
"platforms": "darwin",
"name" : "boost",
"depends_lib"  : "port:zlib port:expat port:bzip2 port:libiconv port:icu port:python27",
*"openmaintainer"   : True,*
"long_description" : "Boost provides free portable peer-reviewed C++ libraries. The emphasis is on portable libraries which work well with the C++ Standard Library.",
"license"  : "Boost-1",
*"maintainers"  : [{*
*"github" : "https://www.github.com/ryandesign"*
*},{*
*"github" : "https://www.github.com/michaelld"*
*}],*
"categories"   : [devel],
"version"  : "1.66.0",
"revision" : "3"
}


Re: GSoC 2019 [Collect build statistics]

2019-03-20 Thread Mojca Miklavec
Dear Arjun,

Just quickly (I'm online for a very short time) I'm listing some of
the potential next steps (in no particular order), I hope that Umesh
will also comment on it:

(1) It would probably be time for a thorough code review. It's a bit
tricky to do code reviews on your personal repository. What we could
do (I would be grateful if someone from the infrastructure team could
help a bit) is create a new empty temporary repository inside the
MacPorts organisation (but without subscribing everyone with commit
access), maybe with an initial commit, and then let you make a pull
request to that one, while we would do code review until the code is
in "perfect shape".

(2) You made a simple PR last time to fix portindex2json for a more
reasonable output of categories. Would you be willing for a tiny bit
more difficult task and try to improve the output for maintainers as
well? We would want a list of all maintainers with two optional keys
for each (email &  github handle) plus a boolean value to tell whether
the port is under openmaintainer policy.

(3) The author of portindex2json asked if we could generate the json
file on our infrastructure. Maybe you could explore how our buildbot
infrastructure works and create a PR or two to allow automatically
generating json output along with PortIndex on our buildbot. Others
might be able to help with more information if I'm offline. (I left a
tiny bit of info last time.)

(4) Umesh might point you to his past attempt to create such an
application. You could review how it was done and try to start working
on accepting installation statistics.

(5) Think about how to keep the build logs up to date.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-20 Thread Mojca Miklavec
Dear Arjun,

Thanks a lot for the changes.

More feedback later, I just noticed that
https://frozen-falls-98471.herokuapp.com/ports/gmsh/
now throws an error.

Even if the port doesn't exist in the database, it should probably say
that the port doesn't exist rather than throwing an error.
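
A framework-independent sketch of that behaviour: the lookup failure is caught and turned into a "not found" response instead of an unhandled error. The data and function names are hypothetical. In a Django view, the `get_object_or_404` shortcut serves the same purpose.

```python
PORTS = {"qt5-qtlocation": {"name": "qt5-qtlocation"}}  # hypothetical data


def port_detail(name, ports=PORTS):
    """Return (status code, body) instead of letting a lookup error escape."""
    try:
        port = ports[name]
    except KeyError:
        return 404, f"Port '{name}' does not exist."
    return 200, f"Details for {port['name']}"


print(port_detail("gmsh"))
print(port_detail("qt5-qtlocation"))
```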

Thank you,
Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-20 Thread Arjun Salyan via macports-dev
Hi Mojca,
Thanks for the detailed reply.

Changes can be seen for this port:
http://frozen-falls-98471.herokuapp.com/ports/qt5-qtlocation/


On Wed, Mar 20, 2019 at 6:40 AM Mojca Miklavec  wrote:

> This is super useful. But I would probably link directly to the
> Portfile rather than the directory. Most ports don't have any patches,
> but if they do, one can easily browse one level higher once on the
> GitHub website.
> I would probably make those links (in particular the homepage link)
> open in a new window.
>

Thanks, I have made both the changes.

> - The entries are not unique as they should be. You seem to have two
> entries for the same build (26315) for example.

> - Sorting for 10.13 should be in reverse order (newest builds on top)
>

Fixed both.


> - I'm more interested in duration than end time. (Not sure if it's
> more useful to have start or stop time, but one is sufficient. The
> other one would be duration of the build.)
>

I have removed 'Stop Time' and added 'Time Elapsed'


> The missing table (no urgency) would then be more similar to this one:
> 10.13 || 10.14
> OK [link to 52248] || OK [link to 26315]
>

Implemented this.

> - In BuildHistory the port_name should hold a foreign key to the port
> id rather than just holding a string with port's name (I guess that's
> many-to-one relationship in Django?).
>

Yes, but right now I can achieve this only when I have all the ports in my
aws database.


> - a useful addition would be information about commit's shasum which
> triggered this change (but that might be tricky to extract in a proper
> way)
>
Thanks, I shall give it a try.

I have also changed the script (parse build history) to detect new builds,
by comparing the last build id in the database with that on the buildbot.
It then receives only the new builds from the buildbot. I am not sure how
efficient this method would be, or even if this is the right way of doing it. Now
either we can run this script at some definite interval or modify buildbot
to instruct when the script would run.
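
The comparison described above boils down to something like the following sketch. It assumes that build numbers on a given builder increase by one per build; the function name and the sample numbers are hypothetical.

```python
def new_build_numbers(last_stored, latest_on_buildbot):
    """Return the build numbers that still have to be fetched."""
    return list(range(last_stored + 1, latest_on_buildbot + 1))


# Hypothetical values: the database knows up to 26312, buildbot is at 26315.
print(new_build_numbers(26312, 26315))
```

If nothing new has been built, the list is empty and the script has nothing to fetch.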

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-19 Thread Mojca Miklavec
Dear Arjun,

On Tue, 19 Mar 2019 at 13:47, Arjun Salyan
 wrote:
>
> I have some more improvements to demo app:
>
> Build History is now Dynamic: By Making some minor tweaks to the python 
> script sent by Mojca, I was able to load build history from buildbot into the 
> database.

Awesome!

> I loaded only few recent logs for "10.14_x86_64" & "10.13_x86_64". Since, 
> build history of all ports is not yet on the database, so it would not appear 
> on port-detail page for all ports.

That's understandable / OK.

> To see it working, gmsh would be a fine example: 
> https://frozen-falls-98471.herokuapp.com/ports/gmsh/ . It is not very neat 
> yet, the os filter is 'just working'. But now we have a good starting point 
> to improve upon.

Thanks.

> Link to Github.

This is super useful. But I would probably link directly to the
Portfile rather than the directory. Most ports don't have any patches,
but if they do, one can easily browse one level higher once on the
GitHub website.

I would probably make those links (in particular the homepage link)
open in a new window.

> I am not very sure if the representation of build history is on the right 
> track.

- The entries are not unique as they should be. You seem to have two
entries for the same build (26315) for example.
- Sorting for 10.13 should be in reverse order (newest builds on top)
- I'm more interested in duration than end time. (Not sure if it's
more useful to have start or stop time, but one is sufficient. The
other one would be duration of the build.)
- That "marine blue" hurts my eyes. I only want "build successful" to
be of different colour, not the whole table.

The "All builds" table looks fine (other than having duplicate
entries). I would change your existing page in such a way that you
could apply the OS version filter to that table. You would then
provide options "All", "macOS 10.14", "macOS 10.13".

The missing table (no urgency) would then be more similar to this one:

10.14  10.13
OK     OK

or maybe

10.13 || 10.14
OK [link to 52248] || OK [link to 26315]

or something along those lines.

Some comments about the source code:

- In BuildHistory the port_name should hold a foreign key to the port
id rather than just holding a string with port's name (I guess that's
many-to-one relationship in Django?).
- Status should probably be implemented in a slightly smarter way, so
that perhaps number 0 or "OK" stands for success, other numbers
represent other statuses, making it also easier and more efficient to
count and filter (no need to do this now, but in the real
implementation it needs a change).
- builder_name should probably point to another table (like for
categories, but no need to do it now)
- a useful addition would be information about commit's shasum which
triggered this change (but that might be tricky to extract in a proper
way)
- You point to "port builder"'s build number. It would be useful to
have "port watcher"'s build number and link as well.
- Not needed now, but later it would be nice to have a list of all the
(recent) builds as a separate view.
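
Several of these points (foreign keys for port and builder, an integer status with 0 for success, no duplicate builds) can be illustrated with a small schema. The sketch below uses plain SQLite from Python's standard library rather than Django models, and all table and column names are hypothetical. It also assumes a build is uniquely identified by builder and build number.

```python
import sqlite3

SCHEMA = """
CREATE TABLE port    (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE builder (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE build_history (
    id           INTEGER PRIMARY KEY,
    port_id      INTEGER NOT NULL REFERENCES port(id),
    builder_id   INTEGER NOT NULL REFERENCES builder(id),
    build_number INTEGER NOT NULL,
    status       INTEGER NOT NULL,          -- 0 = success
    UNIQUE (builder_id, build_number)       -- no duplicate builds
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO port (name) VALUES ('gmsh')")
db.execute("INSERT INTO builder (name) VALUES ('10.14_x86_64')")

row = (1, 1, 26315, 0)
insert = ("INSERT INTO build_history "
          "(port_id, builder_id, build_number, status) VALUES (?, ?, ?, ?)")
db.execute(insert, row)
try:
    db.execute(insert, row)                 # same build a second time
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Integer statuses make counting and filtering a simple comparison.
successes = db.execute(
    "SELECT COUNT(*) FROM build_history WHERE status = 0"
).fetchone()[0]
print(duplicate_rejected, successes)
```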

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-19 Thread Arjun Salyan via macports-dev
I have some more improvements to demo app:

   - *Build History is now Dynamic: *By Making some minor tweaks to the
   python script sent by Mojca, I was able to load build history from buildbot
   into the database. I loaded only few recent logs for "*10.14_x86_64*" & "
   *10.13_x86_64*". Since, build history of all ports is not yet on the
   database, so it would not appear on port-detail page for all ports. To see
   it working, gmsh would be a fine example:
   https://frozen-falls-98471.herokuapp.com/ports/gmsh/ . It is not very
   neat yet, the os filter is 'just working'. But now we have a good starting
   point to improve upon.
   - *Link to Github.*

I am not very sure if the representation of build history is on the right
track.

Thank You



Re: GSoC 2019 [Collect build statistics]

2019-03-18 Thread Arjun Salyan via macports-dev
On Mon, 18 Mar 2019 at 10:49 PM, Mojca Miklavec  wrote:

> And in fact I'm unable to find any indices in your DB model.


Thanks, I shall add this. I am dealing with this huge data set for the
first time.

> Also, TextField might be suitable for description etc, but for short
> entries like port name, this probably offers suboptimal performance
> and CharField would make more sense. I did not time it though, and
> this is not the bottleneck in your code, but the indices are
> definitely critical for performance.


Yes, I have finalised the ports table now, so I shall change each field to
the type most suitable for the data in that column.

Also, I have terminated the process of populating the database: my internet
connection today and the free tier are both making it really difficult. I was
able to load the entire database within seconds locally.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-18 Thread Mojca Miklavec
On Mon, 18 Mar 2019 at 17:37, Mojca Miklavec wrote:
> On Mon, 18 Mar 2019 at 16:52, Arjun Salyan wrote:
>
> > All Ports and All Categories are now available (Although not all ports have 
> > populated yet, I am on AWS Free Tier and the process is really slow. At the 
> > time of drafting this email: around 500 have populated).
>
> I can imagine that the free plan would not be the fastest one in the
> world, but to me 500 entries in what I could imagine might be an hour
> since you started the job sounds like a potential efficiency problem. A
> forgotten index in a table can easily increase the runtime
> polynomially or even exponentially.

And in fact I'm unable to find any indices in your DB model.

https://docs.djangoproject.com/en/2.1/topics/db/optimization/
https://www.djangorocks.com/snippets/indexing-your-django-models.html

Also, TextField might be suitable for description etc, but for short
entries like port name, this probably offers suboptimal performance
and CharField would make more sense. I did not time it though, and
this is not the bottleneck in your code, but the indices are
definitely critical for performance.
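
The effect of a missing index can be checked directly. Below is a minimal
sketch using Python's built-in sqlite3 module rather than the app's actual
Django models; the table and column names are made up for illustration. In
Django the equivalent would be a CharField declared with db_index=True.

```python
import sqlite3

# Hypothetical table; the real app defines its schema through Django models.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE port (id INTEGER PRIMARY KEY, name VARCHAR(100), description TEXT)"
)
conn.executemany(
    "INSERT INTO port (name, description) VALUES (?, ?)",
    [(f"port{i}", "example") for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return the plan detail strings SQLite reports for a statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of every plan row.
    return [row[-1] for row in rows]


lookup = "SELECT id FROM port WHERE name = ?"

# Without an index the lookup has to scan the whole table.
plan_without_index = query_plan(conn, lookup, ("port500",))

# With an index on name the same lookup becomes an index search.
conn.execute("CREATE INDEX port_name_idx ON port (name)")
plan_with_index = query_plan(conn, lookup, ("port500",))

print(plan_without_index)
print(plan_with_index)
```

The first plan should report a scan of the table and the second a search
using port_name_idx; the exact wording differs between SQLite versions.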

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-18 Thread Mojca Miklavec
On Mon, 18 Mar 2019 at 16:52, Arjun Salyan wrote:
>
> Some improvements to the Demo App: https://frozen-falls-98471.herokuapp.com

Thank you very much.

> All Ports and All Categories are now available (Although not all ports have 
> populated yet, I am on AWS Free Tier and the process is really slow. At the 
> time of drafting this email: around 500 have populated).

I can imagine that the free plan would not be the fastest one in the
world, but to me 500 entries in what I could imagine might be an hour
since you started the job sounds like a potential efficiency problem. A
forgotten index in a table can easily increase the runtime
polynomially or even exponentially.

> On the Port-Detail page, the categories are now clickable and lead to the 
> list of ports under that category.

Cool!

What I would potentially be missing there is the number of ports under
that category. When I clicked around, several categories were empty.
You just need to make sure that all the relevant columns in the
database are indexed to make this efficient enough.

> I was able to parse the entire PortIndex.json using a python script and 
> successfully converted it to Django fixtures which could then be populated to 
> the database.

That's very good news. What exactly did you have to change to make it
work compared to last time?

> (I used the portindex.json outputted by current version of portindex2json.tcl 
> and fixed the issues with categories using same python script)
>
> Parse.py : 
> https://github.com/arjunsalyan/MacPorts-Demo-App/blob/master/MacPorts/parse.py

Thank you, I will look into it and provide further feedback.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-18 Thread Arjun Salyan via macports-dev
Some improvements to the Demo App: https://frozen-falls-98471.herokuapp.com

   - All Ports and All Categories are now available (Although not all ports
   have populated yet, I am on AWS Free Tier and the process is really slow.
   At the time of drafting this email: around 500 have populated).
   - On the Port-Detail page, the categories are now clickable and lead to
   the list of ports under that category.

I was able to parse the entire PortIndex.json using a python script and
successfully converted it to Django fixtures which could then be populated
to the database. (I used the portindex.json outputted by current version of
portindex2json.tcl and fixed the issues with categories using same python
script)

Parse.py :
https://github.com/arjunsalyan/MacPorts-Demo-App/blob/master/MacPorts/parse.py
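
For readers who have not seen Django fixtures: a JSON fixture is a list of
objects with "model", "pk" and "fields" keys. The following is a minimal
sketch of such a conversion, not the parse.py linked above. The model label
"ports.port" is hypothetical, and the assumption that the older output gives
categories as one space-separated string is mine.

```python
import json


def ports_to_fixture(ports, model_label="ports.port"):
    """Wrap port dictionaries in Django's JSON fixture structure."""
    fixture = []
    for pk, port in enumerate(ports, start=1):
        fields = dict(port)
        categories = fields.get("categories", [])
        # Assumption: a string value holds space-separated category names.
        if isinstance(categories, str):
            fields["categories"] = categories.split()
        fixture.append({"model": model_label, "pk": pk, "fields": fields})
    return fixture


sample = [{"name": "AppKiDo", "version": "0.997", "categories": "aqua devel"}]
fixture = ports_to_fixture(sample)
print(json.dumps(fixture, indent=2))
```

How the categories list maps onto a model field (for example a many-to-many
relation) depends on the model definition and is left out here.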



Re: GSoC 2019 [Collect build statistics]

2019-03-17 Thread Mojca Miklavec
On Sun, 17 Mar 2019 at 20:09, Arjun Salyan <
arjun.salyan.ch...@itbhu.ac.in> wrote:

> On Sun, Mar 17, 2019 at 1:31 AM Joshua Root  wrote:
>
>> It would be a good idea to check if they have any changes on their end
>> that we're missing, too.
>>
>
> They have made one change: "Make portindex2json.tcl always work with
> utf-8, insensitive to local settings".
>
> Should we incorporate this change also?
>

Yes. You could open two PRs or one with two commits. If you know the author's
GitHub handle (for Repology), also mention that username in the PR.

Mojca



Re: GSoC 2019 [Collect build statistics]

2019-03-17 Thread Arjun Salyan via macports-dev
On Sun, Mar 17, 2019 at 1:31 AM Joshua Root  wrote:

> It would be a good idea to check if they have any changes on their end
> that we're missing, too.
>

They have made one change: "Make portindex2json.tcl always work with utf-8,
insensitive to local settings".

Should we incorporate this change also? How do I indicate that this is a
new version, as suggested by Craig?
I shall then proceed with the pull request.

Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-16 Thread Joshua Root
On 2019-3-17 05:02 , Craig Treleaven wrote:
> I believe portindex2json.tcl was created to feed Repology.org,
> so if changes are made it would be polite to
> indicate that it is a new version.

It would be a good idea to check if they have any changes on their end
that we're missing, too.

- Josh


Re: GSoC 2019 [Collect build statistics]

2019-03-16 Thread Craig Treleaven
> On Mar 16, 2019, at 1:00 PM, Mojca Miklavec  wrote:
> 
> Dear Arjun,
> 
> On Sat, 16 Mar 2019 at 21:46, Arjun Salyan wrote:
> I have tried to make some changes in portindex2json.tcl so that the value of 
> categories is outputted as a list and not just a string.
> 
> Can someone please review if it seems fine:
> https://github.com/arjunsalyan/Test/blob/master/portindex2json.tcl 
> 
> 
> Thank you very much.
> 
> I didn't look at the code yet (internet connectivity is barely sufficient to
> sync email at the moment), but may I suggest submitting a pull request to
> https://github.com/macports/macports-contrib ? That's by far the most
> efficient way to get the code reviewed and merged.
> 

I believe portindex2json.tcl was created to feed Repology.org,
so if changes are made it would be polite to indicate
that it is a new version.

Craig




Re: GSoC 2019 [Collect build statistics]

2019-03-16 Thread Mojca Miklavec
Dear Arjun,

On Sat, 16 Mar 2019 at 21:46, Arjun Salyan wrote:

> I have tried to make some changes in portindex2json.tcl so that the value
> of categories is outputted as a list and not just a string.
>
> Can someone please review if it seems fine:
> https://github.com/arjunsalyan/Test/blob/master/portindex2json.tcl
>

Thank you very much.

I didn't look at the code yet (internet connectivity is barely sufficient
to sync email at the moment), but may I suggest submitting a pull request
to https://github.com/macports/macports-contrib ? That's by far the most
efficient way to get the code reviewed and merged.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-16 Thread Arjun Salyan via macports-dev
I have tried to make some changes in portindex2json.tcl so that the value
of categories is outputted as a list and not just a string.

Can someone please review if it seems fine:
https://github.com/arjunsalyan/Test/blob/master/portindex2json.tcl

Sample Output (new):

{
    "variants"         : "universal",
    "portdir"          : "aqua\/AppKiDo",
    "description"      : "AppKiDo is an API documentation browser for Cocoa programmers",
    "homepage"         : "http:\/\/appkido.com\/",
    "epoch"            : "0",
    "platforms"        : "darwin",
    "name"             : "AppKiDo",
    "license"          : "MIT",
    "maintainers"      : "nomaintainer",
    "long_description" : "AppKiDo is a free reference tool for Cocoa Objective-C programmers. It parses the header files and HTML documentation files provided by Developer Tools and presents the results in a powerful interface.",
    "version"          : "0.997",
    "categories"       : [aqua,devel],
    "revision"         : "0"
},
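
One thing worth checking in the sample above: a strict JSON parser expects
the items of the categories list to be quoted strings. A quick check with
Python's json module (the two test strings are shortened stand-ins for a
real entry):

```python
import json

# Shortened stand-ins: list items unquoted as in the sample, and quoted.
unquoted = '{"name": "AppKiDo", "categories": [aqua,devel]}'
quoted = '{"name": "AppKiDo", "categories": ["aqua", "devel"]}'


def parses(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses(unquoted))
print(parses(quoted))
```

Only the quoted form loads, so the list items probably need to be emitted
with quotes before other tools can consume the file.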



Thank You


Re: GSoC 2019 [Collect build statistics]

2019-03-16 Thread Arjun Salyan via macports-dev
On Sat, 16 Mar 2019 at 7:51 AM, Mojca Miklavec  wrote:

> JFYI:
>
> It might theoretically be a valid situation to have two entries with the
> same name (software gets deleted, then two years later an unrelated
> software with the same name gets added; but only one entry would have the
> status "active" and the others not). A lot more common case would be that a
> port gets renamed, even if just by changing the capitalisation. Or the
> version or -devel suffix gets attached to the end, then removed again ...
>

We will always have the option to identify ports with primary keys in the
database. But since only one entry will have active status at any given
time, we can apply two filters: the port name obtained from the
URL, and whether the entry is active.
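
A minimal sketch of that idea, using Python's sqlite3 module instead of the
Django ORM. The table layout and the "active" column are assumptions for
illustration. A partial unique index lets any number of inactive rows share
a name while rejecting a second active one, so the two-filter lookup can
never return more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE port ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " active INTEGER NOT NULL DEFAULT 1)"
)
# Partial unique index: at most one active row per port name.
conn.execute("CREATE UNIQUE INDEX one_active_port ON port (name) WHERE active = 1")

conn.execute("INSERT INTO port (name, active) VALUES ('ArpSpyX', 0)")  # retired entry
conn.execute("INSERT INTO port (name, active) VALUES ('ArpSpyX', 1)")  # current entry

# A second active row with the same name violates the index.
try:
    conn.execute("INSERT INTO port (name, active) VALUES ('ArpSpyX', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The two filters: port name from the URL, and active status.
active_rows = conn.execute(
    "SELECT id FROM port WHERE name = ? AND active = 1", ("ArpSpyX",)
).fetchall()
print(duplicate_rejected, active_rows)
```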

The parse-builbot-logs.py script worked perfectly fine. Thank you so much.
From last year's email archive I understood that we will have
to store these logs in the database rather than fetch them from buildbot
every time, and also devise a method to fetch logs at regular intervals and
load them into the database.


Now that I have an idea (just a starting point) of these three things:
1. Using portindex: getting info on all the ports.
2. Using mpstats: submitting stats to the new Django app.
3. From buildbot: getting the history of builds.


I am framing the tasks I need to take up in the coming week, before I
actually draft my first proposal. (Please correct me if I am not heading in
the right direction.)

- Improving the existing functionality of the demo app and making it more
dynamic.
- Trying to modify the tools or methods used to obtain the data
(portindex2json etc.)
- Working on more functionality and getting a starting point for it, just
like the three mentioned above (for example build reproducibility).

Thank You



Re: GSoC 2019 [Collect build statistics]

2019-03-15 Thread Mojca Miklavec
JFYI:

It might theoretically be a valid situation to have two entries with the
same name (software gets deleted, then two years later an unrelated
software with the same name gets added; but only one entry would have the
status "active" and the others not). A lot more common case would be that a
port gets renamed, even if just by changing the capitalisation. Or the
version or -devel suffix gets attached to the end, then removed again ...

But that might get arbitrarily complicated.

You may keep the potential need for convoluted history in the back of your
mind when designing the solution and try to work with internal unique port
IDs as numbers instead of strings etc., but we would rather have a simple
solution working than a complex solution 70% done and never finished &
deployed. I would definitely not overcomplicate the solution.

We do need to mark ports as inactive when they become obsolete or removed,
but that's just about where I would stop with "complex situations".

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-14 Thread Mojca Miklavec
Dear Arjun,

(Sorry if some things repeat from what Umesh sent you. I had already
written the reply yesterday, but did not manage to come back online
until now.)

Thank you very much for your demo app.

On Wed, 13 Mar 2019 at 17:52, Arjun Salyan wrote:
>
> As suggested, I have made an attempt at a basic demo app:
> https://frozen-falls-98471.herokuapp.com
>
> Please review it and let me know if this seems fine. After applying any 
> further inputs, I shall proceed with the documentation to setting it up.

Wonderful, thank you. (The review would usually also include review of
the sources, so that you can get further suggestions for improvements.
Since you provided just the link to the app, I'm only providing you
feedback based on that one.)

0.) I have some general suggestions about organisation/layout of the
first/entry page, but it's not relevant at this stage, and it's
something that can be changed easily at any time.

1.) There seem to be some issues with the database. For
"/ports/ArpSpyX/" I get the following error:

MultipleObjectsReturned at /ports/ArpSpyX/

get() returned more than one Ports -- it returned 2!

Request Method:GET
Request URL:https://frozen-falls-98471.herokuapp.com/ports/ArpSpyX/
...

I assume that you would need to use a primary key to avoid such
problems? (Honestly I'm not entirely sure how this should work in case
some port Foo gets deleted one day, and another day another unrelated
software with that same name gets introduced. The primary key should
probably be a number, and then you would have a mapping between the
name and the number corresponding to the latest port. Not that you
should worry about such cases, but you should definitely not allow
your database to have multiple entries for a single entity.)



> It is not completely static. The port information is fetched from database 
> and is not hard-coded.

Well done.

> What I did was to clone macports-ports, and used portindex2json.tcl to 
> populate the database. However, the format of the json outputted by 
> portindex2json could not be directly loaded into the database. I had to 
> manually make it in the format of fixtures accepted by Django (and that is 
> why only few ports are available), which means that portindex2json would also 
> require some modification.

Now that you mention it I seem to remember some issues.
I found an email that was left unanswered, and I haven't pursued the
issue since:
https://lists.macports.org/pipermail/macports-dev/2018-March/037726.html
We should try to get this fixed.

I'm currently not at home for another 11 days and have somewhat
limited access to both computer and internet etc., so I'm not sure if
I can check this before I'm back, but maybe someone else can also look
into it.

What you may try to do is write a few lines of python script to do one
single thing: parse the json file from portindex2json (with the
library) and print the info back to the screen (first just the name,
then add a few additional fields). I assume this will fail as well,
and you'll know at which entry (port) that happened exactly. Then
eliminate or fix that entry (and maybe open a bug report, or write
here in case you'll be faster than others investigating the issues).
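
A minimal sketch of such a script, assuming the output is one top-level
JSON list of port objects that each have a "name" key. The two inline
strings stand in for the real file contents. json.JSONDecodeError carries
the line and column at which parsing stopped, which narrows down the
offending entry.

```python
import json


def load_ports(text):
    """Parse portindex2json output; return (ports, error_message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # lineno/colno point at the first character the parser rejected.
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


# Stand-ins for the file contents; the second has an unquoted value on line 2.
good = '[{"name": "AppKiDo"}, {"name": "gmsh"}]'
bad = '[{"name": "AppKiDo"},\n {"name": gmsh}]'

ports, error = load_ports(good)
for port in ports:
    print(port["name"])

# For broken input, report where parsing stopped.
print(load_ports(bad)[1])
```

With the real file, the text would come from open(path, encoding="utf-8").read(),
and more fields can be printed once the names work.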

Another option (or if json doesn't load at all) would be to do some
kind of bisection (deleting entries) until you find the problematic
entries.

Note that Heroku's free account will probably not allow storing the
full set of ports. I assume that some check like "if first letter of
the port name is alphabetically smaller than letter " could do the
trick if you hit that limitation.

> The build history is hard-coded.

No problem at all. We didn't ask for a fully functional app :)

> Also I need suggestions here- what info is most important to be displayed on 
> the port info page.

The precise representation might still need to be defined / discussed
/ changed later. I expect that after the first implementation users /
developers would come up with suggestions for improvements (which is
why it is important to deploy the first working app early in the
summer).

But here's my view of what I consider most important:

(a) I want to know if all the builds succeeded. If yes, I'm perfectly
happy with one single green check saying that
    version X.Y / revision Z / commit abcd
was successfully built + maybe info whether it is distributable
(= can it be downloaded? Sometimes the licence prohibits distribution).

(b) If some builds did not pass, I want to know:
- on which OS version it failed / did not build at all / is still pending
  (I would prefer some table representation with red / green /
  yellow / gray / ... text or background)
- what was the last version (and when) that was successfully built

I'm not sure what representation would work best to deliver that info,
but I would not worry about it too much for now. For start, even a
sequential list of all builds of that port on any given OS would do
the job.

Later on, 

Re: GSoC 2019 [Collect build statistics]

2019-03-14 Thread Umesh Singla
Hi Arjun

On Wed, Mar 13, 2019 at 10:22 PM Arjun Salyan via macports-dev <
macports-dev@lists.macports.org> wrote:

> As suggested, I have made an attempt at a basic demo app:
> https://frozen-falls-98471.herokuapp.com
>
> Please review it and let me know if this seems fine. After applying any
> further inputs, I shall proceed with the documentation to setting it up.
>

This looks great for the first attempt. It would be great if you could
share the details on the flow of data (from portindex to html views), and
how you would extend it to incorporate mpstats output later.


> It is not completely static. The port information is fetched from database
> and is not hard-coded. What I did was to clone macports-ports, and used
> portindex2json.tcl to populate the database. However, the format of the
> json outputted by portindex2json could not be directly loaded into the
> database. I had to manually make it in the format of fixtures accepted by
> Django (and that is why only few ports are available), which means that
> portindex2json would also require some modification.
>

There are some known issues with portindex2json output, possibly related to
quotations, or strings and lists. Can you paste here the issues you ran
into? This seems like a priority to me.


> The build history is hard-coded. Also I need suggestions here- what info
> is most important to be displayed on the port info page. For now, I could
> only figure out to show build history for different os. Am I doing it
> correctly?
>

Apart from build stats, a link to a port's portfile on Github, dependencies
(with versions), or status of the build (on each OS?) will be helpful. It
would be better if the categories are clickable, which shows me the list of
ports belonging to that category (like a filter) but not for now. Better
parsing of maintainers will be useful too.

Also, the OS filter does not seem to be working. Probably there's no data, but
on selecting 10.13 for AppHack, it still showed base-10.14_x86_64.


> The installation statistics, which currently run independently would also
> be integrated with port-info in the new app.
>
> Also, I have used bootstrap for this demo app, is that fine or we need to
> go with something more powerful like angular or react.
>

Bootstrap is fine as long as we are leveraging Django's server-side
templating (its built-in, Jinja-like template language) and there's no need
to expose APIs.

Umesh


> Thank You.
>


Re: GSoC 2019 [Collect build statistics]

2019-03-13 Thread Arjun Salyan via macports-dev
As suggested, I have made an attempt at a basic demo app:
https://frozen-falls-98471.herokuapp.com

Please review it and let me know if this seems fine. After applying any
further inputs, I shall proceed with the documentation for setting it up.

It is not completely static. The port information is fetched from the database
and is not hard-coded. What I did was to clone macports-ports and use
portindex2json.tcl to populate the database. However, the JSON output of
portindex2json could not be loaded into the database directly. I had to
manually convert it to the fixture format accepted by
Django (and that is why only a few ports are available), which means that
portindex2json would also require some modification.

The build history is hard-coded. I also need suggestions here: what info is
most important to display on the port info page? For now, I could only
think of showing build history for different OS versions. Am I doing it
correctly?

The installation statistics, which currently run independently, would also
be integrated with port-info in the new app.

Also, I have used Bootstrap for this demo app. Is that fine, or do we need to
go with something more powerful like Angular or React?

Thank You.


Re: GSoC 2019 [Collect build statistics]

2019-03-11 Thread Mojca Miklavec
On Mon, 11 Mar 2019 at 09:18, Arjun Salyan via macports-dev wrote:
>
> I have a couple of doubts here:
>
> 1. Once I install mpstats does it still send weekly reports? I could not find 
> the required code to do this in the port files. Code for manually submitting 
> is available in mpstats.tcl but I am unable to locate the code for automating 
> the submissions. What am I missing here?

Maybe try
port contents mpstats
and check the files that MacPorts reports as belonging to mpstats.

> 2. I am completely new to Buildbot. I went through their website buildbot.net,
> I understand its functioning, but am not getting the practical approach. Can 
> you give me a brief idea of how it is implemented with Macports? Like what 
> are the events that are taking place and when?

Joshua already sent you the link to master.cfg which is the python
code that checks for new commits in the repository and starts the
builds. The other missing link might be
https://github.com/macports/mpbb, which are shell scripts used to
reduce the complexity of master.cfg (helper/convenience scripts
between buildbot and macports). The same scripts (mpbb) are used for
Travis and Azure pipelines.

(Joshua: I thought that buildbot watches the git repository on its
own, rather than listening to webhooks, but I might be wrong.)

The "results" are probably easiest to see here:
https://build.macports.org/waterfall

In principle it goes as follows:

- something is pushed to git(hub)
- buildbot sees the commits and triggers a build on every "port
  watcher" (on different macOS versions)
  - a list of modified ports is established
  - for each port that needs to be built
    - first the sources are fetched / mirrored (jobs-mirror)
    - then the port is built and uploaded to the server in multiple steps
  - buildbot also notifies GitHub whether the build was successful or
    not (you'll see a green checkbox / red cross next to commits in
    macports-ports commit history)
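
To make the "list of modified ports" step more concrete, here is a rough
sketch. It is not the logic used by mpbb or master.cfg; it only assumes
that files in macports-ports are laid out as category/portname/... and that
top-level directories starting with "_" (such as _resources) hold no ports.
The function name and the sample paths are made up.

```python
def ports_from_changed_paths(paths):
    """Return sorted port names for changed files laid out as category/port/..."""
    ports = set()
    for path in paths:
        parts = path.split("/")
        # Need at least category/port/file; skip "_"-prefixed support directories.
        if len(parts) >= 3 and not parts[0].startswith("_"):
            ports.add(parts[1])
    return sorted(ports)


# Hypothetical list of files touched by a push.
changed = [
    "aqua/AppKiDo/Portfile",
    "science/gmsh/Portfile",
    "science/gmsh/files/patch-example.diff",
    "_resources/port1.0/group/example.tcl",
    "README.md",
]
print(ports_from_changed_paths(changed))
```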

The build can also be triggered manually.

Sadly we are still using buildbot 0.8 rather than 2.x. The github
repository contains instructions for running a buildbot master and
slave locally on your machine, so that you can test things, as Joshua
suggested.

If something is not clear, please ask in more detail.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-11 Thread Mojca Miklavec
On Mon, 11 Mar 2019 at 09:41, Arjun Salyan wrote:
> On Fri, Mar 8, 2019 at 11:08 PM Mojca Miklavec wrote:
>>
>> In the case of this Django app it probably makes sense to come up with a 
>> proof-of-concept demo (ideally including documentation about how to set it 
>> up and running, and either screenshot or hyperlink).
>
> Does this mean a static site about how the project would look or a dynamic 
> site with database and all the needed tables?

A static site (probably with made-up statistics data, but trying to
use proper port names etc.) and all the information as expected in the
final product would be a super useful part of the application itself. By
carefully planning what should be on the page and how to organize it,
it would be much easier and more straightforward to work on the
project during the summer, knowing precisely where the project should
be heading. Some basic css could be nice, but not absolutely needed:
it's more important to have the contents well-planned (even if drawn
with pencil and paper only).

A dynamic site on the other hand would serve as demonstration of your
skills. Yes, that includes the database, but not necessarily all the
tables. We did some brainstorming about design of the database during
the last GSOC application period, but that doesn't mean that the
design is final or perfect: the exact table layout would likely
change, both during project planning and implementation. When doing
the demo app, I would suggest keeping a logbook of the steps you
took; something that could later be turned into developer documentation
/ tutorial (how you created the project, how another developer can run
it on their machine).

The static and dynamic site could also be combined if needed (some
small part of the site would be dynamic, while the rest would be a
hardcoded example that you would later replace with real data from the
database).

Depending on your prior familiarity with the tools involved and the
time you want to invest into proposal / demo, think about what your
sample app could be (ideally something that you could simply build
upon / continue developing if you get selected; rather than some code
that you would later throw away). We can help you with ideas, but it
makes more sense to start with your initiative at first.

Another general suggestion (for any gsoc candidate): you don't need to
wait for the official GSOC submissions to start, and you definitely
don't want to wait for the final call. The sooner you send us a link
to your proposal and/or your demo, the earlier we can start providing
feedback, meaning that you would have plenty of time to make both
proposal and sample app better & higher chances to get selected.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-11 Thread Joshua Root
On 2019-3-11 19:18 , Arjun Salyan wrote:
> I have a couple of doubts here:
> 
> 1. Once I install mpstats does it still send weekly reports? I could not
> find the required code to do this in the port files. Code for manually
> submitting is available in mpstats.tcl but I am unable to locate the
> code for automating the submissions. What am I missing here?

The launchd plist it installs causes it to run periodically.

> 2. I am completely new to Buildbot. I went through their website
> buildbot.net, I understand its functioning, but am
> not getting the practical approach. Can you give me a brief idea of how
> it is implemented with Macports? Like what are the events that are
> taking place and when?

Buildbot is a fairly complex multilayered system and takes a while to
grok. Our configuration is here:

(master.cfg is the main file there). It may help to try to set up a
small buildbot of your own?

The basic process is that whenever a commit is pushed to our GitHub
repo, buildbot is notified via a webhook, and then figures out which
ports were touched by the commit and builds them (on each OS version).

- Josh


Re: GSoC 2019 [Collect build statistics]

2019-03-09 Thread Joshua Root
On 2019-3-9 22:55 , Saagar Jha wrote:
> 
> FWIW, while I am not opposed to MacPorts adding clearly communicated,
> opt-in, self-hosted analytics; I would be /very/ strongly against doing
> this if any of these conditions was not met. Doubly so if the initial
> discussion to do this is made without users being able to provide input,
> triply if the feature stays largely unchanged even when users complain,
> and quadruply if this discussion is forcefully closed, unresolved,
> because “the feature has already shipped”.

Yes, gathering of this data needs to remain strictly opt-in, and using
something like Google Analytics would be a whole new rabbit hole of
issues that we don't want.

It's already easy enough for users to participate by installing mpstats;
the missing piece is advertising its existence and suggesting that users
install it. The only reason we're not doing that is that the stats
system still needs work.

- Josh


Re: GSoC 2019 [Collect build statistics]

2019-03-08 Thread Mojca Miklavec
On Fri, 8 Mar 2019 at 15:45, Arjun Salyan wrote:
>
> I do not understand : "perhaps include some basic functionality to allow 
> checking for build reproducibility".
>
> Please help me with that 'build reproducibility' point

Others sent you some links about what build reproducibility is, please
ask if that's not clear.

What we could do is let users opt-in to submit more info about their
own builds. The most basic way would be to calculate the checksum (on
the buildbot) of the compressed collection of files installed, and
store that checksum in the database. Then users could submit their own
build results and include the checksum from their own builds. If the
checksums match, fine. If they don't, this could be reported, but it
would be nice to know more about why they differ, and this is
something that needs further analysis / ideas etc.

As an example: software might include the build date into one of the
installed files. This makes the build non-reproducible since the exact
contents of the installed package depend on when the software was
built. Such cases need to be discovered and fixed.

Build reproducibility is important to ensure that the compiler or any
other component has not been compromised.
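
A minimal sketch of the checksum comparison, using SHA-256 from Python's
hashlib. The byte strings are stand-ins for archive contents; a real check
would read the archive produced by the buildbot and by the user. The third
sample differs only by an embedded build date, which is enough to change
the checksum.

```python
import hashlib


def archive_checksum(data):
    """Return the SHA-256 hex digest of an archive's bytes."""
    return hashlib.sha256(data).hexdigest()


# Stand-ins for archive contents; real code would read the archive files.
buildbot_archive = b"binary contents\nbuilt: 2019-03-08\n"
user_archive_same = b"binary contents\nbuilt: 2019-03-08\n"
user_archive_dated = b"binary contents\nbuilt: 2019-03-09\n"

reference = archive_checksum(buildbot_archive)
print(archive_checksum(user_archive_same) == reference)
print(archive_checksum(user_archive_dated) == reference)
```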

> and also how do I plan from here (I know Django, but I am still learning 
> about MacPorts)?

I'm not exactly sure what your question is. Are you looking for
suggestions about what to learn next, about what to do next, ...?

The best way to learn about MacPorts is to start playing with it,
install a bunch of your favourite software packages, install the
"mpstats" port and figure out how it works, set up a local files
repository with a git clone of macports-ports
(https://guide.macports.org/chunked/development.local-repositories.html)
and try to modify some port, install the modified version etc. ... I
assume you do have a Mac?

We will ask every GSOC candidate to prove that they are capable of
implementing the idea. Most projects ask for PRs. In the case of this
Django app it probably makes sense to come up with a proof-of-concept
demo (ideally including documentation about how to get it up and
running, and either a screenshot or a hyperlink). Tcl is not the core of
the project, but it helps to know some of it, so that you can also add
the missing fields to the client.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-08 Thread Mojca Miklavec
Hi,

On Fri, 8 Mar 2019 at 17:12, Craig Treleaven wrote:
>
> Have you seen the following wiki page?
>
> https://trac.macports.org/wiki/StatisticsIdeas
>
> There are some deficiencies in the current data collected.  A key issue is 
> whether a port was requested or installed as a dependency.  That then leads 
> to the need for a versioned API.  Other data elements need a re-think.

In any case I would say that there are way more deficiencies on the
server side than on the client side. Yes, we need to improve the
format based on the need, but even if we start from the existing data
that's currently being sent there's a lot we could do better on the
server.

> My interpretation is that some key MacPorts people had privacy concerns 
> related to collecting such information.  As such, there was no appetite to 
> strongly encourage users to participate in submitting the data.  In crude 
> terms, the “opt-out” v. “opt-in” question.  I don’t know if that is now 
> changed or not.

Just to make it clear: I don't think that opt-in vs. opt-out is
relevant to the implementation of the project in any way. This is a
completely independent decision. I would say that we need to do better
advertisement in any case, but so far it did not make much sense since
the current statistics page is not that useful.

> I fall firmly on the side that it is fair to ask users for such data as it 
> helps us to understand how MacPorts is used.  That can then guide us, as 
> MacPorts contributors, into where to channel our available time.  I note that 
> Homebrew collects such information and there does not seem to be much 
> resistance, if any.  I think relying on opt-in would mean poor data quality 
> and that implementing such a collection and reporting system is largely a 
> waste of time.  IMHO.

We may reconsider the decision later, but there's no need to
change anything right now. The part which works on the user's side
might need some minor tweaks, but it's basically working. I would say
that we first need to get a solid solution, and potentially continue
the discussion in a completely independent thread.

> PS My mail archive shows me that we’ve been talking about such a facility for 
> more than 5 years!

It was initially a GSoC 2011 project which got abandoned after
GSoC was over and which was written in a framework that nobody from the
core team had experience with. So it could have been nearly 8 years :)
Quite a while after GSoC, Clemens managed to get it into a state
that could run on his server, but then we collected a list of all
the things that were wrong, which nobody ever fixed.

Mojca


Re: GSoC 2019 [Collect build statistics]

2019-03-08 Thread Craig Treleaven
> On Mar 8, 2019, at 9:45 AM, Arjun Salyan via macports-dev 
>  wrote:
> 
> Thank you Mojca.
> 
> The provided references have cleared a lot of my doubts and I am really 
> interested in doing this project: 'Collect build statistics'
> 
> Here is what I have understood so far:
> 1. dynamic page for each port displaying basic information (description, 
> version etc.), installation stats, build history etc.
> 2. From suggested ideas, I found the following to be added to each page:
>    - whether the current version of port built on each particular OS/arch
>    - when was the last time the port built on that OS/arch
>    - links to all builds
>    - list of installed files, differences in installed files on different OS
>      versions
>    - perhaps include some basic functionality to allow checking for build
>      reproducibility
>    - what is the latest version of port (in case it's already outdated)
>
> I do not understand: "perhaps include some basic functionality to allow 
> checking for build reproducibility".
> 
> 3. I would further want to take up the task of migrating a redesigned website 
> (or some components) into the same Django* app.
> 
> Please help me with that 'build reproducibility' point and also how do I plan 
> from here (I know Django, but I am still learning about MacPorts)?


Have you seen the following wiki page?

https://trac.macports.org/wiki/StatisticsIdeas 


There are some deficiencies in the current data collected.  A key issue is 
whether a port was requested or installed as a dependency.  That then leads to 
the need for a versioned API.  Other data elements need a re-think.
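
To illustrate the point about a versioned API, here is a minimal sketch of a
submission payload that carries a version number and a per-port "requested"
flag. Everything in it is an assumption made for illustration: the field names
(api_version, ports, requested), the version numbers and the helper functions
are hypothetical and are not the format that mpstats actually sends.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

CURRENT_API_VERSION = 2  # hypothetical version number


@dataclass
class PortEntry:
    name: str
    version: str
    # True: requested by the user; False: installed as a dependency;
    # None: unknown, because the payload predates the flag.
    requested: Optional[bool]


def build_submission(ports: List[PortEntry]) -> str:
    """Serialise a client-side submission that declares its API version."""
    payload = {
        "api_version": CURRENT_API_VERSION,
        "ports": [
            {"name": p.name, "version": p.version, "requested": p.requested}
            for p in ports
        ],
    }
    return json.dumps(payload, sort_keys=True)


def parse_submission(raw: str) -> List[PortEntry]:
    """Server-side parsing that accepts both the old and the new layout."""
    data = json.loads(raw)
    # Payloads without a version field are treated as version 1 (assumption).
    api_version = data.get("api_version", 1)
    if api_version not in (1, 2):
        raise ValueError(f"unsupported api_version: {api_version}")
    entries = []
    for item in data["ports"]:
        requested = item["requested"] if api_version >= 2 else None
        entries.append(PortEntry(item["name"], item["version"], requested))
    return entries
```

With a version field in place, the server can keep accepting older submissions
and simply record the missing information as unknown instead of guessing.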

My interpretation is that some key MacPorts people had privacy concerns related 
to collecting such information.  As such, there was no appetite to strongly 
encourage users to participate in submitting the data.  In crude terms, the 
“opt-out” v. “opt-in” question.  I don’t know if that is now changed or not.  

I fall firmly on the side that it is fair to ask users for such data as it 
helps us to understand how MacPorts is used.  That can then guide us, as 
MacPorts contributors, into where to channel our available time.  I note that 
Homebrew collects such information and there does not seem to be much 
resistance, if any.  I think relying on opt-in would mean poor data quality and 
that implementing such a collection and reporting system is largely a waste of 
time.  IMHO.

Craig

PS My mail archive shows me that we’ve been talking about such a facility for 
more than 5 years!  



Re: GSoC 2019 [Collect build statistics]

2019-03-08 Thread Marcus Calhoun-Lopez


> On Mar 8, 2019, at 7:45 AM, Arjun Salyan via macports-dev 
>  wrote:
> 
> Please help me with that 'build reproducibility' point and also how do I plan 
> from here (I know Django, but I am still learning about MacPorts)?

There is a page on our wiki about reproducible builds: 
https://trac.macports.org/wiki/ReproducibleBuilds
Several other projects are interested in this concept (see, e.g., 
https://reproducible-builds.org and https://wiki.debian.org/ReproducibleBuilds).
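
As a rough illustration of what "checking for build reproducibility" could
mean in practice, the sketch below compares the installed files of two builds
of the same port by content hash. It is only a sketch under assumptions: the
function names are hypothetical, and each build is represented as a plain
mapping from installed path to file contents rather than a real MacPorts
archive.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def compare_builds(
    build_a: Dict[str, bytes], build_b: Dict[str, bytes]
) -> Dict[str, List[str]]:
    """Report files present in only one build or differing in content."""
    paths_a, paths_b = set(build_a), set(build_b)
    differing = sorted(
        path
        for path in paths_a & paths_b
        if digest(build_a[path]) != digest(build_b[path])
    )
    return {
        "only_in_a": sorted(paths_a - paths_b),
        "only_in_b": sorted(paths_b - paths_a),
        "differing": differing,
    }


def is_reproducible(build_a: Dict[str, bytes], build_b: Dict[str, bytes]) -> bool:
    """Two builds count as reproducible here if no difference is reported."""
    return not any(compare_builds(build_a, build_b).values())
```

The same report would also cover the "differences in installed files on
different OS versions" idea, since it lists files that exist in only one of
the two builds.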

-Marcus

Re: GSoC 2019 [Collect build statistics]

2019-03-08 Thread Arjun Salyan via macports-dev
Thank you Mojca.

The provided references have cleared a lot of my doubts and I am really
interested in doing this project: 'Collect build statistics'

Here is what I have understood so far:
*1.* dynamic page for each port displaying basic information (description,
version etc.), installation stats, build history etc.
*2.* From suggested ideas, I found the following to be added to each page:

   - whether the current version of port built on each particular OS/arch
   - when was the last time the port built on that OS/arch
   - links to all builds
   - list of installed files, differences in installed files on different
   OS versions
   - perhaps include some basic functionality to allow checking for build
   reproducibility
   - what is the latest version of port (in case it's already outdated)
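
The first three items in the list above could be derived from a flat list of
build records, roughly as in the following sketch. The BuildRecord fields and
the summarise function are hypothetical names chosen for illustration, not an
existing schema in the web app or on the buildbot.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class BuildRecord:
    port: str
    version: str
    os_version: str
    arch: str
    finished: datetime
    success: bool
    url: str  # link to the build log (hypothetical)


def summarise(
    port: str, current_version: str, builds: List[BuildRecord]
) -> Dict[Tuple[str, str], dict]:
    """Group one port's builds by (OS version, arch) for its page."""
    summary: Dict[Tuple[str, str], dict] = {}
    for build in builds:
        if build.port != port:
            continue
        entry = summary.setdefault(
            (build.os_version, build.arch),
            {"current_version_built": False, "last_success": None, "links": []},
        )
        # Keep a link to every build, successful or not.
        entry["links"].append(build.url)
        if not build.success:
            continue
        if build.version == current_version:
            entry["current_version_built"] = True
        if entry["last_success"] is None or build.finished > entry["last_success"]:
            entry["last_success"] = build.finished
    return summary
```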

I do not understand: "perhaps include some basic functionality to allow
checking for build reproducibility".

*3.* I would also like to take up the task of migrating a redesigned
website (or some components) into the same Django* app.

Please help me with that 'build reproducibility' point. Also, how should I
plan from here (I know Django, but I am still learning about MacPorts)?

*I haven't finalised Django yet, but it seems to be the most suitable one.

Thank You.



On Thu, Mar 7, 2019 at 2:22 AM Mojca Miklavec  wrote:

> Dear Arjun,
>
> Welcome to MacPorts!
>
> On Wed, 6 Mar 2019 at 21:05, Arjun Salyan via macports-dev wrote:
> >
> > Hello,
> > I am Arjun Salyan, a GSoC'19 aspirant. I have familiarised myself with
> MacPorts, but I still have a way to go with the documentation and with
> learning Tcl.
> >
> > I am quite experienced in web and app development, with multiple
> Python (Django) and PHP projects from my time working for a company. I am
> looking to combine my previous skillset with everything I am learning right
> now about MacPorts.
> >
> > In suggested ideas and from mails, I do see 'a Django App', 'Django App
> to collect statistics'. I can work on these ideas further-
>
> If you are interested in this idea, you should check
> https://github.com/macports/macports-webapp/tree/master/docs
> as well as look at the archives of this mailing list from the last
> summer (there might have been about a hundred emails about this
> particular topic).
>
> A student abandoned that GSOC project last year, but there is a lot of
> useful information and explanations available.
>
> > and in addition I can also take up tasks like improving/ redesigning the
> website, documentation and 'Available Ports' page can be made much more
> interactive and easy to find ports.
>
> Available ports would by design be part of the Django app, and design
> is closely related. It's of course more important to have a working app
> at the end of the summer with ugly design than perfect design and
> defunct app :), but doing a good design for the app could be the first
> step towards a better website. I leave it up to you to come up with
> suggestions about design if you would like to contribute in that area
> (I cannot be of any help there :).
>
> > Will planning a proposal on such ideas be good, or should I work more
> towards the macports_base?
>
> You should work on whatever you are more passionate about. It makes a
> lot less sense to be forced to work on something you are not so
> interested in, since that would be less fun and lead to worse results.
> This applies to both selecting the most suitable org, as well as to
> selecting a suitable task within that org.
>
> (Our only limitation is that we don't want two students to work on the
> same project. It is still too early to know what others might be
> interested in, but you should be able to figure that out also by
> following the mailing list conversations.)
>
> > Any suggestions on how independent apps can be used to contribute to
> macports would be very helpful.
>
> What exactly is your question?
>
> In the case of the Django app, I would be co-mentoring. I just wanted to
> mention that I'll be travelling until the 24th, so I won't have access
> to an internet connection (or rather: a comfortable keyboard to type on)
> on a regular basis, but I believe that others should be able to answer
> your requests in the meantime.
>
> Mojca
>