Re: RFH: Debian derivatives census
On Wed, 2020-09-09 at 15:00 -0400, Jeremiah C. Foster wrote:

> This sounds very useful - how can I follow along on the discussion? Is
> there a separate email list for this topic?

There is no discussion about using the snapshot API in the census, just a FIXME item in the patches generation script. The debian-derivatives mailing list and IRC channel are probably the best places to discuss the derivatives census scripts once this thread is concluded.

> I'll review those links to find out more and see if I'm able to
> contribute there.

The file that causes the RAM issue is 555MB of YAML and is here:

http://deriv.debian.net/sources.patches

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Re: RFH: Debian derivatives census
On Sun, 2020-09-06 at 12:22 +0800, Paul Wise wrote:
> On Thu, 2020-09-03 at 14:12 -0400, Jeremiah C. Foster wrote:
>
> > I would like to add that I've recently learned that the Derivatives
> > Census can help determine programmatically the delta between Debian
> > and a Derivative (if things are correctly configured). For a
> > distribution such as ours which aims for binary compatibility and
> > wants to stay as close to Debian as possible, this is extremely
> > valuable.
>
> I think you are referring to the patch generation?
>
> https://wiki.debian.org/Derivatives/Integration#Patches
>
> The size of the metadata about the patches is what is causing the
> memory issues.
>
> The patch generation itself currently can only be run on the Debian
> servers at LeaseWeb because it relies on access to the snapshot.d.o
> database and hash based filesystem. There is a TODO item about
> porting it to the snapshot.d.o API instead so that derivatives who
> have private apt repositories can also run it locally.

This sounds very useful - how can I follow along on the discussion? Is there a separate email list for this topic?

> > I feel that it is our responsibility to contribute back to Debian
> > (which we try to do) everything we can and I think that
> > contributing time and effort is the least we can do.
>
> Excellent, please take a look at the census codebase and the wiki
> pages I have linked to and run the codebase locally to see how it
> works.

Will do!

> > The Debian package tracker will be of particular interest to me
> > because of the ability to understand the delta from Debian to a
> > derivative. I'm more than happy to contribute in any way I can and
> > will review those URLs to find some low-hanging fruit to get me
> > started.
>
> The main work needed on the package tracker is to replace the Ubuntu
> panel with a patches panel that links to available patches in various
> places including from the derivatives census.
>
> https://bugs.debian.org/779400

Super useful, I'll review to see where I can participate.

> > Is there a preferred channel for communication?
> > Is the mailing list preferred over IRC?
>
> This thread and the debian-derivatives mailing list and IRC channel
> are good places to discuss the census and I'll respond in any of
> them.

Great, thanks.

> > Regarding RAM and CPUs, I have a VM running Bullseye at Linode
> > which we can use for Gitlab runners or the like. Perhaps this will
> > be of use?
>
> The RAM issue is mainly caused by part of the service not being
> written in a scalable way, since it just loads giant YAML files into
> memory. Throwing more RAM at the problem or making the memory storage
> more efficient would be the wrong approach, since eventually the
> patch metadata in YAML files will exceed the available RAM. A
> database would be a better way to do it. So we need changes to the
> codebase to store the data in a database instead plus a script to
> stream the YAML into the database without loading it all into RAM.
> A couple of links I gathered on the problem.
>
> https://habr.com/en/post/458518/
> https://news.ycombinator.com/item?id=20401055
> https://stackoverflow.com/questions/429162/how-to-process-a-yaml-stream-in-python

I'll review those links to find out more and see if I'm able to contribute there.

Thanks again,

Jeremiah
Re: RFH: Debian derivatives census
On Thu, Sep 3, 2020 at 2:56 PM Francisco M Neto wrote:

> I'd love to join! What do I do?
>
> I can (mostly) hold my own in those languages.

Great!

I suggest you start by looking at the wiki pages I mentioned, downloading the codebase and trying to run it locally. As I said, the biggest problem is the RAM usage from loading the YAML files, but there are lots of TODO/FIXME items sprinkled throughout the codebase and some ideas for features on the wiki pages.

If you have any questions I'll be available on the debian-derivatives IRC channel and mailing list, or this thread.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Re: RFH: Debian derivatives census
On Thu, Sep 3, 2020 at 2:42 PM Sicelo wrote:

> This project sounds interesting, and I would like to make myself
> available to help/learn as much as possible. I know some basics in
> Python, SQL, and shell, but not Perl.

Great!

The Perl parts are quite minimal (just for discovering RSS feeds and downloading favicons) so you can easily ignore those.

I suggest you start by looking at the wiki pages I mentioned, downloading the codebase and trying to run it locally. There are lots of TODO/FIXME items sprinkled throughout the codebase and some ideas for features on the wiki pages.

If you have any questions I'll be available on the debian-derivatives IRC channel and mailing list, or this thread.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
Re: RFH: Debian derivatives census
On Thu, 2020-09-03 at 14:12 -0400, Jeremiah C. Foster wrote:

> I would like to add that I've recently learned that the Derivatives
> Census can help determine programmatically the delta between Debian and
> a Derivative (if things are correctly configured). For a distribution
> such as ours which aims for binary compatibility and wants to stay as
> close to Debian as possible, this is extremely valuable.

I think you are referring to the patch generation?

https://wiki.debian.org/Derivatives/Integration#Patches

The size of the metadata about the patches is what is causing the memory issues.

The patch generation itself currently can only be run on the Debian servers at LeaseWeb because it relies on access to the snapshot.d.o database and hash based filesystem. There is a TODO item about porting it to the snapshot.d.o API instead so that derivatives who have private apt repositories can also run it locally.

> I feel that it is our responsibility to contribute back to Debian (which
> we try to do) everything we can and I think that contributing time and
> effort is the least we can do.

Excellent, please take a look at the census codebase and the wiki pages I have linked to and run the codebase locally to see how it works.

> The Debian package tracker will be of particular interest to me because
> of the ability to understand the delta from Debian to a derivative. I'm
> more than happy to contribute in any way I can and will review those
> URLs to find some low-hanging fruit to get me started.

The main work needed on the package tracker is to replace the Ubuntu panel with a patches panel that links to available patches in various places including from the derivatives census.

https://bugs.debian.org/779400

> Is there a preferred channel for communication?
> Is the mailing list preferred over IRC?

This thread and the debian-derivatives mailing list and IRC channel are good places to discuss the census and I'll respond in any of them.

> Regarding RAM and CPUs, I have a VM running Bullseye at Linode which we
> can use for Gitlab runners or the like. Perhaps this will be of use?

The RAM issue is mainly caused by part of the service not being written in a scalable way, since it just loads giant YAML files into memory. Throwing more RAM at the problem or making the memory storage more efficient would be the wrong approach, since eventually the patch metadata in YAML files will exceed the available RAM. A database would be a better way to do it. So we need changes to the codebase to store the data in a database instead, plus a script to stream the YAML into the database without loading it all into RAM. A couple of links I gathered on the problem:

https://habr.com/en/post/458518/
https://news.ycombinator.com/item?id=20401055
https://stackoverflow.com/questions/429162/how-to-process-a-yaml-stream-in-python

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
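
The "stream the YAML into the database" step might look roughly like the sketch below. It is only an illustration of the batching idea, not code from the census: the `patches` table, its `package`/`version`/`patch` columns and the function names are all hypothetical, since the layout of sources.patches is not described in this thread. The `records` argument is assumed to be a lazy iterator of dicts produced by some streaming YAML reader (for example PyYAML's `yaml.load_all` if the file is a multi-document stream, or its event-based `yaml.parse` otherwise), so that at most one batch is held in memory at a time.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items without materialising the input."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def stream_into_db(records, conn, batch_size=1000):
    """Insert records batch by batch and return the number of rows written.

    `records` must be a lazy iterable of dicts; the table and column
    names here are hypothetical placeholders, not the census schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patches "
        "(package TEXT, version TEXT, patch TEXT)"
    )
    total = 0
    for chunk in batched(records, batch_size):
        # Only the current batch lives in memory; commit after each one.
        conn.executemany(
            "INSERT INTO patches VALUES (:package, :version, :patch)", chunk
        )
        conn.commit()
        total += len(chunk)
    return total
```

With a generator as input, for instance `stream_into_db(reader, sqlite3.connect("patches.db"))` where `reader` is whatever streaming parser ends up being used, peak memory depends on `batch_size` rather than on the size of the YAML file. SQLite is used here only because it ships with Python; the same pattern should apply to whichever database the census settles on.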
Re: RFH: Debian derivatives census
On Thu, 2020-09-03 at 10:04 +0800, Paul Wise wrote:
> Hi all,

Hello Pabs!

> I'm looking for collaborators on the Debian derivatives census. The
> census involves a mixture of social and technical work as well as
> following different information feeds to find new Debian derivatives
> and passing information to other Debian teams and folks.
>
> https://wiki.debian.org/Derivatives/Census
>
> I believe the census is valuable to Debian and to derivatives

I would like to say that we find it incredibly valuable for PureOS and I've seen the Derivatives Census as an excellent source of information both for outreach and to understand the Debian ecosystem as it were. Thank you pabs for all your work on this.

> and that it helps build mutually beneficial connections between us
> and the wider community of Free Software distributions. Derivatives
> bring new people, perspectives and projects to Debian, conference
> sponsorship and more. Derivatives benefit from collaboration with
> Debian through learning from our community, increased exposure to the
> Debian audience and of course our software distribution and services.

I would like to add that I've recently learned that the Derivatives Census can help determine programmatically the delta between Debian and a Derivative (if things are correctly configured). For a distribution such as ours which aims for binary compatibility and wants to stay as close to Debian as possible, this is extremely valuable.

> I'm looking for folks who are not very involved in Debian and would
> like to increase their involvement.

I feel that it is our responsibility to contribute back to Debian (which we try to do) everything we can and I think that contributing time and effort is the least we can do.

> The current codebase involves Make, Python, SQL, Shell and small
> amounts of Perl but if you don't know these yet I'll be happy to help
> you learn enough that you can contribute. In addition to the census
> codebase itself, work on the census can involve working on the
> codebases of other Debian services, such as the Debian Package
> Tracker.
>
> https://wiki.debian.org/Derivatives/Integration
> https://wiki.debian.org/Derivatives
> https://tracker.debian.org/

The Debian package tracker will be of particular interest to me because of the ability to understand the delta from Debian to a derivative. I'm more than happy to contribute in any way I can and will review those URLs to find some low-hanging fruit to get me started.

Is there a preferred channel for communication?
Is the mailing list preferred over IRC?

> The census service is currently disabled until the patch part of the
> service is refactored to use a database instead of YAML so that
> loading metadata about the patches doesn't use all the RAM on the
> machine. I haven't had the spoons to tackle this issue just yet.
>
> https://wiki.debian.org/Glossary#spoons

Debian lore! Thanks, I didn't know about spoons. :-)

Regarding RAM and CPUs, I have a VM running Bullseye at Linode which we can use for Gitlab runners or the like. Perhaps this will be of use? It is currently used to run diffoscope over an ISO built by debootstrap to determine reproducibility of the ISO:

http://dev.jeremiahfoster.com/pureos-9.0-images.html

I realize that Debian already has plenty of CPU cycles and would rather have more spoons but I thought I'd mention it. :-)

Thanks again pabs et al.!

- Jeremiah
Re: RFH: Debian derivatives census
On Thu, 2020-09-03 at 10:04 +0800, Paul Wise wrote:

> I'm looking for folks who are not very involved in Debian and would
> like to increase their involvement. The current codebase involves Make,
> Python, SQL, Shell and small amounts of Perl but if you don't know
> these yet I'll be happy to help you learn enough that you can
> contribute. In addition to the census codebase itself, work on the
> census can involve working on the codebases of other Debian services,
> such as the Debian Package Tracker.

I'd love to join! What do I do?

I can (mostly) hold my own in those languages.

-- 
[]'s,
Francisco M Neto
www.fmneto.com
3E58 1655 9A3D 5D78 9F90 CFF1 D30B 1694 D692 FBF0
Re: RFH: Debian derivatives census
> I'm looking for folks who are not very involved in Debian and would
> like to increase their involvement. The current codebase involves Make,
> Python, SQL, Shell and small amounts of Perl but if you don't know
> these yet I'll be happy to help you learn enough that you can
> contribute. In addition to the census codebase itself, work on the
> census can involve working on the codebases of other Debian services,
> such as the Debian Package Tracker.
>
> https://wiki.debian.org/Derivatives/Integration
> https://wiki.debian.org/Derivatives
> https://tracker.debian.org/
>
> The census service is currently disabled until the patch part of the
> service is refactored to use a database instead of YAML so that loading
> metadata about the patches doesn't use all the RAM on the machine. I
> haven't had the spoons to tackle this issue just yet.

Hi

This project sounds interesting, and I would like to make myself available to help/learn as much as possible. I know some basics in Python, SQL, and shell, but not Perl.

Hope to be able to help in some way.

Regards
Sicelo
RFH: Debian derivatives census
Hi all,

I'm looking for collaborators on the Debian derivatives census. The census involves a mixture of social and technical work as well as following different information feeds to find new Debian derivatives and passing information to other Debian teams and folks.

https://wiki.debian.org/Derivatives/Census

I believe the census is valuable to Debian and to derivatives and that it helps build mutually beneficial connections between us and the wider community of Free Software distributions. Derivatives bring new people, perspectives and projects to Debian, conference sponsorship and more. Derivatives benefit from collaboration with Debian through learning from our community, increased exposure to the Debian audience and of course our software distribution and services.

I'm looking for folks who are not very involved in Debian and would like to increase their involvement. The current codebase involves Make, Python, SQL, Shell and small amounts of Perl but if you don't know these yet I'll be happy to help you learn enough that you can contribute. In addition to the census codebase itself, work on the census can involve working on the codebases of other Debian services, such as the Debian Package Tracker.

https://wiki.debian.org/Derivatives/Integration
https://wiki.debian.org/Derivatives
https://tracker.debian.org/

The census service is currently disabled until the patch part of the service is refactored to use a database instead of YAML so that loading metadata about the patches doesn't use all the RAM on the machine. I haven't had the spoons to tackle this issue just yet.

https://wiki.debian.org/Glossary#spoons

-- 
bye,
pabs

https://wiki.debian.org/PaulWise