Re: RFH: Debian derivatives census

2020-09-09 Thread Paul Wise
On Wed, 2020-09-09 at 15:00 -0400, Jeremiah C. Foster wrote:

> This sounds very useful - how can I follow along on the discussion? Is
> there a separate email list for this topic?

There is no discussion about using the snapshot API in the census, just
a FIXME item in the patches generation script. The debian-derivatives
mailing list and IRC channel are probably the best places to discuss
the derivatives census scripts once this thread is concluded.

> I'll review those links to find out more and see if I'm able to
> contribute there.

The file that causes the RAM issue is 555MB of YAML and is here:

http://deriv.debian.net/sources.patches

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Re: RFH: Debian derivatives census

2020-09-09 Thread Jeremiah C. Foster
On Sun, 2020-09-06 at 12:22 +0800, Paul Wise wrote:
> On Thu, 2020-09-03 at 14:12 -0400, Jeremiah C. Foster wrote: 
> 
> > I would like to add that I've recently learned that the Derivatives
> > Census can help determine programmatically the delta between Debian
> > and
> > a Derivative (if things are correctly configured.) For a
> > distribution
> > such as ours which aims for binary compatibility and wants to stay
> > as
> > close to Debian as possible, this is extremely valuable. 
> 
> I think you are referring to the patch generation?
> 
> https://wiki.debian.org/Derivatives/Integration#Patches
> 
> The size of the metadata about the patches is what is causing the
> memory issues.
> 
> The patch generation itself currently can only be run on the Debian
> servers at LeaseWeb because it relies on access to the snapshot.d.o
> database and hash-based filesystem. There is a TODO item about
> porting
> it to the snapshot.d.o API instead so that derivatives who have
> private
> apt repositories can also run it locally.

This sounds very useful - how can I follow along on the discussion? Is
there a separate email list for this topic?

> 
> > I feel that it is our responsibility to contribute back to Debian
> > (which
> > we try to do) everything we can and I think that contributing time
> > and
> > effort is the least we can do.
> 
> Excellent, please take a look at the census codebase and the wiki
> pages
> I have linked to and run the codebase locally to see how it works.

Will do!

> > The Debian package tracker will be of particular interest to me
> > because
> > of the ability to understand the delta from Debian to a derivative.
> > I'm
> > more than happy to contribute in any way I can and will review
> > those
> > URLs to find some low-hanging fruit to get me started.
> 
> The main work needed on the package tracker is to replace the Ubuntu
> panel with a patches panel that links to available patches in various
> places including from the derivatives census.
> 
> https://bugs.debian.org/779400

Super useful, I'll review to see where I can participate.

> > Is there a preferred channel for communication?
> > Is the mailing list preferred over IRC?
> 
> This thread and the debian-derivatives mailing list and IRC channel
> are
> good places to discuss the census and I'll respond in either of them.

Great, thanks.

> > Regarding RAM and CPUs, I have a VM running Bullseye at Linode
> > which we
> > can use for GitLab runners or the like. Perhaps this will be of
> > use?
> 
> The RAM issue is mainly caused by part of the service not being
> written
> in a scalable way, since it just loads giant YAML files into memory.
> Throwing more RAM at the problem or making the memory storage more
> efficient would be the wrong approach, since eventually the patch
> metadata in YAML files will exceed the available RAM. A database
> would
> be a better way to do it. So we need changes to the codebase to store
> the data in a database instead plus a script to stream the YAML into
> the database without loading it all into RAM. Here are a couple of
> links I gathered on the problem:
> 
> https://habr.com/en/post/458518/
> https://news.ycombinator.com/item?id=20401055
> https://stackoverflow.com/questions/429162/how-to-process-a-yaml-stream-in-python

I'll review those links to find out more and see if I'm able to
contribute there.

Thanks again,

Jeremiah





Re: RFH: Debian derivatives census

2020-09-05 Thread Paul Wise
On Thu, Sep 3, 2020 at 2:56 PM Francisco M Neto wrote:

> I'd love to join! What do I do?
>
> I can (mostly) hold my own in those languages.

Great! I suggest you start by looking at the wiki pages I mentioned,
downloading the codebase and trying to run it locally. As I said, the
biggest problem is the RAM usage from loading the YAML files, but there
are lots of TODO/FIXME items sprinkled throughout the codebase and
some ideas for features on the wiki pages. If you have any questions
I'll be available on the debian-derivatives IRC channel and mailing
list, or this thread.

--
bye,
pabs

https://wiki.debian.org/PaulWise



Re: RFH: Debian derivatives census

2020-09-05 Thread Paul Wise
On Thu, Sep 3, 2020 at 2:42 PM Sicelo wrote:

> This project sounds interesting, and I would like to make myself
> available to help/learn as much as possible. I know some basics in
> Python, SQL, and shell, but not Perl.

Great! The Perl parts are quite minimal (just for discovering RSS
feeds and downloading favicons) so you can easily ignore those. I
suggest you start by looking at the wiki pages I mentioned,
downloading the codebase and trying to run it locally. There are lots of
TODO/FIXME items sprinkled throughout the codebase and some ideas for
features on the wiki pages. If you have any questions I'll be
available on the debian-derivatives IRC channel and mailing list, or
this thread.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise



Re: RFH: Debian derivatives census

2020-09-05 Thread Paul Wise
On Thu, 2020-09-03 at 14:12 -0400, Jeremiah C. Foster wrote: 

> I would like to add that I've recently learned that the Derivatives
> Census can help determine programmatically the delta between Debian and
> a Derivative (if things are correctly configured.) For a distribution
> such as ours which aims for binary compatibility and wants to stay as
> close to Debian as possible, this is extremely valuable. 

I think you are referring to the patch generation?

https://wiki.debian.org/Derivatives/Integration#Patches

The size of the metadata about the patches is what is causing the
memory issues.

The patch generation itself currently can only be run on the Debian
servers at LeaseWeb because it relies on access to the snapshot.d.o
database and hash-based filesystem. There is a TODO item about porting
it to the snapshot.d.o API instead so that derivatives who have private
apt repositories can also run it locally.
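
As a starting point for such a port, here is a rough sketch of building
request URLs for the machine-readable part of the snapshot.d.o API. It
assumes the /mr/package/ and /file/ endpoints described in the snapshot
API documentation; the helper names are hypothetical, and the paths
should be checked against that documentation before relying on them.

```python
from urllib.parse import quote

# Base URL of the public snapshot service.
SNAPSHOT_BASE = "https://snapshot.debian.org"


def source_versions_url(package: str) -> str:
    """URL that lists the known versions of a source package as JSON."""
    return f"{SNAPSHOT_BASE}/mr/package/{quote(package, safe='')}/"


def source_files_url(package: str, version: str) -> str:
    """URL that lists the file hashes of one source package version."""
    return (
        f"{SNAPSHOT_BASE}/mr/package/{quote(package, safe='')}"
        f"/{quote(version, safe='')}/srcfiles"
    )


def file_url(sha1: str) -> str:
    """URL for fetching a stored file by its SHA-1 hash."""
    return f"{SNAPSHOT_BASE}/file/{sha1}"
```

These helpers only build URLs and do no network access, so the fetching
and caching policy is left to whatever script ends up using them.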

> I feel that it is our responsibility to contribute back to Debian (which
> we try to do) everything we can and I think that contributing time and
> effort is the least we can do.

Excellent, please take a look at the census codebase and the wiki pages
I have linked to and run the codebase locally to see how it works.

> The Debian package tracker will be of particular interest to me because
> of the ability to understand the delta from Debian to a derivative. I'm
> more than happy to contribute in any way I can and will review those
> URLs to find some low-hanging fruit to get me started.

The main work needed on the package tracker is to replace the Ubuntu
panel with a patches panel that links to available patches in various
places including from the derivatives census.

https://bugs.debian.org/779400

> Is there a preferred channel for communication?
> Is the mailing list preferred over IRC?

This thread and the debian-derivatives mailing list and IRC channel are
good places to discuss the census and I'll respond in either of them.

> Regarding RAM and CPUs, I have a VM running Bullseye at Linode which we
> can use for GitLab runners or the like. Perhaps this will be of use?

The RAM issue is mainly caused by part of the service not being written
in a scalable way, since it just loads giant YAML files into memory.
Throwing more RAM at the problem or making the memory storage more
efficient would be the wrong approach, since eventually the patch
metadata in YAML files will exceed the available RAM. A database would
be a better way to do it. So we need changes to the codebase to store
the data in a database instead plus a script to stream the YAML into
the database without loading it all into RAM. Here are a couple of links
I gathered on the problem:

https://habr.com/en/post/458518/
https://news.ycombinator.com/item?id=20401055
https://stackoverflow.com/questions/429162/how-to-process-a-yaml-stream-in-python
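
The streaming idea can be sketched with the Python standard library
alone. This is only an illustration under assumptions, not the census
code: it assumes the metadata is a multi-document YAML stream with plain
`---` separator lines (the real sources.patches layout may differ), it
stores each document as raw text instead of parsing it, and the table
and function names are made up for the example.

```python
import io
import sqlite3
from typing import Iterator, List, TextIO, Tuple


def iter_yaml_documents(stream: TextIO) -> Iterator[str]:
    """Yield the text of each '---'-separated document, one at a time.

    Only one document is held in memory at once. This simple splitter
    does not handle '...' end markers or content after '---'.
    """
    lines: List[str] = []
    for line in stream:
        if line.rstrip("\n") == "---":
            if lines:
                yield "".join(lines)
                lines = []
        else:
            lines.append(line)
    if lines:
        yield "".join(lines)


def load_into_db(stream: TextIO, conn: sqlite3.Connection,
                 batch_size: int = 1000) -> int:
    """Insert documents into SQLite in batches; return how many were stored."""
    # Hypothetical table: one row per YAML document, kept as raw text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patch_metadata "
        "(id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    total = 0
    batch: List[Tuple[str]] = []
    for doc in iter_yaml_documents(stream):
        batch.append((doc,))
        if len(batch) >= batch_size:
            conn.executemany(
                "INSERT INTO patch_metadata (body) VALUES (?)", batch)
            conn.commit()
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        conn.executemany(
            "INSERT INTO patch_metadata (body) VALUES (?)", batch)
        conn.commit()
        total += len(batch)
    return total


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up data.
    demo = io.StringIO("---\na: 1\n---\nb: 2\n")
    print(load_into_db(demo, sqlite3.connect(":memory:")))
```

Memory use is bounded by the batch size and the largest single document
rather than by the size of the file. If the file turns out to be one
huge YAML document instead of a stream of small ones, splitting on
separators will not help and an event-based parser would be needed.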

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Re: RFH: Debian derivatives census

2020-09-03 Thread Jeremiah C. Foster
On Thu, 2020-09-03 at 10:04 +0800, Paul Wise wrote:
> Hi all,

Hello Pabs!

> I'm looking for collaborators on the Debian derivatives census. The
> census involves a mixture of social and technical work as well as
> following different information feeds to find new Debian derivatives
> and passing information to other Debian teams and folks.
> 
> https://wiki.debian.org/Derivatives/Census

> I believe the census is valuable to Debian and to derivatives 

I would like to say that we find it incredibly valuable for PureOS and
I've seen the Derivatives Census as an excellent source of information
both for outreach and to understand the Debian ecosystem as it were.
Thank you pabs for all your work on this.

> and that
> it helps build mutually beneficial connections between us and the
> wider
> community of Free Software distributions. Derivatives bring new
> people,
> perspectives and projects to Debian, conference sponsorship and more.
> Derivatives benefit from collaboration with Debian through learning
> from our community, increased exposure to the Debian audience and of
> course our software distribution and services.

I would like to add that I've recently learned that the Derivatives
Census can help determine programmatically the delta between Debian and
a Derivative (if things are correctly configured.) For a distribution
such as ours which aims for binary compatibility and wants to stay as
close to Debian as possible, this is extremely valuable. 

> I'm looking for folks who are not very involved in Debian and would
> like to increase their involvement. 

I feel that it is our responsibility to contribute back to Debian (which
we try to do) everything we can and I think that contributing time and
effort is the least we can do.

> The current codebase involves Make,
> Python, SQL, Shell and small amounts of Perl but if you don't know
> these yet I'll be happy to help you learn enough that you can
> contribute. In addition to the census codebase itself, work on the
> census can involve working on the codebases of other Debian services,
> such as the Debian Package Tracker.
> 
> https://wiki.debian.org/Derivatives/Integration
> https://wiki.debian.org/Derivatives
> https://tracker.debian.org/

The Debian package tracker will be of particular interest to me because
of the ability to understand the delta from Debian to a derivative. I'm
more than happy to contribute in any way I can and will review those
URLs to find some low-hanging fruit to get me started. Is there a
preferred channel for communication? Is the mailing list preferred over
IRC?

> The census service is currently disabled until the patch part of the
> service is refactored to use a database instead of YAML so that
> loading
> metadata about the patches doesn't use all the RAM on the machine. I
> haven't had the spoons to tackle this issue just yet.
> 
> https://wiki.debian.org/Glossary#spoons

Debian lore! Thanks, I didn't know about spoons. :-) 

Regarding RAM and CPUs, I have a VM running Bullseye at Linode which we
can use for GitLab runners or the like. Perhaps this will be of use? It
is currently used to run diffoscope over an ISO built by debootstrap to
determine reproducibility of the ISO:
http://dev.jeremiahfoster.com/pureos-9.0-images.html

I realize that Debian already has plenty of CPU cycles and would rather
have more spoons but I thought I'd mention it. :-)

Thanks again, pabs et al.!

- Jeremiah






Re: RFH: Debian derivatives census

2020-09-03 Thread Francisco M Neto
On Thu, 2020-09-03 at 10:04 +0800, Paul Wise wrote:
> I'm looking for folks who are not very involved in Debian and would
> like to increase their involvement. The current codebase involves Make,
> Python, SQL, Shell and small amounts of Perl but if you don't know
> these yet I'll be happy to help you learn enough that you can
> contribute. In addition to the census codebase itself, work on the
> census can involve working on the codebases of other Debian services,
> such as the Debian Package Tracker.

I'd love to join! What do I do?

I can (mostly) hold my own in those languages.

-- 
[]'s,

Francisco M Neto 
www.fmneto.com

3E58 1655 9A3D 5D78 9F90
CFF1 D30B 1694 D692 FBF0





Re: RFH: Debian derivatives census

2020-09-03 Thread Sicelo
> 
> I'm looking for folks who are not very involved in Debian and would
> like to increase their involvement. The current codebase involves Make,
> Python, SQL, Shell and small amounts of Perl but if you don't know
> these yet I'll be happy to help you learn enough that you can
> contribute. In addition to the census codebase itself, work on the
> census can involve working on the codebases of other Debian services,
> such as the Debian Package Tracker.
> 
> https://wiki.debian.org/Derivatives/Integration
> https://wiki.debian.org/Derivatives
> https://tracker.debian.org/
> 
> The census service is currently disabled until the patch part of the
> service is refactored to use a database instead of YAML so that loading
> metadata about the patches doesn't use all the RAM on the machine. I
> haven't had the spoons to tackle this issue just yet.
> 

Hi

This project sounds interesting, and I would like to make myself
available to help/learn as much as possible. I know some basics in
Python, SQL, and shell, but not Perl.

Hope to be able to help in some way.

Regards
Sicelo



RFH: Debian derivatives census

2020-09-02 Thread Paul Wise
Hi all,

I'm looking for collaborators on the Debian derivatives census. The
census involves a mixture of social and technical work as well as
following different information feeds to find new Debian derivatives
and passing information to other Debian teams and folks.

https://wiki.debian.org/Derivatives/Census

I believe the census is valuable to Debian and to derivatives and that
it helps build mutually beneficial connections between us and the wider
community of Free Software distributions. Derivatives bring new people,
perspectives and projects to Debian, conference sponsorship and more.
Derivatives benefit from collaboration with Debian through learning
from our community, increased exposure to the Debian audience and of
course our software distribution and services.

I'm looking for folks who are not very involved in Debian and would
like to increase their involvement. The current codebase involves Make,
Python, SQL, Shell and small amounts of Perl but if you don't know
these yet I'll be happy to help you learn enough that you can
contribute. In addition to the census codebase itself, work on the
census can involve working on the codebases of other Debian services,
such as the Debian Package Tracker.

https://wiki.debian.org/Derivatives/Integration
https://wiki.debian.org/Derivatives
https://tracker.debian.org/

The census service is currently disabled until the patch part of the
service is refactored to use a database instead of YAML so that loading
metadata about the patches doesn't use all the RAM on the machine. I
haven't had the spoons to tackle this issue just yet.

https://wiki.debian.org/Glossary#spoons

-- 
bye,
pabs

https://wiki.debian.org/PaulWise

