Re: [gentoo-dev] Standard parsable format for profiles/package.mask file

2023-09-22 Thread Arthur Zamarin
On 22/09/2023 00.22, Ulrich Mueller wrote:
>> On Thu, 21 Sep 2023, Arthur Zamarin wrote:
> 
>> = "Formal" format =
> 
>> Each entry is composed of 2 parts: "#"-prefixed explanation block and
>> list of "${CATEGORY}/${PN}" packages. Entries are separated when a new
>> explanation block starts (meaning first "#"-prefixed line after packages
>> list). You may add newlines between packages in packages list.
> 
> "Must" rather than "may" here? You certainly cannot list several
> packages in the same line.

Agreed, poor choice of words.

>> The first line of the "#"-prefixed explanation block must be of the
>> format "${AUTHOR_NAME} <${EMAIL}> (${SINGLE_DATE})" when the date is of
>> format YYYY-MM-DD, in UTC timezone.
> 
>> If this is a last-rite message, the last line must list the last-rite
>> last date (removal date) and the last-rite bug number. You can also list
>> other bugs relevant to the last-rite. So I think a format of: "Removal
>> on ${REMOVAL_DATE}.  Bug #NN, #NN." Where the bug list is comma
>> and space separated, we have at least one space (" +" regex) between the
>> removal date and bug list, and the date is of YYYY-MM-DD format.
>> I prefer this line is separate (and not continuous of prefix message text).
> 
>> The explanation block itself can reference bugs, by matching the regex
>> "[Bb]ugs? #\d+(, +#\d+)*" (For example: "bug #713106, #753134"). I think
>> this is quite a simple one, but powerful enough for most.
> 
>> Lines with single newline between them (so no blank line between them)
>> are considered as single paragraph continuum. If you want to start new
>> paragraph, leave a blank line (still prefixed with #) - think similar to
>> markdown. A line matching the last-rite line is always its own paragraph.
> 
> Is this rule about paragraphs needed? It is at odds with the rule that
> the removal date and bug must be on their own line (i.e. that line is
> _not_ part of a "paragraph continuum").

Hmm, yeah, rereading my text shows I've over-complicated it. What I
wanted is that the last paragraph (yes, if there are many bugs it might
wrap onto a new line) need not be separated by a blank line from the
"main explanation block".

> What about the introductory comment block in the file? Should there be a
> defined syntax for a separator between it and the rest of the file? For
> example, everything above the first line matching "^#[ \t]*---" could be
> ignored by automatic tools, and they would insert new entries below that
> separator.

Good point, and I will address it as you recommended. I will define the
ignore-until-this-line separator, and state that new entries should be
added as the first entry after that separator line.

>> Should it be a GLEP, I don't think so? But I'm unsure about it. We do
>> need to document it (for example header of that exact file).
> 
> It shouldn't be too difficult to wrap this up as a GLEP. OTOH, we don't
> have a GLEP for eclassdoc either.

Yeah, after all the input, yes, I will work on a formal GLEP. It will
take time, but I hope to prepare a first draft in the coming 2 weeks.

> Ulrich

-- 
Arthur Zamarin
arthur...@gentoo.org
Gentoo Linux developer (Python, pkgcore stack, Arch Teams, GURU)





Re: [gentoo-dev] Standard parsable format for profiles/package.mask file

2023-09-22 Thread Arthur Zamarin
On 22/09/2023 12.21, Ulrich Mueller wrote:
>> On Fri, 22 Sep 2023, Oskari Pirhonen wrote:
> 
>>> Each entry is composed of 2 parts: "#"-prefixed explanation block and
>>> list of "${CATEGORY}/${PN}" packages. Entries are separated when a new
>>> explanation block starts (meaning first "#"-prefixed line after packages
>>> list). You may add newlines between packages in packages list.
> 
>> What about mandatory blank line(s) between entries? That way it ensures
>> they are visually separated when skimming through the file. Plus, you
>> can easily jump from entry to entry in editors that support
>> paragraph-wise movement.
> 
> Yes, please. Mandatory blank lines between entries, and no blank lines
> (or lines containing only whitespace) within entries. Especially, no
> blank lines in the list of packages.

Yeah, I agree. Originally I wanted to allow blank lines between packages
in the same entry (to enable you to group them), but after further
consideration and your input, I see this is a bad idea (if you want to
divide the group, create separate entries).
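
As a rough illustration of the rule agreed on here (blank lines between
entries, none inside an entry), a checker could split the file on blank
lines and then verify that each chunk is a comment block followed by a
package list. This is only a sketch: the function names and the problem
messages are made up for the example and are not taken from pkgcheck.

```python
def split_entries(lines):
    """Split lines into entries separated by blank or whitespace-only lines."""
    entries, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            entries.append(current)
            current = []
    if current:
        entries.append(current)
    return entries


def check_entry(entry):
    """Return a list of problems for one entry (hypothetical messages)."""
    problems = []
    seen_package = False
    for line in entry:
        if line.startswith("#"):
            if seen_package:
                # A comment after packages means two entries were not
                # separated by the mandatory blank line.
                problems.append("comment line after package list")
        else:
            seen_package = True
    if not entry[0].startswith("#"):
        problems.append("entry has no explanation block")
    if not seen_package:
        problems.append("entry lists no packages")
    return problems
```

With this approach, a blank line inside a package list shows up as a
second "entry" that has no explanation block, so it is reported too.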

-- 
Arthur Zamarin
arthur...@gentoo.org
Gentoo Linux developer (Python, pkgcore stack, Arch Teams, GURU)





Re: [gentoo-dev] Standard parsable format for profiles/package.mask file

2023-09-22 Thread Ulrich Mueller
> On Fri, 22 Sep 2023, Oskari Pirhonen wrote:

>> Each entry is composed of 2 parts: "#"-prefixed explanation block and
>> list of "${CATEGORY}/${PN}" packages. Entries are separated when a new
>> explanation block starts (meaning first "#"-prefixed line after packages
>> list). You may add newlines between packages in packages list.

> What about mandatory blank line(s) between entries? That way it ensures
> they are visually separated when skimming through the file. Plus, you
> can easily jump from entry to entry in editors that support
> paragraph-wise movement.

Yes, please. Mandatory blank lines between entries, and no blank lines
(or lines containing only whitespace) within entries. Especially, no
blank lines in the list of packages.




Re: [gentoo-dev] Standard parsable format for profiles/package.mask file

2023-09-21 Thread Oskari Pirhonen
On Thu, Sep 21, 2023 at 22:40:05 +0300, Arthur Zamarin wrote:
> = "Formal" format =
> 
> Each entry is composed of 2 parts: "#"-prefixed explanation block and
> list of "${CATEGORY}/${PN}" packages. Entries are separated when a new
> explanation block starts (meaning first "#"-prefixed line after packages
> list). You may add newlines between packages in packages list.
> 

What about mandatory blank line(s) between entries? That way it ensures
they are visually separated when skimming through the file. Plus, you
can easily jump from entry to entry in editors that support
paragraph-wise movement.

- Oskari




Re: [gentoo-dev] Standard parsable format for profiles/package.mask file

2023-09-21 Thread Oskari Pirhonen
On Thu, Sep 21, 2023 at 23:22:27 +0200, Ulrich Mueller wrote:
> > On Thu, 21 Sep 2023, Arthur Zamarin wrote:
> 
> > = "Formal" format =
> 
> > Each entry is composed of 2 parts: "#"-prefixed explanation block and
> > list of "${CATEGORY}/${PN}" packages. Entries are separated when a new
> > explanation block starts (meaning first "#"-prefixed line after packages
> > list). You may add newlines between packages in packages list.
> 
> "Must" rather than "may" here? You certainly cannot list several
> packages in the same line.
> 

I read it to mean something like this would be allowed:

# Blurb about package1, package2, and package3
category/package1
category/package2

category/package3

Whether it makes sense to allow that is a different question.

- Oskari




Re: [gentoo-dev] Standard parsable format for profiles/package.mask file

2023-09-21 Thread Sam James


Tim Harder writes:

> On 2023-09-21 Thu 15:22, Ulrich Mueller wrote:
>>> On Thu, 21 Sep 2023, Arthur Zamarin wrote:
>>> Should it be a GLEP, I don't think so? But I'm unsure about it. We do
>>> need to document it (for example header of that exact file).
>>
>>It shouldn't be too difficult to wrap this up as a GLEP.
>
> To me standardizing a format in Gentoo (outside of PMS-related
> functionality) requires a GLEP or at the very least some semi-formal
> documentation outside the file in question in a place like the
> devmanual. Consider it due diligence of the process that allows people
> writing code to target the format without having to chase details down
> into code bases or mailing list threads.

+1




Re: [gentoo-dev] Standard parsable format for profiles/package.mask file

2023-09-21 Thread Tim Harder

On 2023-09-21 Thu 15:22, Ulrich Mueller wrote:
>> On Thu, 21 Sep 2023, Arthur Zamarin wrote:
>> Should it be a GLEP, I don't think so? But I'm unsure about it. We do
>> need to document it (for example header of that exact file).
>
> It shouldn't be too difficult to wrap this up as a GLEP.

To me standardizing a format in Gentoo (outside of PMS-related
functionality) requires a GLEP or at the very least some semi-formal
documentation outside the file in question in a place like the
devmanual. Consider it due diligence of the process that allows people
writing code to target the format without having to chase details down
into code bases or mailing list threads.

> OTOH, we don't have a GLEP for eclassdoc either.


This is a poor example since it's partly the reason why an awk script
with issues relating to extensibility and maintainability is still used
to generate eclass manpages.

I mainly let it slide when writing pkgcore/pkgcheck parsing
functionality because the devmanual [0] was a passable resource at the
time.

Tim

[0]: https://devmanual.gentoo.org/eclass-writing/#documenting-eclasses



Re: [gentoo-dev] Standard parsable format for profiles/package.mask file

2023-09-21 Thread Ulrich Mueller
> On Thu, 21 Sep 2023, Arthur Zamarin wrote:

> = "Formal" format =

> Each entry is composed of 2 parts: "#"-prefixed explanation block and
> list of "${CATEGORY}/${PN}" packages. Entries are separated when a new
> explanation block starts (meaning first "#"-prefixed line after packages
> list). You may add newlines between packages in packages list.

"Must" rather than "may" here? You certainly cannot list several
packages in the same line.

> The first line of the "#"-prefixed explanation block must be of the
> format "${AUTHOR_NAME} <${EMAIL}> (${SINGLE_DATE})" when the date is of
> format YYYY-MM-DD, in UTC timezone.

> If this is a last-rite message, the last line must list the last-rite
> last date (removal date) and the last-rite bug number. You can also list
> other bugs relevant to the last-rite. So I think a format of: "Removal
> on ${REMOVAL_DATE}.  Bug #NN, #NN." Where the bug list is comma
> and space separated, we have at least one space (" +" regex) between the
> removal date and bug list, and the date is of YYYY-MM-DD format.
> I prefer this line is separate (and not continuous of prefix message text).

> The explanation block itself can reference bugs, by matching the regex
> "[Bb]ugs? #\d+(, +#\d+)*" (For example: "bug #713106, #753134"). I think
> this is quite a simple one, but powerful enough for most.

> Lines with single newline between them (so no blank line between them)
> are considered as single paragraph continuum. If you want to start new
> paragraph, leave a blank line (still prefixed with #) - think similar to
> markdown. A line matching the last-rite line is always its own paragraph.

Is this rule about paragraphs needed? It is at odds with the rule that
the removal date and bug must be on their own line (i.e. that line is
_not_ part of a "paragraph continuum").

What about the introductory comment block in the file? Should there be a
defined syntax for a separator between it and the rest of the file? For
example, everything above the first line matching "^#[ \t]*---" could be
ignored by automatic tools, and they would insert new entries below that
separator.
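
A minimal sketch of how a tool could honour such a separator, in Python.
The regex is the one suggested above; everything else (function names,
the fallback when no separator exists, the sample separator text in the
usage note) is an assumption made for illustration.

```python
import re

# Separator pattern as suggested in this mail.
SEPARATOR_RE = re.compile(r"^#[ \t]*---")


def split_at_separator(lines):
    """Return (header, body); header ends with the first separator line."""
    for index, line in enumerate(lines):
        if SEPARATOR_RE.match(line):
            return lines[: index + 1], lines[index + 1 :]
    # Assumption: without a separator, the whole file consists of entries.
    return [], lines


def insert_entry(lines, entry):
    """Insert a new entry as the first entry below the separator."""
    header, body = split_at_separator(lines)
    while body and not body[0].strip():
        body = body[1:]  # drop blank lines directly after the separator
    result = list(header)
    if header:
        result.append("")
    result.extend(entry)
    if body:
        result.append("")
        result.extend(body)
    return result
```

For instance, given an intro block that ends with a line such as
"# --- entries start below ---" (a made-up separator text), the new
entry ends up directly below that line, separated by one blank line
from both the separator and the previously first entry.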

> Should it be a GLEP, I don't think so? But I'm unsure about it. We do
> need to document it (for example header of that exact file).

It shouldn't be too difficult to wrap this up as a GLEP. OTOH, we don't
have a GLEP for eclassdoc either.

Ulrich




[gentoo-dev] Standard parsable format for profiles/package.mask file

2023-09-21 Thread Arthur Zamarin
Hi all

I want to suggest a standard format for profiles/package.mask, for
multiple reasons:

1. It becomes easier to write mask or last-rites entries that are simple
to understand. When all entries share a similar format, the reader knows
where to expect the important information, and the writer can more
easily convey everything that is needed.

2. We can teach tools to parse it and render it nicely, or to help you
fill in the file. For example, I've tried to implement a parser for
packages.gentoo.org so that it shows the message as nicely as possible;
see [1] for an example. On the other hand, `pkgdev mask` [2] can help you
fill in the message (including bug number, last-rite removal date, and
author & email line). Both of them mostly work, but when someone "breaks"
the unofficial syntax, the tools fail sadly.

This is why I want to recommend we create a mostly standard syntax, so
we can all expect the same thing and have nice things.
Also please note that for now I want to formalize the format only for
the profiles/package.mask file, and not for the ones inside all the
different profiles. If you think we should apply it to all of them,
let's discuss that separately, please :)

The current format is mostly acceptable, but let's tighten it. I will
implement a pkgcheck check that will validate the format and error out
if invalid.

[1] https://packages.gentoo.org/packages/sys-fs/eudev
[2] https://pkgcore.github.io/pkgdev/man/pkgdev/mask.html

= "Formal" format =

Each entry is composed of 2 parts: "#"-prefixed explanation block and
list of "${CATEGORY}/${PN}" packages. Entries are separated when a new
explanation block starts (meaning first "#"-prefixed line after packages
list). You may add newlines between packages in packages list.

The first line of the "#"-prefixed explanation block must be of the
format "${AUTHOR_NAME} <${EMAIL}> (${SINGLE_DATE})", where the date is in
YYYY-MM-DD format, in UTC timezone.
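
To make the header rule concrete, here is a minimal Python sketch of how
a tool might validate that first line. The pattern and the function name
are illustrative assumptions (in particular, the email check is
deliberately loose); this is not the pkgcheck or pkgdev implementation.

```python
import re
from datetime import date

# Hypothetical pattern for "# Name <email> (2023-09-21)".
HEADER_RE = re.compile(
    r"^# (?P<name>[^<>]+?) <(?P<email>[^<>\s]+@[^<>\s]+)> "
    r"\((?P<date>\d{4}-\d{2}-\d{2})\)$"
)


def parse_header(line):
    """Return (name, email, date) for a valid header line, else None."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    try:
        # fromisoformat rejects impossible dates such as 2023-13-40.
        when = date.fromisoformat(match["date"])
    except ValueError:
        return None
    return match["name"], match["email"], when
```

A line such as "# Jane Doe <jane@example.org> (2023-09-21)" (a made-up
author) parses into its three parts, while a line without the
angle-bracketed email, or with an impossible date, is rejected.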

If this is a last-rite message, the last line must list the last-rite
removal date and the last-rite bug number. You can also list other bugs
relevant to the last-rite. So I think a format of: "Removal on
${REMOVAL_DATE}.  Bug #NN, #NN." where the bug list is comma and space
separated, there is at least one space (" +" regex) between the removal
date and the bug list, and the date is in YYYY-MM-DD format.
I prefer that this line stand on its own (and not continue the preceding
message text).
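
The last-rite line could be matched along these lines. This is a sketch
under assumptions: the exact pattern (including the mandatory trailing
period and the literal word "Bug") is read off the format string above,
and the function name is invented for the example.

```python
import re
from datetime import date

# Hypothetical pattern for "# Removal on 2023-10-21.  Bug #1, #2."
LAST_RITE_RE = re.compile(
    r"^# Removal on (?P<date>\d{4}-\d{2}-\d{2})\. +"
    r"Bug (?P<bugs>#\d+(?:, +#\d+)*)\.$"
)


def parse_last_rite(line):
    """Return (removal_date, [bug numbers]) or None if the line is invalid."""
    match = LAST_RITE_RE.match(line)
    if match is None:
        return None
    try:
        removal = date.fromisoformat(match["date"])
    except ValueError:
        return None
    bugs = [int(number) for number in re.findall(r"#(\d+)", match["bugs"])]
    return removal, bugs
```

Because at least one bug number is required by the pattern, a bare
"# Removal on 2023-10-21." is rejected, which matches the requirement
that the line list the last-rite bug.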

The explanation block itself can reference bugs, by matching the regex
"[Bb]ugs? #\d+(, +#\d+)*" (For example: "bug #713106, #753134"). I think
this is quite a simple one, but powerful enough for most.
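
As a quick illustration, the regex above can be used as-is with Python's
`re` module to collect bug numbers from an explanation block. The helper
name is made up for this sketch.

```python
import re

# The regex from the paragraph above (group made non-capturing).
BUG_REF_RE = re.compile(r"[Bb]ugs? #\d+(?:, +#\d+)*")


def find_bug_numbers(text):
    """Return all bug numbers referenced in text, in order of appearance."""
    numbers = []
    for reference in BUG_REF_RE.finditer(text):
        numbers.extend(int(n) for n in re.findall(r"#(\d+)", reference.group()))
    return numbers
```

Note that a bare "#12" without a preceding "bug" or "bugs" is not
treated as a bug reference, so unrelated uses of "#" in the text are
ignored.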

Lines with a single newline between them (so no blank line between them)
are considered a single continuous paragraph. If you want to start a new
paragraph, leave a blank line (still prefixed with #), similar to
markdown. A line matching the last-rite line is always its own paragraph.

= Example =

After all of that rambling, here is an example (it results in 3
paragraphs: 2 of explanation and 1 last-rite finish):

# Arthur Zamarin <arthur...@gentoo.org> (2023-09-21)
# Very broken, no idea why packaged, need to drop ASAP. The project
# is done with supporting this package. See for history bug #667889.
#
# As a better plan, you should migrate to dev-lang/perl, which has
# better compatibility with dev-lang/ruby when used with dev-lang/lua
# bindings.
# Removal on 2023-10-21.  Bug #667687, #667689.
dev-lang/python
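
The paragraph rule can be sketched in Python as follows, applied to the
comment lines of the example above (everything after the author line).
Recognizing the last-rite line by its "Removal on " prefix is a
simplification chosen for this sketch, and the function name is invented.

```python
def split_paragraphs(comment_lines):
    """Group "#"-prefixed lines into paragraphs of joined text."""
    paragraphs, current = [], []

    def flush():
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for raw in comment_lines:
        text = raw[1:].strip()  # drop the leading "#" and outer spaces
        if not text:
            flush()  # a bare "#" line ends the current paragraph
        elif text.startswith("Removal on "):
            flush()
            paragraphs.append(text)  # last-rite line is its own paragraph
        else:
            current.append(text)
    flush()
    return paragraphs


example = [
    "# Very broken, no idea why packaged, need to drop ASAP. The project",
    "# is done with supporting this package. See for history bug #667889.",
    "#",
    "# As a better plan, you should migrate to dev-lang/perl, which has",
    "# better compatibility with dev-lang/ruby when used with dev-lang/lua",
    "# bindings.",
    "# Removal on 2023-10-21.  Bug #667687, #667689.",
]
paragraphs = split_paragraphs(example)
```

Here `paragraphs` holds three items: the two explanation paragraphs and
the last-rite line, even though no blank "#" line precedes the latter.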

= Call for comments =

So how does it sound? I know it is tempting for me to limit the syntax
(since I'll need to implement the parsing of it), but I think the format
above matches most of the currently used ones, and the one created by
`pkgdev mask`. But if needed, I'm open to improving it based on comments.

Should it be a GLEP, I don't think so? But I'm unsure about it. We do
need to document it (for example header of that exact file).


-- 
Arthur Zamarin
arthur...@gentoo.org
Gentoo Linux developer (Python, pkgcore stack, Arch Teams, GURU)

