On Wednesday, July 22, 2020 6:50 PM, Wojtek Porczyk 
<[email protected]> wrote:

> On Tue, Jul 21, 2020 at 06:31:55PM +0000, WillyPillow wrote:
> 

> > On Monday, July 20, 2020 4:52 PM, Wojtek Porczyk 
> > [email protected] wrote:
> > 

> > > On Sun, Jul 19, 2020 at 06:20:06PM +0000, WillyPillow wrote:
> > 

> > > > On another note, I'm wonder which fields are needed in the output of 
> > > > the "info"
> > > > operation. Comparing my WIP code to DNF, I currently do not have the
> > > > architecture [2], URL, licence, and description fields. This is due to
> > > > `qubes.TemplateSearch` not currently returning those fields.
> > > > For the packages in the official repos, those fields do not contain much
> > > > information (in particular, the description field contains the same 
> > > > information
> > > > as the summary), though I'm not sure if they might be useful in the 
> > > > future or
> > > > for unofficial templates.
> > 

> > > I don't know, could you design that so that those can be added in the 
> > > future
> > > if need be? Those need to be understood and properly validated, because 
> > > some
> > > software might act upon that information. For example: Debian provides
> > > a directory with licence texts, which allows for
> > > /usr/share/common-licenses/$license, which smells path traversal.
> > > Fedora's RPM guidelines is even worse, they support operators like "()",
> > > "and", "or":
> > > https://docs.fedoraproject.org/en-US/packaging-guidelines/LicensingGuidelines/
> > 

> > To include the information, all that is needed is to include it in the
> > `queryformat` string of the qrexec call and read it in `qvm-template` (Aside
> > from dealing with the newline issue [4] -- mentioned below.)
> > As for validation, on our end, as we (AFAICT) only need to display the 
> > entry as
> > is and don't need to parse it, we should be fine.
> > For other programs parsing the information, IMO it should be up to the 
> > program
> > in question to sanitize its inputs.
> 

> That would be in ideal world, unfortunately many developers can't be bothered
> to do it properly. That's why Qubes exists in the first place.
> 

> > This is because the fields contain, by
> > design, arbitrary text, and my understanding of the linked article is that 
> > it's
> > merely a guideline for Fedora repos instead of RPMs in general.
> 

> Can't we make our own guidelines? Like explicit whitelist of known licences
> and maybe "other"? Linux templates are mostly under GPL, unikernels can have
> other licences, but there aren't many unikernels. Unless I'm missing
> something, known-good values would be easy to enumerate and check.

I see. A whitelist approach indeed seems to work in this case. The Fedora
article also links to a page containing a list of license short names we can
start from.

> Also, some checking probably applies also to name and description: they
> shouldn't have fancy characters, because UI toolkits might support things
> like pango markup or whatever.

Currently we're already blocking non-ASCII characters and escape sequences. For
names, constraining the character set to, e.g., something like
`[A-Za-z0-9_.+\-]` is pretty doable, though for the summary/description, I'm
unsure if anything more can be done without sacrificing usability, as, for
instance, it's not unimaginable for them to contain stuff like `<>`.

The roadmap also contains plans for `qvm-template` to provide machine-readable
output (as opposed to the current user-facing format). Maybe another way of
approaching this is to provide facilities for escaping the output for
downstream applications?

> > > > One tricky thing is that the description may contain newlines, while 
> > > > `dnf repoquery` does not escape them at all [3]. This may mean that 
> > > > another method
> > > > of querying the repo needs to be considered if the description is 
> > > > included. (Or
> > > > use unconventional characters/strings as separators. In particular, I 
> > > > can't
> > > > pass NULL characters in the arguments to DNF for obvious reasons.)
> > 

> > > Yes, another qrexec call is OK, provided this won't be too slow, i.e., to
> > > display a list of 15 templates available you won't refresh the repo cache
> > > 15 times...
> > 

> > (Speaking of refreshing the repo cache, a `--refresh` option that forces 
> > this
> > may need to be added, either as an option to the existing qrexec calls or as
> > another call.)
> > Creating another qrexec call is an interesting idea, but I'm unsure if it's
> > really feasible performance-wise. In particular, running `dnf info` (which 
> > does
> > not refresh the cache by default) on any package in `qubes-template-itl` 
> > takes
> > almost a second on my machine.
> > What I meant by "another method" is actually an alternative to `dnf 
> > repoquery`.
> > The issue is that (AFAIK) DNF provides no methods other than `repoquery` to
> > obtain a machine-readable form of the information (short of using the API 
> > from
> > Python). However, it has the issue when dealing with newlines.
> > IMO, the easiest way of doing this in terms of code changes is to modify
> > `repoquery` so that it allows expanding `\\0` as null characters. 
> > Currently, it
> > already does similar replacements with `\\n` and `\\t`, and adding `\\0` 
> > should
> > only be a one-line change.
> 

> > However, I imagine that it would not be ideal to maintain a patched version 
> > of
> > a package in VMs. Even if the patch gets merged upstream, it'll probably 
> > still
> > take a few months for it to land in the next Fedora release.
> > Other solutions I can think of right now are:
> > 

> > -   Maintain a separate modified copy of `repoquery.py`. From a first 
> > glance, it
> >     seems to be fairly self-contained, and the only internal thing it calls,
> >     AFAICT, is for i18n, which we don't care about in this context.
> >     

> > -   Write a simplified version of `repoquery` using the DNF API. Probably 
> > not too
> >     hard but feels like reinventing the wheel.
> >     

> > 

> > > > [3]: Speaking of which, I'm also unsure what would happen if newlines 
> > > > appear
> > > > in, say, the summary field. Maybe I can conduct some experiments about 
> > > > this...
> > 

> > > Certainly.
> > 

> > Unfortunately, newlines in the summary field does break things. In
> > particular, besides malformed repo XML files, a malformed package with 
> > newlines
> > in the summary can also introduce erroneous newlines in the output of
> > `repoquery`.
> 

> > [4]: Technically, there may also be issues with the colon separator we
> > currently use, though it's in essence the same issue.
> 

> I wouldn't worry much about either newlines or additional colons generated on
> the untrusted side of qrexec call. I'd worry about proper checking of the
> untrusted input. Yes, I can make .rpm package with colon and newline in some
> metadata fields. So what? The template manager on the trusted side should just
> reject any malformed input from the untrusted side.
> 

> Now if it would be easy, you could also make some provisions to also reject
> such malformed input inside the downloading vm, but it's not as critical as in
> dom0. That's why I certainly wouldn't patch dnf, because effort to gains ratio
> doesn't look promising, and I'd only fork the the repoquery if it would look
> easy to maintain. That ':'.join(['%{something}']) formatstring looks good
> enough for PoC.

One issue is that from the qrexec client side it is basically impossible to
distinguish between the two. (Consider the case where a field contains
`xxx\na:b:c`.)

Security-wise, this is unlikely to cause issues as an entity that can do this
can probably modify the repo contents directly.

However, if the repo, by accident, does contain packages with, say, colons in
summaries, it may be an issue usability-wise as it's hard to give meaningful
error messages when things break.

There's also the original issue with descriptions (assuming that we don't omit
them), which contains newlines a lot of the time.

That being said, if we treat such errors as "repo errors" and leave to the repo
maintainers to ensure that the fields follow a certain format, then we can just
use a special character for the separator [5] and ban the character from the
fields.

[5]: The separator may also need to be placed at the end of the format string.

> > Also, initial implementations of `qvm-template {info,search}` along with 
> > some
> > other changes have been added to the PR tocore-admin-client.
> 

> Acked, thanks.

I'm mainly working on the TODOs in the code currently. After this (and other
stuff in the discussion above) is finished, I'll probably continue to work on
the "Other Possible Features" in the design doc. Is there any particular order
you'd prefer me to proceed in?

Thanks,
WillyPillow

> https://blog.nerde.pw/
>
> PGP fingerprint = 6CCF 3FC7 32AC 9D83 D154 217F 1C16 C70E E7C3 1C84
>
> Protonmail PGP = D02D CEFF ACE5 5A7B FF5D 871E 4004 1CB1 F52B 127E

-- 
You received this message because you are subscribed to the Google Groups 
"qubes-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/qubes-devel/dRl0lrXJw6yjU-flDv-LRAO9eVnWNqiqiMVSLj2N3akxxr7dazmeDQluskY9EKdRwlui3qXy4T9nBZ_KelzM9TuIREWBkyLVReJFGFNxP74%3D%40nerde.pw.

Attachment: publickey - [email protected] - 0xD02DCEFF.asc
Description: application/pgp-keys

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to