Re: [gentoo-portage-dev] Re: Speeding up Tree Verification

2020-06-29 Thread Zac Medico
On 6/29/20 7:15 PM, Sid Spry wrote:
> On Mon, Jun 29, 2020, at 9:13 PM, Sid Spry wrote:
>> Hello,
>>
>> I have some runnable pseudocode outlining a faster tree verification 
>> algorithm.
> 
> Ah, right. It's worth noting that even faster than this algorithm is simply
> verifying a .tar.xz. Is that totally off the table? I realize it doesn't fit
> every use case, but it seems to be faster in both sync and verification time.

We've already got support for that with sync-type = webrsync. However, I
imagine sync-type = git is even better. All of the types are covered here:

https://wiki.gentoo.org/wiki/Portage_Security
-- 
Thanks,
Zac



[gentoo-portage-dev] Re: Speeding up Tree Verification

2020-06-29 Thread Sid Spry
On Mon, Jun 29, 2020, at 9:13 PM, Sid Spry wrote:
> Hello,
> 
> I have some runnable pseudocode outlining a faster tree verification 
> algorithm.

Ah, right. It's worth noting that even faster than this algorithm is simply
verifying a .tar.xz. Is that totally off the table? I realize it doesn't fit
every use case, but it seems to be faster in both sync and verification time.
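For comparison, checking a single snapshot comes down to one sequential read of one file, with no per-file open or stat calls. A minimal sketch, assuming the expected digests were obtained from a source whose OpenPGP signature has already been checked (the signature check itself is not shown); `verify_tarball` is a hypothetical helper, not part of Portage or gemato:

```python
import hashlib
from typing import Dict


def verify_tarball(path: str, expected: Dict[str, str],
                   chunk_size: int = 1 << 20) -> bool:
    """Hash one (possibly large) file once, updating every digest per chunk.

    `expected` maps hashlib algorithm names (e.g. 'blake2b', 'sha512') to
    lowercase hex digests. Hypothetical helper for illustration only.
    """
    hashers = {name: hashlib.new(name) for name in expected}
    with open(path, 'rb') as f:
        # Fixed-size chunks keep memory use flat even for a big tarball.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            for h in hashers.values():
                h.update(chunk)
    return all(hashers[name].hexdigest() == digest
               for name, digest in expected.items())
```

All digests are fed from the same read, so adding a second hash algorithm costs CPU time but no extra I/O.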



[gentoo-portage-dev] Speeding up Tree Verification

2020-06-29 Thread Sid Spry
Hello,

I have some runnable pseudocode outlining a faster tree verification algorithm.
Before I create patches, I'd like to know whether there is any guidance on
making the changes as unobtrusive as possible. If this radical change of
algorithm is acceptable, I can work on adding the changes.

Instead of building any kind of structured data out of the Portage tree, my
algorithm simply lists all files and then optionally batches them out to
threads. Eliding the tree traversal operations gives a noticeable speedup,
which is visible even when running the algorithm with a single thread and
comparing it to the current algorithm in gemato (should gemato changes still be
discussed here?).
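The flat-list idea can be sketched roughly as follows. This is a hypothetical illustration, not the code from veriftree.py: it assumes the Manifest entries have already been parsed into (absolute path, size, {hash name: hex digest}) tuples, and it uses `concurrent.futures.ThreadPoolExecutor` for the optional batching. `Entry`, `verify_entry` and `verify_all` are illustrative names.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    # One flattened Manifest line: absolute path, size, expected digests.
    path: str
    size: int
    hashes: Dict[str, str]  # hashlib algorithm name -> hex digest


def verify_entry(entry: Entry) -> bool:
    """Check one file's size and digests; no tree structure is needed."""
    try:
        if os.path.getsize(entry.path) != entry.size:
            return False
        hashers = {name: hashlib.new(name) for name in entry.hashes}
        with open(entry.path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 16), b''):
                for h in hashers.values():
                    h.update(chunk)
    except OSError:
        return False  # A missing or unreadable file counts as a failure.
    return all(hashers[n].hexdigest() == d for n, d in entry.hashes.items())


def verify_all(entries: List[Entry], threads: int = 1) -> List[bool]:
    """Verify a flat list of entries, optionally spread over threads."""
    if threads <= 1:
        return [verify_entry(e) for e in entries]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so results line up with entries.
        return list(pool.map(verify_entry, entries))
```

Running it with `threads=1` isolates the gain from skipping the traversal; higher thread counts can help further because CPython's hashlib releases the GIL while hashing large buffers.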

Some simple tests, such as counting all objects traversed and verified, return
the same results (roughly). Once it is in Portage, it can be tested in detail.
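The "same(ish)" check could be made exact with a set comparison, along the lines of the comment in the code below about comparing the directory tree to the files appearing in all ManifestRecords. A rough sketch, assuming the Manifest paths have already been collected; `compare_tree` and its `ignore_names` parameter are hypothetical, and which file names to ignore is left to the caller:

```python
import os
from typing import Iterable, Set, Tuple


def compare_tree(root: str, manifest_paths: Iterable[str],
                 ignore_names: Iterable[str] = ()) -> Tuple[Set[str], Set[str]]:
    """Return (on_disk_but_unlisted, listed_but_missing) as absolute paths."""
    ignore = set(ignore_names)
    on_disk = set()
    for path, dirs, files in os.walk(root):
        for name in files:
            if name not in ignore:
                on_disk.add(os.path.abspath(os.path.join(path, name)))
    listed = {os.path.abspath(p) for p in manifest_paths}
    # Set differences name the exact files that disagree, not just counts.
    return on_disk - listed, listed - on_disk
```

Two empty sets is a stronger result than matching totals, since equal counts can hide one extra file and one missing file cancelling out.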

There is also my partial attempt at removing the brittle interface to GnuPG
(it's not as if the current code is badly designed, just that parsing the
output of GnuPG directly is likely not the best idea).

Needs gemato, dnspython, and requests. Slightly better than random code because
I took inspiration from the existing gemato classes.

```python (veriftree.py)
#!/usr/bin/env python3
import os, sys, zlib, gzip, hashlib, tempfile, shutil, timeit
import subprocess
from typing import List, Optional
from pprint import pprint

from gemato.manifest import (
    ManifestFile,
    ManifestFileEntry,
)
from wkd import (
    check_domain_signature,
    hash_localpart,
    build_web_key_uri,
    stream_to_file
)
from fetchmedia import (
    OpenPGPEnvironment,
    setup_verification_environment
)

# 0. Top level directory (repository) contains Manifest, a PGP signature of
#    blake2b and sha512 hashes of Manifest.files.gz.
# 1. Manifest.files contains hashes of each category Manifest.gz.
# 2. The category Manifest contains hashes of each package Manifest.
# 3. The package Manifest contains hashes of each package file.
#    Must be aware of PMS, e.g. aux tag specifies a file in files/.

# 0. Check signature of repo Manifest.
# 1. Merge items in Manifest.files, each category Manifest, and each package
#    Manifest into one big list. The path must be made absolute.
# 2. Distribute items to threads.

# To check operation compare directory tree to files appearing in all
# ManifestRecords.

class ManifestTree(object):
    __slots__ = ['_directory', '_manifest_list', '_manifest_records',
                 '_manifest_results']

    def __init__(self, directory: str):
        self._directory = directory
        # Tuples of (base_path, full_path).
        self._manifest_list = []
        self._manifest_records = []
        self._manifest_results = []

    def build_manifest_list(self):
        for path, dirs, files in os.walk(self._directory):
            #if 'glsa' in path or 'news' in path:
            #if 'metadata' in path:
            #    continue # Skip the metadata directory for now.
            # It contains a repository. Current algo barfs on Manifest
            # containing only sig.

            if 'Manifest.files.gz' in files:
                self._manifest_list += [(path, path + '/Manifest.files.gz')]
            if 'Manifest.gz' in files:
                self._manifest_list += [(path, path + '/Manifest.gz')]

            if path == self._directory:
                continue # Skip the repo manifest. Order matters, fix eventually.
            if 'Manifest' in files:
                self._manifest_list += [(path, path + '/Manifest')]

    def parse_manifests(self):
        for base_path, manifest_path in self._manifest_list:
            if manifest_path.endswith('.gz'):
                # Decompress in-process rather than shelling out to gunzip
                # through a temporary file: this handles Manifest.files.gz
                # and Manifest.gz alike and avoids shell quoting problems.
                with gzip.open(manifest_path, 'rt') as f:
                    for line in f:
                        mr = ManifestRecord(line)
                        mr.make_absolute(base_path)
                        self._manifest_records += [mr]
            else:
                with open(manifest_path) as f:
                    for line in f:
                        if line.startswith('-'):
                            break # Skip the signed manifest.
                        mr = ManifestRecord(line)
                        mr.make_absolute(base_path)
                        self._manifest_records += [mr]

    def verify_manifests(self):
        for record in self._manifest_records:
            self._manifest_results += [record.verify()]


class ManifestRecord(object):
    __slots__ = ['_tag', '_abs_path', '_path', '_size', '_hashes']

    def __init__(self, line: Optional[str] = None):
        self._tag = None
        self._abs_path = None
        self._path = None
        self._size = None
        self._hashes = []
        if line:
            self.from_string(line)

    def 
```

Re: [gentoo-portage-dev] [PATCH] ecompress: optimize docompress -x precompressed comparison

2020-06-29 Thread Robin H. Johnson
On Sun, Jun 28, 2020 at 12:54:56PM -0700, Zac Medico wrote:
> Use sort and comm with temporary files in order to compare lists
> of docompress -x and precompressed files, since the file lists
> can be extremely large. Also strip ${D%/} from paths in order to
> reduce length.
+1, looks much better.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

