On Wed, Nov 16, 2005 at 04:33:02AM +0100, Marius Mauch wrote:
> On Wed, 16 Nov 2005 04:07:56 +0100
> Marius Mauch <[EMAIL PROTECTED]> wrote:
> 
> > Hey,
> > 
> > IIRC we (=Gentoo as a whole) pretty much agreed to drop the digest
> > files in favor of a extended Manifest format. Well, today I wrote some
> > initial code for this new format, you can find the result in the
> > attachment. It's far from complete, but most basic stuff seems to work
> > with some restrictions:
> > - `ebuild foo digest` works if you have all SRC_URI files for the
> > package (= all ebuilds of that package), however it will not verify
> > the Manifest before overwriting it. Also it currently won't create
> > the old digests (digestgen is a bit nasty).
> > - `emerge foo` should also work (thankfully the parsing could be
> > implemented in a simple compat layer)
> > - repoman will likely have problems (didn't touch it yet)
> > - I didn't add al the fancy exception handling and cvs code (which IMO
> > shouldn't be in portage itself but repoman)
> > 
> > Opinions about adding Manifest2 support to the next "minor" version
> > (2.2.0 or 2.0.54 or 2.1.0 or ...)?
I'm game, assuming it's sane.

> Index: pym/portage.py
> ===================================================================
> --- pym/portage.py    (revision 2312)
> +++ pym/portage.py    (working copy)
> @@ -95,6 +95,7 @@
>       import portage_gpg
>       import portage_locks
>       import portage_exec
> +     import portage_manifest
>       from portage_locks import unlockfile,unlockdir,lockfile,lockdir
>       import portage_checksum
>       from portage_checksum import 
> perform_md5,perform_checksum,prelink_capable
> @@ -1723,9 +1724,9 @@
>                                               mymirrors += [x]
>  
>       mydigests = {}
> -     digestfn  = mysettings["FILESDIR"]+"/digest-"+mysettings["PF"]
> +     digestfn  = 
> portdb.finddigest(mysettings["CATEGORY"]+"/"+mysettings["PF"])
Like this mod personally, but don't like the '/'.  You're passing in 
two chunks of data, pass them in seperate imo, rather then collapsing 
it for every call.

>       if os.path.exists(digestfn):
> -             mydigests = digestParseFile(digestfn)
> +             mydigests = digestParseFile(digestfn, calledFor="digest")

>  
>       fsmirrors = []
>       for x in range(len(mymirrors)-1,-1,-1):
> @@ -2136,8 +2137,10 @@
>               # Track the old digest so we can assume checksums without 
> requiring
>               # all files to be downloaded. 'Assuming'
>               myolddigest = {}
> -             if os.path.exists(digestfn):
> -                     myolddigest = digestParseFile(digestfn)
> +             if portage_manifest.getManifestVersion(manifestfn) == 2:
> +                     myolddigests = digestParseFile(manifestfn, 
> calledFor="digest")
> +             elif os.path.exists(digestfn):
> +                     myolddigest = digestParseFile(digestfn, 
> calledFor="digest")

Kind of disliking the 'calledFor' being thrown around here, plus the 
v2 check, then resorting to a seperate method of checking.
Integrate 'em potentially?

You're basically treating it as two seperate funcs (from how I look at 
it), should split it up (even if the seperated funcs call each other, 
or a common func).

> @@ -2218,7 +2229,7 @@
>       return 1
>  
>  
> -def digestParseFile(myfilename):
> +def digestParseFile(myfilename, calledFor="manifest"):
>       """(filename) -- Parses a given file for entries matching:
>       MD5 MD5_STRING_OF_HEX_CHARS FILE_NAME FILE_SIZE
>       Ignores lines that do not begin with 'MD5' and returns a
> @@ -2230,6 +2241,10 @@
>  
>       mydigests={}
>       for x in mylines:
> +             tmp = portage_manifest.convertManifest2ToDigest(x, 
> calledAs=calledFor)
> +             if tmp[0] == True:
> +                     mydigests[tmp[1]] = tmp[2]
> +                     continue
>               myline=string.split(x)
>               if len(myline) < 4:
>                       #invalid line
> @@ -2258,7 +2273,10 @@
>       the basedir given. Returns 1 only if all files exist and match the md5s.
>       """
>       for x in myfiles:
> -             if not mydigests.has_key(x):
> +             if portage_manfest.getManifestVersion(basedir+"/Manifest") == 2 
> and x.startswith("files/digest-"):
hardcoding sucks, kthnx
If possible, try to shove it into the func so modification (down the line) 
occurs 
in one spot, rather then having to resort to search/replace :)

> Index: pym/portage_checksum.py
> ===================================================================
> --- pym/portage_checksum.py   (revision 2312)
> +++ pym/portage_checksum.py   (working copy)
> @@ -126,3 +126,12 @@
>               portage_locks.unlockfile(mylock)
>  
>       return (myhash,mysize)
> +
> +
> +hashfunc_map = {"MD5": md5hash, "SHA1": sha1hash}
> +
> +def perform_multiple_checksums(filename, hashes=["MD5"], calc_prelink=0):
> +     rVal = {}
> +     for x in hashes:
> +             rVal[x] = perform_checksum(filename, hashfunc_map[x], 
> calc_prelink)[0]
> +     return rVal
Not sure if I like the [0] tbh.
Still think the size test should be broken off as it's own pseudo 
hashfunc, and implemented in that way.  Course, would need to weight 
it somehow, so that size test is done first, then the more expensive 
chf tests...

> Index: pym/portage_manifest.py
> ===================================================================
> --- pym/portage_manifest.py   (revision 0)
> +++ pym/portage_manifest.py   (revision 0)
> @@ -0,0 +1,163 @@
> +import os
> +
> +import portage, portage_exception, portage_checksum, portage_versions
> +
> +class Manifest2Entry:
> +     def __init__(self, entrytype, name, config, 
> methods=portage_checksum.get_valid_checksum_keys()):
The methods trick you're doing is a one time evaluation... not sure if 
that's the best approach tbh.

> +             if entrytype not in ["EBUILD", "AUXFILE", "MISCFILE", "SRCURI"]:
> +                     raise portage_exception.PortageException("unknown 
> Manifest entry type: %s" % entrytype)
break that out into a class constant, and please kill off the capps 
(they kill kittens, or something).

> +             self.type = entrytype
> +             self.name = name
> +             self.methods = methods
> +             self.config = config
I realize stable does this all over, but I really am not much for 
storing a config instance on this attribute.

Thoughts on yanking the settings you need up front?

> +     def generate(self, pkgdir, genOldStyle=True):
> +             if self.type == "EBUILD" or self.type == "MISCFILE":
> +                     filename = pkgdir+"/"+self.name
> +             elif self.type == "AUXFILE":
> +                     filename = pkgdir+"/files/"+self.name
> +             elif self.type == "SRCURI":
> +                     filename = self.config["DISTDIR"]+"/"+self.name
> +             if not os.path.exists(filename):
> +                     raise portage_exception.PortageException("file for 
> Manifest entry doesn't exist: %s" % filename)
> +             
> +             self.size = os.stat(filename).st_size
> +             self.checksums = 
> portage_checksum.perform_multiple_checksums(filename, self.methods)
See comment about making size into a pseudo chf func- would eliminate 
this case.
Chunking this up into type handlers might be wise- would make it 
easier to extend (modify a class dict rather then N funcs).  You've 
already got the seperated beast, just throw it into a mapping and 
it'll be nice and pretty.

> +             cs_list = [x+" "+self.checksums[x] for x in 
> self.checksums.keys()]
> +             self.entry = " ".join([self.type, self.name, 
> str(self.size)]+cs_list)
> +             if genOldStyle and self.type != "SRCURI":
> +                     if "MD5" in self.methods:
> +                             oldentry = Manifest2Entry._makeOldEntry(pkgdir, 
> filename, size=self.size, md5sum=self.checksums["MD5"])
> +                     else:
> +                             oldentry = Manifest2Entry._makeOldEntry(pkgdir, 
> filename, size=self.size)
Kinda iffy on this approach also, seems unclean (quite subjective I 
realize).

> +                     self.entry += "\n"+oldentry
> +             return self.entry
> +     
> +     def generateOldDigestEntries(self, pkgdir, digestfiles):
> +             rVal = []
> +             for d in digestfiles:
> +                     filename = pkgdir+"/"+d
> +                     if not os.path.exists(filename):
> +                             raise 
> portage_exception.PortageException("digest file for Manifest compatibility 
> entry doesn't exist: %s" % filename)
> +                     rVal.append(Manifest2Entry._makeOldEntry(pkgdir, 
> filename))
> +             return rVal
> +     generateOldDigestEntries = classmethod(generateOldDigestEntries)
> +     
> +     def _makeOldEntry(self, pkgdir, filename, size=None, md5sum=None):
> +             oldname = "/".join(filename.split("/")[len(pkgdir.split("/")):])
> +             if md5sum == None:
> +                     md5sum = portage_checksum.perform_md5(filename)
> +             if size == None:
> +                     size = os.stat(filename).st_size
> +             return " ".join(["MD5", md5sum, oldname, str(size)])
> +     _makeOldEntry = classmethod(_makeOldEntry)
> +
> +def manifest2AuxfileFilter(filename):
> +     filename = filename.strip("/")
> +     return not (filename in ["CVS", ".svn"] or filename[:len("digest-")] == 
> "digest-")
> +
> +def manifest2MiscfileFilter(filename):
> +     filename = filename.strip("/")
> +     return not (filename in ["CVS", ".svn", "files", "Manifest"] or 
> filename[-7:] == ".ebuild")
> +
> +def verifyManifest2(cpv, config):
> +     return (True, None)
> +
> +def createFullManifest2(pkgdir, config, checkExisting=False, 
> requireAllSrcUriFiles=True, createOldDigests=True, manifestCompatible=True, 
> debug=1):
> +     mydb = portage.db["/"]["porttree"].dbapi
Really don't like that.
Pass it in.

> +     
> +     if debug:
> +             print "pkgdir:", pkgdir
> +     
> +     cpv_list = [pkgdir.rstrip("/").split("/")[-2]+"/"+x[:-7] for x in 
> portage.listdir(pkgdir) if x.endswith(".ebuild")]
os.path.sep...
not much for the portage.listdir reference there either, kind of an 
indication the evil listdir/cachedir should be broken out into their 
own module.

> +     srcuri_list = []
> +     auxfile_list = []
> +     miscfile_list = []
> +     
> +     if checkExisting:
> +             (isOk, brokenFile) = verifyManifest2(pkgdir, config)
> +             if not isOk:
> +                     return (1, brokenFile)
> +     for cpv in cpv_list:
> +             srcuri_list.extend([x.split("/")[-1] for x in mydb.aux_get(cpv, 
> ["SRC_URI"])[0].split()])
> +     auxfile_list = filter(manifest2AuxfileFilter, 
> portage.listdir(pkgdir+"/files/", recursive=1, filesonly=1))
> +     miscfile_list = filter(manifest2MiscfileFilter, portage.listdir(pkgdir, 
> recursive=0, filesonly=1))
> +     
> +     if requireAllSrcUriFiles:
> +             for f in srcuri_list:
> +                     if not os.path.exists(config["DISTDIR"]+"/"+f):
> +                             return (2, f)
> +     else:
> +             srcuri_list = [f for f in srcuri_list if 
> os.path.exists(config["DISTDIR"]+"/"+f)]
> +     
> +     if debug:
> +             print "pkgdir:", pkgdir
> +             print "CPV:", cpv_list
> +             print "SRC_URI:", srcuri_list
> +             print "auxfiles:", auxfile_list
> +             print "miscfiles:", miscfile_list
> +
> +     myentries = []
> +     myentries += [Manifest2Entry("EBUILD", 
> portage_versions.catsplit(x)[1]+".ebuild", config).generate(pkgdir, 
> genOldStyle=manifestCompatible) for x in cpv_list]
> +     myentries += [Manifest2Entry("AUXFILE", x, config).generate(pkgdir, 
> genOldStyle=manifestCompatible) for x in auxfile_list]
> +     myentries += [Manifest2Entry("MISCFILE", x, config).generate(pkgdir, 
> genOldStyle=manifestCompatible) for x in miscfile_list]
> +     myentries += [Manifest2Entry("SRCURI", x, config).generate(pkgdir, 
> genOldStyle=manifestCompatible) for x in srcuri_list]
> +     
> +     if createOldDigests:
> +             # TODO
> +             pass
> +     
> +     if manifestCompatible:
> +             digestfiles = ["files/"+x for x in 
> portage.listdir(pkgdir+"/files/") if x.startswith("digest-")]
> +             myentries += Manifest2Entry.generateOldDigestEntries(pkgdir, 
> digestfiles)
> +     
> +     # hack to allow sorting
> +     myentries = ("\n".join(myentries)).split("\n")
> +     myentries.sort()
> +     
> +     if debug:
> +             for x in myentries:
> +                     print x
> +     manifest = "\n".join(myentries)
> +     manifestfile = open(pkgdir+"/Manifest", "w")
> +     manifestfile.write(manifest)
> +     manifestfile.close()
Not commenting too much on the bits above atm, but I'd personally open 
a tmpfile alongside the manifest and write to it directly, same for 
digest, mving it in afterwards (or unlinking on failure).

> +     
> +     return (0, None)
Eh?
Exceptions seem like a better route here then magic constants and 
string error msgs.

> +def convertManifest2ToDigest(manifestline, calledAs="manifest"):
> +     mysplit = manifestline.split()
> +     if mysplit[0] not in ["EBUILD", "AUXFILE", "MISCFILE", "SRCURI"] or 
> ((len(mysplit) - 3) % 2) != 0:
> +             return (False, None, None)
> +     if mysplit[0] == "SRCURI" and calledAs == "manifest":
> +             return (False, None, None)
> +     if mysplit[0] in ["EBUILD", "AUXFILE", "MISCFILE"] and calledAs == 
> "digest":
> +             return (False, None, None)
> +     
> +     checksums = {}
> +     
> +     mytype = mysplit.pop(0)
> +     filename = mysplit.pop(0)
> +     if mytype == "AUXFILE":
> +             filename = "files/"+filename
> +     size = long(mysplit.pop(0))
> +     checksums = dict(zip(mysplit[::2], mysplit[1::2]))
> +     if not checksums.has_key("size"):
> +             checksums["size"] = size
> +     
> +     return (True, filename, checksums)
> +
> +def getManifestVersion(manifest):
> +     rVal = 0
> +     
> +     if not os.path.exists(manifest):
> +             return rVal
> +     myfile = open(manifest, "r")
> +     lines = myfile.readlines()
> +     for l in lines:
> +             mysplit = l.split()
> +             if mysplit[0] == "MD5" and len(mysplit) == 4 and rVal == 0:
> +                     rVal = 1
> +             elif mysplit[0] in ["EBUILD", "AUXFILE", "MISCFILE", "SRCURI"] 
> and ((len(mysplit) - 3) % 2) == 0 and rVal in [0,1]:
> +                     rVal = 2
> +     return rVal

Offhand... I really think you need to do some subclassing above.  
Flow's mostly there, but a lot of icky stuff for long term maintenance 
(imo).  Seems like a refactoring of the patch would bring line count 
down, and up readability.

Note that was just a read through... haven't tested it yet, just code 
comments.  Meanwhile, damn nice to see manifest/digest code (ty) :)
~harring

Attachment: pgpFNSFAr5ydY.pgp
Description: PGP signature

Reply via email to