On 01/03/21 at 22:41 +0000, Paul Wise wrote:
> On Mon, Mar 1, 2021 at 5:25 PM Holger Levsen wrote:
>
> > > How would the mirroring work?
> >
> > to be discussed, but my raw idea would be to use rsync with excluding the years
> > before 2015 or 2017. or can't this work? 8-)
>
> That won't work, since the filesystem storing the data is hash (SHA1)
> based, so you need to look up hashes for the relevant data in the
> database and then copy only those files.

Hi,

For https://trends.debian.net/, I have a local mirror of snapshot.d.o
(with sources only, and only for specific versions). The code used to
create it is available in
https://salsa.debian.org/lucas/dhistory/-/blob/master/dhistory

Specifically, it:
- queries the snapshot DB to identify the files and hashes for each
  source package
- fetches and analyses Sources files to identify the (source,version)
  pairs of interest, and thus the hashes to transfer
- transfers the files with those hashes from snapshot.d.o to my own
  machine using rsync

The query used for the first step is:

  psql -At service=snapshot-guest -c "select row_to_json(t) from (
    select srcpkg.name as source_name,
           srcpkg.version as source_version,
           file.name as file_name,
           file.hash as file_hash,
           file.size as file_size,
           node_with_ts.first_run as file_first_run,
           node_with_ts.last_run as file_last_run
    from srcpkg
    inner join file_srcpkg_mapping
      on srcpkg.srcpkg_id = file_srcpkg_mapping.srcpkg_id
    inner join file on file.hash = file_srcpkg_mapping.hash
    inner join node_with_ts on node_with_ts.node_id = file.node_id
    inner join archive on node_with_ts.archive_id = archive.archive_id
    where archive.name = 'debian') t"

That's the query that would have to be adapted for binary packages and
for a specific date range.

Lucas

_______________________________________________
Reproducible-builds mailing list
Reproducible-builds@alioth-lists.debian.net
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/reproducible-builds
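
As a rough illustration of the date-range part, the JSON rows printed by the query above (one object per line) could also be filtered after the fact and turned into a list of paths to copy. This is only a sketch and not what dhistory does: the function names are hypothetical, the timestamps are assumed to arrive as naive ISO 8601 strings, and the two-level directory layout in farm_path() is an assumption about how the hash-based storage is organised.

```python
import json
from datetime import datetime


def parse_run(ts):
    # Assumption: file_first_run / file_last_run are naive ISO 8601
    # strings such as "2017-03-01T04:12:05".
    return datetime.fromisoformat(ts)


def farm_path(sha1):
    # Assumption: files are stored under two directory levels named
    # after the first four hex digits of the SHA1 (ab/cd/abcd...).
    return f"{sha1[0:2]}/{sha1[2:4]}/{sha1}"


def select_paths(json_lines, start, end):
    """Return the sorted, de-duplicated paths of files that were present
    in the archive at some point in the interval [start, end)."""
    paths = set()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        first = parse_run(row["file_first_run"])
        last = parse_run(row["file_last_run"])
        # Keep the file if its [first_run, last_run] range overlaps
        # the requested interval.
        if first < end and last >= start:
            paths.add(farm_path(row["file_hash"]))
    return sorted(paths)
```

The resulting list, written one path per line, is the kind of input rsync accepts through its --files-from option. Doing the filtering in the where clause of the query itself would avoid transferring rows that are discarded anyway.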