Re: Fun play with egrep, sed and awk
On Fri, Dec 27, 2019 at 10:49 PM Guilherme Janczak wrote:
>
> On Thu, 26 Dec 2019 16:13:33 +
> "goleo ." wrote:
>
> > I was wondering how much space distfiles on "ftp" take, so because
> > I couldn't see that in my web browser clearly, I downloaded the page
> > https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt
>
> With wget, you can download the HTML of a web page, and also recurse
> into links within it.
>
> $ wget -r -l 0 -A '*.html' --no-parent -O everything.html https://ftp.openbsd.org/pub/OpenBSD/distfiles/
>
> This command recurses into an infinite number of links without going up
> in the hierarchy and into the parent directory, downloads only other
> .html files (from which more links can be acquired), and appends
> everything to an "everything.html" file.
>
> After a few minutes running and just ~1.7MiB of HTML downloaded, it
> tried to recurse into a lot of non-existing directories, so I cut it
> short there. The figure may not be perfect.
>
> $ grep -E '[0-9]$' everything.html | sed 's|.* \([0-9]*\)$|\1|' | awk '{sum+=$1} END{print sum / 1024 / 1024}'
> 65629
>
> The sum of all filesizes, which are listed in kebibytes, divided by
> 1024^2, to turn it into gibibytes, returns 65629 gibibytes or about
> 65 tebibytes.
> This number seems a little absurd, I'm not sure if I made a mistake.
> It does not seem completely implausible either however, the tree
> does have files dating all the way back to 1990.
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ja-fonts/

File sizes are listed in plain bytes, so your calculation actually shows
65629 mebibytes (roughly 64 GiB), not gibibytes.

Still nice, I didn't know it's so easy to fetch the contents of
subdirectories :)
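The disagreement about units can be checked mechanically. Below is a minimal sketch, assuming (as the reply says) that the last field of each matching line is a size in plain bytes; `report_units` is a hypothetical helper name, not an existing tool. It prints the same sum divided by 1024^2, 1024^3 and 10^9, which makes it visible that dividing a byte count by 1024^2 yields mebibytes, not gibibytes.

```shell
#!/bin/sh
# report_units: hypothetical helper. Sums the last field of each line,
# assumed to be a size in plain bytes, and prints the total in three
# units so that the divisor being used is explicit.
report_units() {
    awk '
        $NF ~ /^[0-9]+$/ { sum += $NF }
        END {
            printf "%.0f MiB\n", sum / 1024 / 1024          # 2^20 bytes
            printf "%.2f GiB\n", sum / 1024 / 1024 / 1024   # 2^30 bytes
            printf "%.2f GB\n", sum / 10^9                  # 10^9 bytes
        }' "$@"
}

# usage: report_units everything.html
```

For example, feeding it two lines that each end in 1073741824 (1 GiB) prints 2048 MiB, 2.00 GiB and 2.15 GB.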
Re: Fun play with egrep, sed and awk
On Thu, 26 Dec 2019 16:13:33 + "goleo ." wrote:
> I was wondering how much space distfiles on "ftp" take, so because
> I couldn't see that in my web browser clearly, I downloaded the page
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt

With wget, you can download the HTML of a web page, and also recurse
into the links within it.

$ wget -r -l 0 -A '*.html' --no-parent -O everything.html https://ftp.openbsd.org/pub/OpenBSD/distfiles/

This command recurses into links with no depth limit (-l 0) without
going up the hierarchy into the parent directory, downloads only other
.html files (from which more links can be acquired), and appends
everything to an "everything.html" file.

After a few minutes of running, with just ~1.7MiB of HTML downloaded,
it started trying to recurse into a lot of non-existent directories, so
I cut it short there. The figure may not be perfect.

$ grep -E '[0-9]$' everything.html | sed 's|.* \([0-9]*\)$|\1|' | awk '{sum+=$1} END{print sum / 1024 / 1024}'
65629

The sum of all filesizes, which are listed in kebibytes, divided by
1024^2 to turn it into gibibytes, returns 65629 gibibytes, or about
65 tebibytes.
This number seems a little absurd; I'm not sure if I made a mistake.
It does not seem completely implausible either, however: the tree
does have files dating all the way back to 1990.
https://ftp.openbsd.org/pub/OpenBSD/distfiles/ja-fonts/
Re: Fun play with egrep, sed and awk
On 2019-12-26, goleo . wrote:
> I was wondering how much space distfiles on "ftp" take, so because
> I couldn't see that in my web browser clearly, I downloaded the page
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt

btw, there are files in subdirectories as well (another 35GB or so).

They are fetched with dpb(1)'s -F flag and old files are cleaned every
so often with clean-old-distfiles(1) (the manuals are in base but the
actual programs are in the ports tree), so the total space depends on
how long old distfiles are kept when they're no longer used by a port.

> $ egrep '[0-9]$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' | awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 54.8126G
>
> Most of space is taken by distfiles which are at least 100 MB big:
>
> $ egrep '[0-9]{9}$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' | awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 34.5359G

For more fun and efficiency, combine the egrep/sed commands into awk :)
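Following the suggestion to fold the egrep/sed steps into awk, here is one possible sketch that computes both totals from the original post in a single pass. It assumes, like the original pipeline, that relevant lines end with a byte count as their last field; the function name `distfile_totals` is hypothetical, and the divisor 10^9 matches the "G" output of the original commands.

```shell
#!/bin/sh
# distfile_totals: hypothetical helper doing the egrep | sed | awk work
# in one awk pass. Assumes each listing line ends with the file size in
# bytes as its last whitespace-separated field.
#   output line 1: total of all sizes, like egrep '[0-9]$'
#   output line 2: total of sizes with 9+ digits (>= 100 MB),
#                  like egrep '[0-9]{9}$'
# length() is used instead of an {9} interval expression because not
# every awk implementation supports intervals.
distfile_totals() {
    awk '
        $NF ~ /^[0-9]+$/ {
            all += $NF
            if (length($NF) >= 9)
                big += $NF
        }
        END {
            print all / 10^9 "G"
            print big / 10^9 "G"
        }' "$@"
}

# usage: distfile_totals distfiles.txt
```

Lines whose last field is not purely numeric (headers, entries without a size) are simply skipped by the pattern, so no separate egrep stage is needed.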
Re: Fun play with egrep, sed and awk
On Thu, Dec 26, 2019 at 04:13:33PM +, goleo . wrote:
> I was wondering how much space distfiles on "ftp" take, so because
> I couldn't see that in my web browser clearly, I downloaded the page
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt and
> then I was lucky with HTML layout which allowed me to go
> straightforward:
>
> $ egrep '[0-9]$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' | awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 54.8126G
>
> Most of space is taken by distfiles which are at least 100 MB big:
>
> $ egrep '[0-9]{9}$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' | awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 34.5359G
>
> Most of them are games, but what is Linux 4.20 kernel doing here?

See the sysutils/dtb port.

>
> $ egrep '[0-9]{9}$' distfiles.txt | sed 's|\(.*\).* [cut]
> linux-4.20.tar.xz 0.104258G
[cut]

--
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden
Re: Fun play with egrep, sed and awk
On Thu, Dec 26, 2019 at 04:13:33PM +, goleo . wrote:
> Most of them are games, but what is Linux 4.20 kernel doing here?

sysutils/dtb
Re: Fun play with egrep, sed and awk
On Thu, Dec 26, 2019 at 04:13:33PM +, goleo . wrote:
> I was wondering how much space distfiles on "ftp" take, so because
> I couldn't see that in my web browser clearly, I downloaded the page
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt and
> then I was lucky with HTML layout which allowed me to go
> straightforward:
>
> $ egrep '[0-9]$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' | awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 54.8126G
>
> Most of space is taken by distfiles which are at least 100 MB big:
>
> $ egrep '[0-9]{9}$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' | awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 34.5359G
>
> Most of them are games, but what is Linux 4.20 kernel doing here?

You could also play with SQL:

$ doas pkg_add sqlports
$ sqlite3 /usr/local/share/sqlports
sqlite> select fullpkgpath from distfiles where value like 'linux-4.20%';
sysutils/dtb

--
Sebastien Marie
Fun play with egrep, sed and awk
I was wondering how much space the distfiles on "ftp" take. Because I
couldn't see that clearly in my web browser, I downloaded the page
https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt, and
then I was lucky with the HTML layout, which let me take a
straightforward approach:

$ egrep '[0-9]$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' | awk '{ sum += $1 / 10^9 } END { print sum "G" }'
54.8126G

Most of the space is taken by distfiles which are at least 100 MB in size:

$ egrep '[0-9]{9}$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' | awk '{ sum += $1 / 10^9 } END { print sum "G" }'
34.5359G

Most of them are games, but what is the Linux 4.20 kernel doing here?

$ egrep '[0-9]{9}$' distfiles.txt | sed 's|\(.*\).* \([0-9]*\)$|\1 \2|' | awk '{ print $1 " " $2 / 10^9 "G"}'
0ad-0.0.23b-alpha-unix-data.tar.gz 0.884753G
FlightGear-2016.3.1-data.tar.bz2 1.43026G
FreeOrion_v0.4.8_2018-08-23.26f16b0_Source.tar.gz 0.106254G
GPSTk-2.10.tar.gz 0.123292G
LostPixels-0.5.3-source-with-addons.tar.gz 0.224314G
MuseScore-3.3.3.zip 0.124736G
MuseScore-3.3.4.zip 0.12474G
RetroArch-1.7.6.tar.xz 0.22278G
SuperTux-v0.6.0-Source.tar.gz 0.131204G
UrbanTerror434_full.zip 1.472G
ValyriaTear-src-with-deps-1.1.0.tar.gz 0.117039G
ZAP_2.7.0_Linux.tar.gz 0.130903G
chromium-76.0.3809.132.tar.xz 0.7289G
chromium-78.0.3904.106.tar.xz 0.74289G
chromium-79.0.3945.79.tar.xz 0.778215G
chromium-79.0.3945.88.tar.xz 0.778182G
dangerdeep-data-0.4.0_pre3327.zip 0.198099G
digikam-6.2.0.tar.xz 0.339473G
egoboo-2.7.4.tar.gz 0.143488G
elasticsearch-oss-7.4.2-darwin-x86_64.tar.gz 0.210139G
elasticsearch-oss-7.5.0-darwin-x86_64.tar.gz 0.210878G
fillets-ng-data-1.0.1.tar.gz 0.146419G
fira-fonts-20170227-a6069274.tar.gz 0.139255G
firefox-60.9.0esr.source.tar.xz 0.269089G
flang-8.0.1.20191107-cbadb276.tar.gz 0.133269G
flare-game-v1.11.tar.gz 0.14683G
freedroidRPG-0.16.1.tar.gz 0.226744G
gcompris-17.05.tar.bz2 0.333581G
ghidra_9.0.4_PUBLIC_20190516.zip 0.298504G
go-openbsd-arm-bootstrap-1.13.tar.gz 0.12035G
go-openbsd-arm64-bootstrap-1.13.tar.gz 0.11867G
hedgewars-src-0.9.25.tar.bz2 0.175277G
ideaIC-2019.2.3.tar.gz 0.673552G
iridium-browser-2019.11.78.tar.xz 0.762059G
kicad-packages3D-5.1.4.tar.gz 0.888666G
kicad-packages3D-5.1.5.tar.gz 0.17G
krita-4.2.8.2.tar.gz 0.246236G
lilypond-2.18.2-1.documentation.tar.bz2 0.231545G
linux-4.20.tar.xz 0.104258G
logstash-oss-7.4.2.tar.gz 0.173115G
logstash-oss-7.5.0.tar.gz 0.163987G
mame0216s.zip 0.172834G
mame0217s.zip 0.172974G
mattermost-5.17.1-linux-amd64.tar.gz 0.154934G
mattermost-5.18.0-linux-amd64.tar.gz 0.155307G
megaglest-data-3.13.0.tar.gz 0.353518G
mono-5.20.1.34.tar.bz2 0.246846G
netbeans-11.2-bin.zip 0.338525G
noto-cjk-2.001.tar.gz 1.88215G
noto-fonts-20171024.tar.gz 0.26134G
ogre-1.9.0.tar.gz 0.131566G
openarena-0.8.8.zip 0.425189G
openbsd-backgrounds-2.9.tar.gz 0.140614G
openclipart-2.0-full.tar.bz2 0.374733G
openclonk-8.1-src.tar.bz2 0.120149G
openfire_src_4_2_3.tar.gz 0.113557G
pioneer-20190203.tar.gz 0.360574G
plaso-20180818.tar.gz 0.109783G
pycharm-community-2019.2.5.tar.gz 0.36381G
qgis-3.10.0.tar.bz2 0.101076G
qt-everywhere-opensource-src-4.8.7.tar.gz 0.241076G
raspberrypi-firmware-1.20190925.tar.gz 0.185571G
redeclipse_1.6.0_combined.tar.bz2 0.906217G
sauerbraten_2013_02_03_collect_edition_linux.tar.bz2 0.589941G
solr-8.3.0.tgz 0.186098G
solr-8.3.1.tgz 0.186101G
speed-dreams-src-base-2.2.1-r6404.tar.xz 0.161632G
speed-dreams-src-hq-cars-and-tracks-2.2.1-r6404.tar.xz 0.452312G
speed-dreams-src-more-hq-cars-and-tracks-2.2.1-r6404.tar.xz 0.530668G
speed-dreams-src-wip-cars-and-tracks-2.2.1-r6404.tar.xz 0.250477G
stellarium-0.19.2.tar.gz 0.318683G
sumwars-0.5.8-src.tar.bz2 0.107811G
supertuxkart-0.9.3-src.tar.xz 0.544518G
t-engine4-src-1.5.10.tar.bz2 0.42266G
taiwan-cns11643-fonts-103.1.tar.gz 0.150262G
telegraf-1.12.3.tar.gz 0.127591G
tessdata_fast-4.0.0.tar.gz 0.35138G
texlive-20190410-texmf.tar.xz 2.84581G
tuxpaint-stamps-2018.09.01.tar.gz 0.194176G
ufoai-2.5-data.tar 1.27714G
unifi-5.11.50.zip 0.112355G
unknown-horizons-2017.2.tar.gz 0.266301G
vegastrike-data-0.5.1.r1.tar.bz2 0.447919G
vegastrike-music-0.5.1.r1.tar 0.164465G
virtuoso-opensource-6.1.6.tar.gz 0.113255G
warmux-11.04.1.tar.bz2 0.110084G
wesnoth-1.14.7.tar.bz2 0.452705G
widelands-build20.tar.bz2 0.232364G
wkhtmltopdf-qt-5db36ec76b29712eb2c5bd0625c2c77d7468b3fc_1.tar.gz 0.173051G
xonotic-0.8.2.zip 0.991046G
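To see which files dominate, the per-file listing can also be ordered by size. A minimal sketch, assuming the file name is the first field and the byte count the last field of each listing line (the helper name `biggest_distfiles` is hypothetical): it keeps only sizes of 9 or more digits (at least 100 MB), sorts on the raw integer byte count with `sort -rn`, and only then converts to G.

```shell
#!/bin/sh
# biggest_distfiles: hypothetical helper. Prints "name sizeG" for every
# line whose last field is a byte count of 9 or more digits (>= 100 MB),
# largest first. Assumes the file name is the first field of the line.
biggest_distfiles() {
    # emit "bytes name" so sort -rn can order on the integer byte count
    awk '$NF ~ /^[0-9]+$/ && length($NF) >= 9 { print $NF, $1 }' "$@" |
        sort -rn |
        # swap back to "name sizeG", dividing by 10^9 as in the original
        awk '{ print $2, $1 / 10^9 "G" }'
}

# usage: biggest_distfiles distfiles.txt | head -5
```

Sorting the integer byte counts rather than the formatted G values keeps the ordering independent of how awk rounds the converted numbers.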