Re: Fun play with egrep, sed and awk

2019-12-27 Thread goleo .
On Fri, Dec 27, 2019 at 10:49 PM Guilherme Janczak
 wrote:
>
> On Thu, 26 Dec 2019 16:13:33 +
> "goleo ."  wrote:
>
> > I was wondering how much space the distfiles on "ftp" take. Because
> > I couldn't see that clearly in my web browser, I downloaded the page
> > https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt
>
> With wget, you can download the HTML of a web page, and also recurse
> into links within it.
>
> $ wget -r -l 0 -A '*.html' --no-parent -O everything.html \
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/
>
> This command recurses to unlimited depth (-l 0) without going up in
> the hierarchy into the parent directory (--no-parent), downloads only
> other .html files (from which more links can be acquired), and
> concatenates everything into a single "everything.html" file (-O).
>
> After running for a few minutes, with just ~1.7MiB of HTML downloaded,
> it tried to recurse into a lot of nonexistent directories, so I cut it
> short there. The figure may not be perfect.
>
> $ grep -E '[0-9]$' everything.html | sed 's|.* \([0-9]*\)$|\1|' |
> awk '{sum+=$1} END{print sum / 1024 / 1024}'
> 65629
>
>
> The sum of all file sizes, which are listed in kibibytes, divided by
> 1024^2 to turn it into gibibytes, returns 65629 gibibytes, or about
> 65 tebibytes.
> This number seems a little absurd; I'm not sure if I made a mistake.
> It does not seem completely implausible either, however: the tree
> does have files dating all the way back to 1990.
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ja-fonts/

File sizes are listed in plain bytes, so your calculation actually
shows 65629 mebibytes (about 64 GiB).

Still nice, I didn't know it's so easy to fetch the contents of
subdirectories :)
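
As a quick check of the units, here is a minimal sketch that takes the
65629 figure as mebibytes and converts it to GiB and to decimal GB. The
helper name mib_to_gib_gb is only illustrative and not something from
this thread.

```shell
# Convert a size given in MiB to GiB (divide by 1024) and to decimal GB
# (MiB -> bytes, then divide by 10^9).
# mib_to_gib_gb is a hypothetical helper name used for illustration.
mib_to_gib_gb() {
    awk -v mib="$1" 'BEGIN {
        printf "%.2f GiB, %.2f GB\n", mib / 1024, mib * 1048576 / 10^9
    }'
}

mib_to_gib_gb 65629    # prints: 64.09 GiB, 68.82 GB
```

So the recursive listing adds up to roughly 64 GiB, which is far more
plausible than tens of tebibytes.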



Re: Fun play with egrep, sed and awk

2019-12-27 Thread Guilherme Janczak
On Thu, 26 Dec 2019 16:13:33 +
"goleo ."  wrote:

> I was wondering how much space the distfiles on "ftp" take. Because
> I couldn't see that clearly in my web browser, I downloaded the page
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt

With wget, you can download the HTML of a web page, and also recurse
into links within it. 

$ wget -r -l 0 -A '*.html' --no-parent -O everything.html \
https://ftp.openbsd.org/pub/OpenBSD/distfiles/

This command recurses to unlimited depth (-l 0) without going up in
the hierarchy into the parent directory (--no-parent), downloads only
other .html files (from which more links can be acquired), and
concatenates everything into a single "everything.html" file (-O).

After running for a few minutes, with just ~1.7MiB of HTML downloaded,
it tried to recurse into a lot of nonexistent directories, so I cut it
short there. The figure may not be perfect.

$ grep -E '[0-9]$' everything.html | sed 's|.* \([0-9]*\)$|\1|' |
awk '{sum+=$1} END{print sum / 1024 / 1024}'
65629


The sum of all file sizes, which are listed in kibibytes, divided by
1024^2 to turn it into gibibytes, returns 65629 gibibytes, or about
65 tebibytes.
This number seems a little absurd; I'm not sure if I made a mistake.
It does not seem completely implausible either, however: the tree
does have files dating all the way back to 1990.
https://ftp.openbsd.org/pub/OpenBSD/distfiles/ja-fonts/



Re: Fun play with egrep, sed and awk

2019-12-27 Thread Stuart Henderson
On 2019-12-26, goleo .  wrote:
> I was wondering how much space the distfiles on "ftp" take. Because
> I couldn't see that clearly in my web browser, I downloaded the page
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt

btw, there are files in subdirectories as well (another 35GB or so).
They are fetched with dpb(1)'s -F flag, and old files are cleaned every
so often with clean-old-distfiles(1) (the manuals are in base but the
actual programs are in the ports tree), so the total space depends on
how long old distfiles are kept when they're no longer used by a port.

> $ egrep '[0-9]$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' |
> awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 54.8126G
>
> Most of the space is taken by distfiles which are at least 100 MB big:
>
> $ egrep '[0-9]{9}$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' |
> awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 34.5359G

For more fun and efficiency, combine the egrep/sed commands into awk :)
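
One way the combined version might look, as a sketch rather than a
definitive answer: it assumes every file line ends with its size in
bytes as the last whitespace-separated field, and that "G" means 10^9
bytes. The function name sum_sizes and the sample lines are made up for
illustration.

```shell
# Filter, extract and sum in a single awk pass.
# Assumption: the size in bytes is the last field of each file line;
# lines whose last field is not all digits (directories, HTML
# boilerplate, blank lines) are skipped.
# sum_sizes is a hypothetical helper name.
sum_sizes() {
    awk '$NF ~ /^[0-9]+$/ { sum += $NF }
         END { printf "%.4fG\n", sum / 10^9 }' "$@"
}

# Made-up sample lines standing in for the real listing.
printf '%s\n' \
    'a.tar.gz 01-Jan-2019 00:00 1500000000' \
    'b.tar.gz 01-Jan-2019 00:00 500000000' \
    'subdir/ 01-Jan-2019 00:00 -' | sum_sizes    # prints: 2.0000G
```

On the saved page this would be run as "sum_sizes distfiles.txt".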



Re: Fun play with egrep, sed and awk

2019-12-26 Thread Andreas Kusalananda Kähäri
On Thu, Dec 26, 2019 at 04:13:33PM +, goleo . wrote:
> I was wondering how much space the distfiles on "ftp" take. Because
> I couldn't see that clearly in my web browser, I downloaded the page
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt, and
> I was lucky that the HTML layout allowed a straightforward approach:
>
> $ egrep '[0-9]$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' |
> awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 54.8126G
>
> Most of the space is taken by distfiles which are at least 100 MB big:
>
> $ egrep '[0-9]{9}$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' |
> awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 34.5359G
>
> Most of them are games, but what is the Linux 4.20 kernel doing here?

See the sysutils/dtb port.

> 
> $ egrep '[0-9]{9}$' distfiles.txt | sed 's|\(.*\).*
[cut]
> linux-4.20.tar.xz 0.104258G
[cut]

-- 
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden



Re: Fun play with egrep, sed and awk

2019-12-26 Thread Stefan Sperling
On Thu, Dec 26, 2019 at 04:13:33PM +, goleo . wrote:
> Most of them are games, but what is the Linux 4.20 kernel doing here?

sysutils/dtb



Re: Fun play with egrep, sed and awk

2019-12-26 Thread Sebastien Marie
On Thu, Dec 26, 2019 at 04:13:33PM +, goleo . wrote:
> I was wondering how much space the distfiles on "ftp" take. Because
> I couldn't see that clearly in my web browser, I downloaded the page
> https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt, and
> I was lucky that the HTML layout allowed a straightforward approach:
>
> $ egrep '[0-9]$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' |
> awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 54.8126G
>
> Most of the space is taken by distfiles which are at least 100 MB big:
>
> $ egrep '[0-9]{9}$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' |
> awk '{ sum += $1 / 10^9 } END { print sum "G" }'
> 34.5359G
>
> Most of them are games, but what is the Linux 4.20 kernel doing here?

You could also play with SQL.

$ doas pkg_add sqlports
$ sqlite3 /usr/local/share/sqlports
sqlite> select fullpkgpath from distfiles where value like 'linux-4.20%';
sysutils/dtb

-- 
Sebastien Marie



Fun play with egrep, sed and awk

2019-12-26 Thread goleo .
I was wondering how much space the distfiles on "ftp" take. Because
I couldn't see that clearly in my web browser, I downloaded the page
https://ftp.openbsd.org/pub/OpenBSD/distfiles/ as distfiles.txt, and
I was lucky that the HTML layout allowed a straightforward approach:

$ egrep '[0-9]$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' |
awk '{ sum += $1 / 10^9 } END { print sum "G" }'
54.8126G

Most of the space is taken by distfiles which are at least 100 MB big:

$ egrep '[0-9]{9}$' distfiles.txt | sed 's|.* \([0-9]*\)$|\1|' |
awk '{ sum += $1 / 10^9 } END { print sum "G" }'
34.5359G
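
The '[0-9]{9}$' pattern selects sizes with at least nine digits, that
is, at least 100000000 bytes. The same cut-off could be written as a
numeric comparison, which also allows other thresholds. This is only a
sketch: it assumes the size in bytes is the last field of each file
line, and sum_large and the sample lines are made up for illustration.

```shell
# Count and sum only the files of at least "min" bytes.
# The default of 100000000 mirrors the nine-digit regex above.
# sum_large is a hypothetical helper name; input is read from stdin.
sum_large() {
    awk -v min="${1:-100000000}" '
        $NF ~ /^[0-9]+$/ && $NF + 0 >= min + 0 { sum += $NF; n++ }
        END { printf "%d files, %.4fG\n", n, sum / 10^9 }'
}

# Made-up sample lines: only the last two reach the threshold.
printf '%s\n' \
    'small.tgz 01-Jan-2019 00:00 99999999' \
    'edge.tgz 01-Jan-2019 00:00 100000000' \
    'big.tgz 01-Jan-2019 00:00 2500000000' | sum_large
# prints: 2 files, 2.6000G
```

On the saved page this would be "sum_large < distfiles.txt", or for
example "sum_large 1000000000 < distfiles.txt" for files of 1 GB and up.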

Most of them are games, but what is the Linux 4.20 kernel doing here?

$ egrep '[0-9]{9}$' distfiles.txt | sed 's|\(.*\).*
\([0-9]*\)$|\1 \2|' | awk '{ print $1 " " $2 / 10^9 "G"}'
0ad-0.0.23b-alpha-unix-data.tar.gz 0.884753G
FlightGear-2016.3.1-data.tar.bz2 1.43026G
FreeOrion_v0.4.8_2018-08-23.26f16b0_Source.tar.gz 0.106254G
GPSTk-2.10.tar.gz 0.123292G
LostPixels-0.5.3-source-with-addons.tar.gz 0.224314G
MuseScore-3.3.3.zip 0.124736G
MuseScore-3.3.4.zip 0.12474G
RetroArch-1.7.6.tar.xz 0.22278G
SuperTux-v0.6.0-Source.tar.gz 0.131204G
UrbanTerror434_full.zip 1.472G
ValyriaTear-src-with-deps-1.1.0.tar.gz 0.117039G
ZAP_2.7.0_Linux.tar.gz 0.130903G
chromium-76.0.3809.132.tar.xz 0.7289G
chromium-78.0.3904.106.tar.xz 0.74289G
chromium-79.0.3945.79.tar.xz 0.778215G
chromium-79.0.3945.88.tar.xz 0.778182G
dangerdeep-data-0.4.0_pre3327.zip 0.198099G
digikam-6.2.0.tar.xz 0.339473G
egoboo-2.7.4.tar.gz 0.143488G
elasticsearch-oss-7.4.2-darwin-x86_64.tar.gz 0.210139G
elasticsearch-oss-7.5.0-darwin-x86_64.tar.gz 0.210878G
fillets-ng-data-1.0.1.tar.gz 0.146419G
fira-fonts-20170227-a6069274.tar.gz 0.139255G
firefox-60.9.0esr.source.tar.xz 0.269089G
flang-8.0.1.20191107-cbadb276.tar.gz 0.133269G
flare-game-v1.11.tar.gz 0.14683G
freedroidRPG-0.16.1.tar.gz 0.226744G
gcompris-17.05.tar.bz2 0.333581G
ghidra_9.0.4_PUBLIC_20190516.zip 0.298504G
go-openbsd-arm-bootstrap-1.13.tar.gz 0.12035G
go-openbsd-arm64-bootstrap-1.13.tar.gz 0.11867G
hedgewars-src-0.9.25.tar.bz2 0.175277G
ideaIC-2019.2.3.tar.gz 0.673552G
iridium-browser-2019.11.78.tar.xz 0.762059G
kicad-packages3D-5.1.4.tar.gz 0.888666G
kicad-packages3D-5.1.5.tar.gz 0.17G
krita-4.2.8.2.tar.gz 0.246236G
lilypond-2.18.2-1.documentation.tar.bz2 0.231545G
linux-4.20.tar.xz 0.104258G
logstash-oss-7.4.2.tar.gz 0.173115G
logstash-oss-7.5.0.tar.gz 0.163987G
mame0216s.zip 0.172834G
mame0217s.zip 0.172974G
mattermost-5.17.1-linux-amd64.tar.gz 0.154934G
mattermost-5.18.0-linux-amd64.tar.gz 0.155307G
megaglest-data-3.13.0.tar.gz 0.353518G
mono-5.20.1.34.tar.bz2 0.246846G
netbeans-11.2-bin.zip 0.338525G
noto-cjk-2.001.tar.gz 1.88215G
noto-fonts-20171024.tar.gz 0.26134G
ogre-1.9.0.tar.gz 0.131566G
openarena-0.8.8.zip 0.425189G
openbsd-backgrounds-2.9.tar.gz 0.140614G
openclipart-2.0-full.tar.bz2 0.374733G
openclonk-8.1-src.tar.bz2 0.120149G
openfire_src_4_2_3.tar.gz 0.113557G
pioneer-20190203.tar.gz 0.360574G
plaso-20180818.tar.gz 0.109783G
pycharm-community-2019.2.5.tar.gz 0.36381G
qgis-3.10.0.tar.bz2 0.101076G
qt-everywhere-opensource-src-4.8.7.tar.gz 0.241076G
raspberrypi-firmware-1.20190925.tar.gz 0.185571G
redeclipse_1.6.0_combined.tar.bz2 0.906217G
sauerbraten_2013_02_03_collect_edition_linux.tar.bz2 0.589941G
solr-8.3.0.tgz 0.186098G
solr-8.3.1.tgz 0.186101G
speed-dreams-src-base-2.2.1-r6404.tar.xz 0.161632G
speed-dreams-src-hq-cars-and-tracks-2.2.1-r6404.tar.xz 0.452312G
speed-dreams-src-more-hq-cars-and-tracks-2.2.1-r6404.tar.xz 0.530668G
speed-dreams-src-wip-cars-and-tracks-2.2.1-r6404.tar.xz 0.250477G
stellarium-0.19.2.tar.gz 0.318683G
sumwars-0.5.8-src.tar.bz2 0.107811G
supertuxkart-0.9.3-src.tar.xz 0.544518G
t-engine4-src-1.5.10.tar.bz2 0.42266G
taiwan-cns11643-fonts-103.1.tar.gz 0.150262G
telegraf-1.12.3.tar.gz 0.127591G
tessdata_fast-4.0.0.tar.gz 0.35138G
texlive-20190410-texmf.tar.xz 2.84581G
tuxpaint-stamps-2018.09.01.tar.gz 0.194176G
ufoai-2.5-data.tar 1.27714G
unifi-5.11.50.zip 0.112355G
unknown-horizons-2017.2.tar.gz 0.266301G
vegastrike-data-0.5.1.r1.tar.bz2 0.447919G
vegastrike-music-0.5.1.r1.tar 0.164465G
virtuoso-opensource-6.1.6.tar.gz 0.113255G
warmux-11.04.1.tar.bz2 0.110084G
wesnoth-1.14.7.tar.bz2 0.452705G
widelands-build20.tar.bz2 0.232364G
wkhtmltopdf-qt-5db36ec76b29712eb2c5bd0625c2c77d7468b3fc_1.tar.gz 0.173051G
xonotic-0.8.2.zip 0.991046G
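
To see the biggest entries first, the "name size" lines above could be
piped through sort. A small sketch, assuming file names contain no
spaces so that the size is always the second field; the helper name
largest is hypothetical, and the sample lines are taken from the list
above.

```shell
# Print the n largest entries (default 5) from "name sizeG" lines on
# stdin. sort -n reads the leading number and ignores the trailing "G";
# LC_ALL=C keeps "." as the decimal point regardless of locale.
# largest is a hypothetical helper name.
largest() {
    LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}

largest 3 <<'EOF'
linux-4.20.tar.xz 0.104258G
noto-cjk-2.001.tar.gz 1.88215G
texlive-20190410-texmf.tar.xz 2.84581G
UrbanTerror434_full.zip 1.472G
EOF
# prints:
# texlive-20190410-texmf.tar.xz 2.84581G
# noto-cjk-2.001.tar.gz 1.88215G
# UrbanTerror434_full.zip 1.472G
```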