That script generates a 164KB file with 4096 entries in about five minutes
real time.
I edited my previous post to reduce the unnecessarily high precision. Now,
on my system, generating 4096 addresses takes ~15s.
grep -c -o -i :0 IPv6-SS.IPv6-NLU-2a02.2788.MB4420-4096.txt ==> 4095
grep -c -o -i :00 IPv6-SS.IPv6-NLU-2a02.2788.MB4420-4096.txt ==> 4053
grep -c -o -i :000 IPv6-SS.IPv6-NLU-2a02.2788.MB4420-4096.txt ==> 3599
Options -o and -i are useless here (the patterns contain no letter, so -i has
nothing to case-fold). You may believe that -o makes 'grep -c' count all
occurrences on a line. It does not: 'grep -c' still counts the number of
lines with at least one occurrence. Those outputs therefore mean that, in
IPv6-SS.IPv6-NLU-2a02.2788.MB4420-4096.txt, all addresses but one have at
least one of their six random groups of four hexadecimal digits starting
with "0", 99.0% have at least one group starting with "00", and 87.9% have
at least one group starting with "000".
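To see the difference on a toy example (a made-up two-line file, not your actual list):

```shell
# Two addresses; three occurrences of ":0" in total, spread over two lines.
printf '2a02:2788:0:0:aaaa:bbbb:cccc:dddd\n2a02:2788:1:0:aaaa:bbbb:cccc:dddd\n' > demo.txt

grep -c -o ':0' demo.txt          # prints 2: -c counts matching *lines*, with or without -o
grep -o ':0' demo.txt | wc -l     # prints 3: -o emits one line per occurrence, wc -l counts them
```

Piping -o's output into 'wc -l' is the usual way to count every occurrence rather than every matching line.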
Extending Magic Banana's reasoning about the relative frequency of
occurrences of :0001, :0002 and :0003, the relative frequencies of the
occurrences of :0xxx, :00xx, and :000x in a 4096-row list of IPv6 addresses
ought to be 256/4096, 16/4096, and 1/4096, respectively. In a 65,536-address
list, prefix::0/128 may happen just once.
First of all, it is not a reasoning but a choice of distribution to sample
from. I believe groups of four hexadecimal digits chosen by local network
administrators approximately follow a Zipfian distribution. The exponent may
not be 1 though. A more realistic exponent could be fitted from real-world
addresses, by regression.
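To make that choice of distribution concrete, here is a minimal AWK sketch of inverse-CDF sampling of one group from a Zipfian distribution over the 65,536 possible values, assuming (as in my programs below) that rank i corresponds to the hexadecimal value i - 1; the exponent s and the seed are parameters:

```shell
awk -v s=1 -v seed=42 'BEGIN {
    n = 16^4                        # 65,536 four-hex-digit groups
    for (i = 1; i <= n; ++i)        # unnormalized cumulative Zipf weights
        cdf[i] = cdf[i - 1] + 1 / i^s
    srand(seed)
    u = rand() * cdf[n]             # uniform draw on [0, cdf[n])
    for (i = 1; cdf[i] < u; ++i) ;  # first rank whose cumulative weight reaches u
    printf "sampled group: %04x\n", i - 1
}'
```

With s = 1, the normalizing constant cdf[n] is the harmonic number H(65536) ≈ 11.6676, which is where the probability ≈ 0.0857 of sampling 0000 below comes from.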
Your math looks wrong. If mine is correct, the following AWK program outputs
the probabilities of sampling 0000, 000x, 00xx or 0xxx for one group:
$ awk 'BEGIN {
    i = 1
    for (p = 0; p != 5; ++p) {
        for (; i < 16^p + 1; ++i)
            cdf += 1 / i
        partial[p] = cdf }
    for (p = 0; p != 4; ++p)
        print partial[p] / cdf }'
0.0857076
0.289754
0.524903
0.762378
Technically, generating 0000 or not is the realization of a Bernoulli
variable of parameter 0.0857076, generating 000x or not is that of a
Bernoulli variable of parameter 0.289754, etc. Complementing the above
program, here are, among 4096 addresses, the expected numbers of addresses
with at least one 0000, at least one 000x, at least one 00xx and at least one
0xxx:
$ awk 'BEGIN {
    i = 1
    for (p = 0; p != 5; ++p) {
        for (; i < 16^p + 1; ++i)
            cdf += 1 / i
        partial[p] = cdf }
    for (p = 0; p != 4; ++p)
        print 4096 - 4096 * (1 - partial[p] / cdf)^6 }'
1703.4
3570.21
4048.9
4095.26
That looks compatible with the counts your 'grep -c' commands output. A
p-value could be computed... but I will stop here with the statistics!
It would appear that one needs to concatenate the variously randomized lists
of addresses, eliminate duplicates, and then apply the last pair of scripts
to achieve a relatively accurate evaluation of the target CIDR block.
Duplicates are unlikely. I will not do the math to compute the probability
of any duplicate. Notice however that the probability of getting the most
likely address, the one ending with 0000:0000:0000:0000:0000:0000, is
0.0857076^6 ≈ 0.000000396384. That is about 4 in 10 million. You can figure
out that getting it twice or more among 4096 addresses is therefore
extremely unlikely.
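For whoever wants a number anyway, here is a quick AWK sketch of the binomial computation for that single most likely address (not the full duplicate probability over all addresses):

```shell
awk 'BEGIN {
    p = 0.0857076^6                 # probability of the all-zero 96-bit suffix
    n = 4096                        # number of generated addresses
    none = (1 - p)^n                # no occurrence among the n addresses
    once = n * p * (1 - p)^(n - 1)  # exactly one occurrence
    printf "P(twice or more) = %.3g\n", 1 - none - once
}'
```

It prints 1.32e-06, i.e., about one chance in a million for the single address most favored by the distribution.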
Could it be that the 79,228,162,514,264,337,593,543,950,336 addresses in
2a02:2788::/32 are dynamically generated on demand?
If you could generate one billion addresses per second, it would take
79,228,162,514,264,337,594 seconds to generate them all. That is more than
2,510 billion years.
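The arithmetic behind those figures, for the record (a quick AWK check; the 365.25-day year is my choice):

```shell
awk 'BEGIN {
    addresses = 2^96                # a /32 leaves 128 - 32 = 96 free bits
    seconds = addresses / 1e9      # at one billion addresses per second
    years = seconds / (365.25 * 24 * 3600)
    printf "%.4g seconds, i.e. %.4g years\n", seconds, years
}'
```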