Pe sâmbătă, 13 iulie 2019, Maarten Dammers <[email protected]> a scris:

> Hi Huij,
>
> In my day job I'm a network engineer. Nothing smaller than a /24 gets
> routed on the internet. I would just do a quick and dirty approach: Ignore
> the last octet. So cache based on /24. If you want to go more complicated
> you can loose the length. An ipv4 address is 32 bit. A /24 says: Network is
> 24 bits and the host part is 8 bits. So for a /23 it's 23 bits of network
> and 9 bits of host. It's always on the bit boundary so a /24 is alway from
> 0 (network) to 255 (broadcast). Just Google a bit to find posts like
> https://learningnetwork.cisco.com/blogs/vip-perspectives/
> 2014/05/15/network-binary-math-explained . So comparison is very easy and
> very efficient.
>

IPv4 is easy, you can just go with a bit map and be done with it on a
decent pc. The problems are IPv6 and usage fragmentation, as described
below.

> How are you going to deal with providers that announce large chunks of ip
> space (like a /13) that are used for all sorts of things? I assume you want
> to use INET objects and not ROUTE objects? Be aware that mass harvesting of
> databases like RIPE isn't allowed. Also the quality of these objects differ
> greatly depending on the LIR/country/RIR.
>
I suspect the other apis used in the script are going to split these
networks a lot, thus my concern with running a trie at Wikipedia scale.
Maybe there's a way to split the ipv6 space just enough to be feasible to
use a bitmap as well?

> Maarten
> On 12-07-19 04:43, Huji Lee wrote:
>
> Hi all,
>
> I am working on a bot that fetches a list of anonymous editors on fawiki,
> uses WHOIS to retrieve more info about that IP, and uses a number of online
> APIs to check if the IP is a proxy or not.[1]
>
> I would like to improve the code by implementing a CIDR cache, so that if
> I run whois on 8.4.4.8 and determine that its ASN range is 8.4.4.0/24 and
> then I encounter 8.4.4.9 in the next iteration of my for loop, I would
> quickly determine this IP also belongs to the same range and skip the WHOIS
> part for it.
>
> The search space would consist of IP ranges like "8.4.4.0 - 8.4.4.25"
> (these are the beginning and end IP addresses of the 8.4.4.0/24 range).
> Obviously, we can convert these IPs to Hex to make comparisons easier.
> Given an IP like 8.4.4.9, we need the object to efficiently determine if it
> already has an IP range that encompasses this given IP and if so, return
> the previously cached details for that IP pair. If not, we will store that
> in cache.
>
> The part that I am not fully clear about is the following: how can I avoid
> having to loop through every range in the cache? Is there a way to create a
> hash function that checks two inequality comparisons efficiently?
>
> Thanks!
>
> Huji
>
> [1] https://github.com/PersianWikipedia/fawikibot/
> blob/master/HujiBot/findproxy.py
>
> _______________________________________________
> pywikibot mailing 
> [email protected]https://lists.wikimedia.org/mailman/listinfo/pywikibot
>
>
_______________________________________________
pywikibot mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikibot

Reply via email to