> Hey, > > (re-adding the list to the Cc, I assume you accidentally didn't include > it in your reply)
Hi, Florian. Yes. Sorry, my fault. Thanks for adding it back. > That's true, and a valid concern. Currently we just can test for > membership of a host in a set, which is O(1). Host blocking lookups are > done quite a lot, and there are over 50k blocked hosts with the default > blocklist, so having that O(n) instead of O(1) is probably a noticable > performance difference. > > I added a benchmark test for adblock lookups: > https://github.com/qutebrowser/qutebrowser/commit/27d4796c2f9ced450ceaf3569de228532ce08df7 > > Currently looking up whether a host is blocked takes 3us (median). > When changing the lookup to: > > any(h == host for h in self._blocked_hosts) > > instead of > > host in self._blocked_hosts > > it instead takes 5.6ms, so almost 2000 times slower. Seeing that there > can be hundreds of adblock lookups for a page load, that's not > acceptable. > > > Like you say, the best way out would be having some kind of > UrlPatternSet class which can be more intelligent about lookups. I've > had some thoughts on that noted down somewhere already, but I can't find > them right now. Basically, you could hash patterns by their hosts, and > then if you want to check whether foobar.example.com is contained, make > some O(1) lookups for: > > - foobar.example.com > - *.example.com > - *.com > - * > > However, ensuring that this works efficiently for all imaginable cases > probably requires some more thought. > > I'd instead propose the following: > > - Add a UrlPatternSet class which we can extend like explained above > over time. > - For now, just add a special case for patterns in the form of > *://example.com/* (all schemes, constant host, all paths), by looking > them up in a Python set. For all patterns which are more complex than > that, fall back to iterating. > > That solution should be quite straightforward for now, and result in a > huge speedup for the common case of adblocking lists. > > What do you think? Sounds like an excellent plan. I'll implement it the way you described. After that I'll try to investigate and find a better way to speed it up. -- Best regards, Viacheslav Chimishuk vchimis...@yandex.ru Ukraine, Khmelnitsky