On Nov 10, 2008, at 11:12 AM, Brett Wilson wrote:

I was recently looking at the PageGroup and visited link coloring.
Chromium has some interesting requirements. Our design goal is to
store hundreds of thousands to a million URLs in the database with no
problems (basically all your history forever). We have multiple
processes so we can't just have a local list of visited pages in each
renderer process.

Our solution to the first problem is to have 64-bit hashes (with 1
million visited links, you would get too many collisions using 32-bit
hashes like WebKit currently uses). Our solution to the second problem
is to have a dedicated multiprocess hash table. This dedicated system
manages its own hashing because we also have salting which must be in
sync through all processes.
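
To put rough numbers on the collision concern above, here is a minimal sketch using the standard birthday approximation. It assumes hashes are uniformly distributed; the function names are hypothetical and are not part of WebKit or Chromium.

```cpp
#include <cmath>
#include <cstdint>

// Expected number of colliding pairs among n uniformly random hashes of the
// given width: n*(n-1)/2 pairs, each colliding with probability 2^-bits.
double expectedCollidingPairs(double n, int bits)
{
    return n * (n - 1.0) / 2.0 / std::ldexp(1.0, bits);
}

// Approximate chance that a single unvisited URL hashes to the same value as
// one of n stored hashes (upper bound n / 2^bits, assuming uniform hashes).
double falsePositiveRate(double n, int bits)
{
    return n / std::ldexp(1.0, bits);
}
```

Under these assumptions, one million stored links give roughly 116 expected colliding pairs with 32-bit hashes and a negligible number with 64-bit hashes, which is consistent with the reasoning in the paragraph above.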

WebKit recently changed how visited link coloring works. It
used to call a global function, historyContains(), which was
easy to integrate into our system. The new system passes 32-bit hashes
around and maintains a global list of visited pages in the PageGroup.
Neither of these will work with our system.

My current idea is to create a new file, LinkHash, which has a typedef
for the hash type (rather than using unsigned everywhere) so we can
define it to be 64 bits in PLATFORM(CHROMIUM) and it can remain
32 bits for other platforms (or they can change it if they like).
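
A minimal sketch of what such a typedef and a salted hash function might look like. Everything here is an assumption made for illustration: the SKETCH_PLATFORM_CHROMIUM macro stands in for PLATFORM(CHROMIUM), sketchVisitedLinkHash is a hypothetical name, and FNV-1a is used only as a simple stand-in, not as the hash that either WebKit or Chromium uses.

```cpp
#include <cstdint>
#include <string>

// Hypothetical switch standing in for PLATFORM(CHROMIUM); set to 0 for 32 bits.
#define SKETCH_PLATFORM_CHROMIUM 1

#if SKETCH_PLATFORM_CHROMIUM
typedef uint64_t LinkHash;
#else
typedef uint32_t LinkHash;
#endif

// Stand-in hash: 64-bit FNV-1a whose starting state is mixed with a salt.
// Every process must use the same salt to compute matching hashes.
inline LinkHash sketchVisitedLinkHash(const std::string& url, uint64_t salt)
{
    uint64_t h = 14695981039346656037ULL ^ salt; // FNV offset basis XOR salt
    for (unsigned char c : url) {
        h ^= c;
        h *= 1099511628211ULL; // FNV prime
    }
    return static_cast<LinkHash>(h); // truncates when LinkHash is 32 bits
}
```

Callers that only ever name the LinkHash type, rather than unsigned, would not need to change if a port picked a different width.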

I think it would be better to just always use a 64-bit hash with salting for all ports (assuming that is not a significant performance hit - I would expect it isn't). I say this because:

1) WebKit in general supports keeping unlimited history, and Safari in particular has a non-default option to keep history forever. I don't think it is Chromium-specific to support such a requirement.

2) The visited link color spoofing you mention seems like a fairly serious bug which would apply to any port regardless of history size.


It also defines a visitedLinkHash function which is moved from Document.
I have a patch for this, and it's very clean. I think it improves
things even without our porting constraint since almost 200 lines got
moved out of Document. This is described in
https://bugs.webkit.org/show_bug.cgi?id=22131

Moving the function out of Document seems reasonable.

The more complicated part is in PageGroup, which seems to basically be
the visited link database.

It's more accurate to say it *has* a visited link database rather than that it *is* one.

I'm thinking of just providing a new
PageGroupChromium.cpp which contains a different implementation that
proxies these calls to our glue layer to be sent to our multiprocess
database.

However, I'm not sure what exactly the intent of PageGroup is. It's
clearly not intended that this be port-specific. Is there a cleaner
way to integrate our link database with the rest of WebKit?

PageGroup is supposed to represent a set of Page objects (essentially top level Web content holders) which should be considered as together forming a "browser". Since WebKit is designed to be a public API framework and to be used for purposes other than a browser, it is possible for a browser to show some Web content views that are not part of the user's browsing. So it would put those in a separate PageGroup (however that is reflected in that platform's API). PageGroup takes care of those things that we judged to belong at this level of granularity, rather than global or per-Page.

I don't know what the right way to integrate Chromium's visited link checking would be. Do you incur IPC for every link checked? Does it cache on the client side? Does it use shared memory?

Regards,
Maciej

_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
