https://bugzilla.wikimedia.org/show_bug.cgi?id=21860
John Erling Blad <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #15 from John Erling Blad <[email protected]> 2010-07-13 01:02:00 UTC ---
One way to avoid changing the database schema is to add a digest function to the API, possibly allowing several digest functions. If the digest is only used to identify distinct versions within a returned set of revisions with similar sizes, it can be very simple. We don't have to create 2¹²⁸-ish possible values; 2⁸-ish values are more than enough when combined with the size. The API function must fetch the content of the revisions from the database, but only the digest is transferred. The database request will be heavy, but serving the response to the client will not.

For most uses there will never be a need to calculate the heavier hash functions. Something like "rvdigest=pearson" would do, perhaps also offering a variant of FNV-1 or even SHA1 or AES. It could also be interesting to use some kind of locality-sensitive hashing [1] to build systems similar to those described in [2][3]. The computationally heavier methods could be given a maximum number of revisions as a hard limit for each request, forcing the tool developer to choose the computationally cheaper methods if they suffice for the purpose.

[1] http://en.wikipedia.org/wiki/Locality_sensitive_hashing
[2] http://www.grouplens.org/node/427
[3] http://doi.acm.org/10.1145/985692.985765
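
To make the "size plus a 2⁸-ish digest" idea concrete, here is a rough sketch in Python of an 8-bit Pearson hash paired with the byte length of the revision text. It is only an illustration: the names pearson8, revision_key and group_identical are hypothetical, the permutation table is an arbitrary seeded shuffle (Pearson hashing works with any permutation of 0..255), and nothing here reflects an existing MediaWiki API.

```python
import random
from collections import defaultdict


def make_table(seed=0):
    """Build a permutation of 0..255 (assumption: any fixed permutation will do)."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


TABLE = make_table()


def pearson8(data: bytes, table=TABLE) -> int:
    """8-bit Pearson hash: one table lookup per input byte."""
    h = 0
    for b in data:
        h = table[h ^ b]
    return h


def revision_key(text: str):
    """Combine size and digest, as suggested in the comment above."""
    data = text.encode("utf-8")
    return (len(data), pearson8(data))


def group_identical(revisions):
    """Group revision ids (hypothetical dict of id -> text) by (size, digest)."""
    groups = defaultdict(list)
    for rev_id, text in revisions.items():
        groups[revision_key(text)].append(rev_id)
    return dict(groups)
```

Revisions that land in the same group are very likely identical (for example a revert to an earlier version). Two different texts of the same size still collide with probability of roughly 1/256, so a tool that needs certainty would have to confirm a match with a heavier hash or by comparing the content.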
