https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

John Erling Blad <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #15 from John Erling Blad <[email protected]> 2010-07-13 01:02:00 UTC 
---
One way to avoid changing the database schema is to add a digest function to
the API, possibly allowing a choice of several digest functions. If the digest
is only used to tell apart distinct versions within a returned set of
revisions of similar size, it can be very simple. We don't have to create
2¹²⁸-ish possible values; 2⁸-ish values are more than enough when combined
with the size.
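As a minimal sketch of that idea, an 8-bit Pearson hash combined with the revision size yields a cheap (size, digest) pair for discriminating same-size revisions. The function names and the permutation seed below are illustrative, not anything from MediaWiki:

```python
import random

# Pearson hashing needs a fixed 256-entry permutation table.
random.seed(21860)  # arbitrary fixed seed (the bug number), for reproducibility
_TABLE = list(range(256))
random.shuffle(_TABLE)

def pearson8(data: bytes) -> int:
    """Classic 8-bit Pearson hash: one table lookup per input byte."""
    h = 0
    for b in data:
        h = _TABLE[h ^ b]
    return h

def revision_key(text: str) -> tuple[int, int]:
    """(size, 8-bit digest): a collision needs both equal size and equal digest."""
    data = text.encode("utf-8")
    return (len(data), pearson8(data))
```

With only 256 digest values, collisions among same-size revisions are possible but rare enough for grouping a small result set; anything stronger can fall back to a real hash function.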

The API function must fetch the content of the revisions from the database,
but only the digest is transferred. The database request will be heavy, but
serving the result to the client will not. For most uses there will never be a
need to compute the heavier hash functions.

Something like "rvdigest=pearson", perhaps also allowing a variant of FNV-1 or
even SHA-1 or an AES-based hash. It could also be interesting to use some kind
of locality-sensitive hashing [1] to build systems similar to those described
in [2][3].
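If such a parameter existed, a client request might be built like the following. Note that rvdigest is only the proposed parameter from this comment, not a real MediaWiki API parameter, and the endpoint and values are illustrative:

```python
from urllib.parse import urlencode

# Hypothetical query using the proposed rvdigest parameter.
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Main Page",
    "rvlimit": 50,
    "rvprop": "ids|size",
    "rvdigest": "pearson",  # proposed in this comment; does not exist today
    "format": "json",
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(url)
```

The response would then carry a small digest field per revision alongside the size, instead of the full revision text.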

The computationally heavier methods could be given a maximum number of
revisions as a hard limit per request, forcing tool developers to choose the
computationally cheap methods when they suffice for the purpose.
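Such per-method limits could be expressed as a simple lookup, sketched below with made-up numbers; nothing here comes from MediaWiki's actual configuration:

```python
# Hypothetical per-digest hard limits: expensive digests cap how many
# revisions a single request may cover.
DIGEST_LIMITS = {  # illustrative numbers, not from the bug report
    "pearson": 5000,
    "fnv1": 5000,
    "sha1": 500,
}

def clamp_revision_count(method: str, requested: int) -> int:
    """Clamp the requested revision count to the method's hard limit."""
    limit = DIGEST_LIMITS.get(method)
    if limit is None:
        raise ValueError("unknown digest method: " + method)
    return min(requested, limit)
```

A tool that asks for more revisions than the heavy method allows would either get a clamped result or be nudged toward a cheaper digest.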

[1] http://en.wikipedia.org/wiki/Locality_sensitive_hashing
[2] http://www.grouplens.org/node/427
[3] http://doi.acm.org/10.1145/985692.985765

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
