OK, this is really just a sounding board for a couple ideas I'm mulling
over regarding a pseudo-randomisation system for some websites I'm
doing. Any thoughts on the subject greatly appreciated!
We have a system that lists things. The things are broken down by
category, but you can still end up at the leaf of the category with a
couple hundred things to list, which is done via a pagination system
(lets say 50 per page).
Now, the people who own the things pay to have their things on the site.
Lets say there are three levels of option for listing: gold, silver,
bronze. The default order is gold things, silver things then bronze
things. Within each level, the things are listed alphabetically (again
this is just the default).
Now if 100 things in one category have a gold level listing, those in
the second half of the alphabet will be on page two by default. They
don't like this and they question why they are paying for gold at all.
My client would like to present things in a more random way to give all
gold level things a chance to be on the first page of results in a
fairer way than just what they happen to be named.
Right that's the back story. It's more complex than that, but the above
is a nice and simple abstraction.
There are numerous problems to randomised listings: you can't actually
truly randomise results otherwise pagination breaks. Server-side
caching/denationalisation is affected as there is no longer "one
listing" but "many random listings". Discussing a link with a friend
over IM or email and saying things like "the third one down looks best"
is obviously broken too, but this is something my client accepts and can
live with. Also, if the intention is to reassure the thing owners that
their listing will appear further up the listings at times, the fact
that a simple refresh will not reorder things for a given session will
make that point harder to get across to less web-educated clients
(that's a nice way of saying it!). Caching proxies and other similar
things after the webserver will also come into play.
So to me there are only really two options:
1. Random-per user (or session): Each user session gets some kind of
randomisation key and a fresh set of random numbers is generated for
each thing. They can then be reliably "randomised" for a given user. The
fact that each user has their own unique randomisation is good, but it
doesn't help things like server side full page caching and thus more
"work" needs to be done to support this approach.
2. Random-bank + user/session assignment: So with this approach we have
a simple table of numbers. First column is an id and is sequential form
1 to <very big number>. This table has lots of columns: say 32. These
columns will store a random number. Once generated, this table acts as
an orderer. It can be joined into our thing lookup query and the results
can be ordered by one of the columns. Which column to use for ordering
is picked by a cookie stored on the users machine. That way the user
will always get the same random result, even if they revisit the site
some time later (users not accepting cookies is not a huge deal, but I
would suggest the "pick a random column" algorithm (used to set the
cookie initially) is actually based on source IP address. That way even
cookieless folks should get a consistent listing unless their change
I'm obviously leaning towards the second approach. If I have 32
"pre-randomised" columns, this would get a pretty good end result I
think. If we re-randomise periodically (i.e. once a week or month) then
this can be extended further (or simply more columns can be added).
I think it's the lowest impact but there are sill some concerns:
Server side caching is still problematic. Instead of storing one page
per "result" I now have to store 32. This will lower significantly the
cache hits and perhaps make full result caching somewhat redundant. If
that is the case, then so be it, but load will have to be managed.
So my question for the lazy-web:
Are there any other approaches I've missed? Is there some cunning,
cleverness that eludes me?
Are there any problems with the above approach? Would a caching proxy
ultimately cause problems for some users (i.e. storing a cache for page
1 and page 2 of the same listing but with different randomisations)? And
if so can this be mitigated?
Thanks for reading and any insights you may have!
Tribalogic Limited [http://www.tribalogic.net/]
Mandriva Linux Contributor [http://www.mandriva.com/]
PulseAudio Hacker [http://www.pulseaudio.org/]
Trac Hacker [http://trac.edgewall.org/]
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php