Hi all,
I'm sure someone else has done this, so I figured I'd ask in case anyone
wants to share.
I'm creating a social networking type of site which has some pretty big
scaling/feature requirements and would like to evaluate Riak as a candidate
for use as a storage engine.
For scale, I'm looking at
- 50 million users
- 500,000 users online at any given time (1 million unique visitors/day)
- hundreds of millions of events (messages, likes, pictures, etc...)
Features
- Friends
- Comments
- Likes (pictures, videos, various profile entities)
- Pictures
- Videos
- Suggested matches
- More like this
- Lots of searching options
- Messaging
- etc...
Pseudo off-the-cuff naive model to get discussion going:
Profiles
Key: userid
Value: { handle: 'hammer', tagline: {content: 'please hammer dont hurt em',
likes: [{timestamp: 123123123, userid: 213123}, ...], comments: [{timestamp:
123123123, userid: 213123, message: 'blabbity blah'}, ...]}, age: 36,
latitude: 42.2, longitude: 57.3, geohash: 'ezs42', ... }
Pics
Key: pictureid
Value: { userid: 12323, visibility: 'private', allowed: [121314, 322342,
1241241], likes: [{timestamp: 123123123, userid: 213123}, ...],
comments: [{timestamp: 123123123, userid: 213123, comment: 'rad pic d00d'},
...] }
Friends
Key: friendshipid
Value: {from: 123123, to: 5243, timestamp: 252151, status: 'pending',
last_updated: 232342}
MessageThreads
Key: messageid
Value: {to: 12321, from: 5212, content: 'yeah!', status: 'read', replies:
[{from: 12321, timestamp: 1412, content: 'asdf lorem ipsum', status:
'unread'}, ...]}
Timeline
Key: eventid
Value: {to: 12412, from: 5212, eventtype: 'friend request', timestamp:
124112}
Sample queries:
Get all picture metadata for userid 12142
Get all unread messages for user 5212
Get all messages to user 5212 from user 12312
Who are user 2521's friends?
New users since date X aged less than Y <possibly lots of ANDs here>
--
I'm open to suggestions here on pretty much everything (model, Riak
suitability/functionality, conflict resolution strategy, geo search
possibilities, etc...)
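
On the geo search point, here is a minimal sketch of how a geohash like the
one in the profile model could be computed, assuming the standard geohash
scheme (interleaved longitude/latitude bits, base-32 alphabet). The function
name geohash_encode and the precision parameter are made up for illustration,
and nothing here is Riak-specific. The coordinates in the usage note are ones
that hash to 'ezs42', not the ones in the profile sample above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

geohash_encode(42.605, -5.603) gives 'ezs42', and a shorter precision gives a
prefix of the longer hash, so a coarse "nearby" lookup could match on a shared
prefix. That is only an approximation: two points close to each other but on
opposite sides of a cell boundary can have different prefixes, so neighbouring
cells would need to be checked as well.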
I'm thinking that all generated ids should use something like snowflake to
get rough time ordering capabilities on keys (and possibly be able to
exploit this for key-filters).
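
To make the id idea concrete, here is a rough sketch of a snowflake-style
generator, assuming the commonly described layout of 41 bits of millisecond
timestamp, 10 bits of worker id and 12 bits of per-millisecond sequence. The
class name, the EPOCH_MS value and the as_key zero-padding helper are
hypothetical choices for illustration, not Snowflake's actual code.

```python
import threading
import time

EPOCH_MS = 1_300_000_000_000  # arbitrary custom epoch (assumption)


class SnowflakeLikeIds:
    """Generates roughly time-ordered 63-bit ids: timestamp | worker | sequence."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock went backwards: keep using the last timestamp.
                now = self._last_ms
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: borrow the next
                    # one instead of blocking (a simplification).
                    now = self._last_ms + 1
            else:
                self._seq = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self._seq


def timestamp_ms(snowflake_id):
    """Recover the millisecond timestamp embedded in an id."""
    return (snowflake_id >> 22) + EPOCH_MS


def as_key(snowflake_id):
    """Zero-pad so that string keys sort in the same order as the ids."""
    return f"{snowflake_id:020d}"
```

Ids from a single worker are strictly increasing, and ids from different
workers are ordered by millisecond, which is the "rough" part. The zero-padded
string form matters if keys are compared as strings, since unpadded decimal
strings do not sort numerically.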
Anyone have suggestions or recommendations? War stories? I've looked at the
Kiip/Voxer/Clipboard presentations and I'm wondering what the current state
of Riak is. Should I start thinking about implementing the tracking of
indices myself with all the ugliness that entails?
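
For reference, this is roughly what I mean by tracking indices myself: a toy
in-memory sketch, with plain dicts standing in for the key/value store. The
IndexedStore class, its method names and the flattened message fields are
hypothetical and not any Riak client API.

```python
from collections import defaultdict


class IndexedStore:
    """Toy key/value store with hand-maintained (field, value) -> keys indexes."""

    def __init__(self):
        self.objects = {}
        self.indexes = defaultdict(set)

    def put(self, key, value, indexed_fields):
        old = self.objects.get(key)
        if old is not None:
            # Remove stale index entries left by the previous version.
            for field in indexed_fields:
                if field in old:
                    self.indexes[(field, old[field])].discard(key)
        self.objects[key] = dict(value)
        for field in indexed_fields:
            if field in value:
                self.indexes[(field, value[field])].add(key)

    def find(self, criteria):
        """Return sorted keys of objects matching every field in `criteria`."""
        sets = [self.indexes.get(item, set()) for item in criteria.items()]
        return sorted(set.intersection(*sets)) if sets else []


FIELDS = ("to", "from", "status")

store = IndexedStore()
store.put("m1", {"to": 5212, "from": 12312, "status": "unread"}, FIELDS)
store.put("m2", {"to": 5212, "from": 999, "status": "read"}, FIELDS)

# "Get all unread messages for user 5212"
print(store.find({"to": 5212, "status": "unread"}))
# "Get all messages to user 5212 from user 12312"
print(store.find({"to": 5212, "from": 12312}))
```

The ugliness is that in a real distributed store the object write and the
index writes are separate operations, so a failure or a concurrent update
between them can leave the indexes out of sync with the data.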
Also, I'm not opposed to running an ElasticSearch cluster if Search/2i
doesn't fit the bill; however, the more functionality I can get from one
system the better (fewer chances of indices getting out of sync, etc...).
We're obviously going to be doing all the usual scaling techniques as part
of this system (edge side includes with Varnish, application caching,
etc...); however, I do expect a pretty big cluster to emerge from this.
Cheers
Mark