My question is in the context of a social network schema design. I am thinking of the following schema for storing the user data that is required when a user logs in and is taken to his homepage. (I aimed for a design in which a single-row read query retrieves all the data needed to render that user's homepage.)
    UserSuperColumnFamily: {    // Column Family
        UserIDKey: {
            columns: MyName, MyEmail, MyCity, ...etc
            supercolumns: MyFollowersList, MyFollowiesList, MyPostsIdKeysList, MyInterestsList, MyAlbumsIdKeysList, MyVideoIdKeysList, RecentNotificationsForUserList, MessagesReceivedList, MessagesSentList, AccountSettingsList, RecentSelfActivityList, UpdatesFromFollowiesList
        }
    }

Thus the user's news feed would be generated from the supercolumn UpdatesFromFollowiesList. That supercolumn would, of course, contain only the IDs of the posts, not the entire post data.

Questions:

1.) What could be the problems with this design? Any improvements?

2.) Frequent and heavy overwrites/row mutations (for example, when propagating a user's post updates to the news feeds of all his followers) ultimately spread a row across several SSTables. Will this lead to degraded read performance? Is it suitable to use the row cache here (the row is very big, but all of its data is needed for as long as the user is logged in)? If I do not use the cache, it may be very expensive to pull the row each time some data is required for the given user, since the row would be in several SSTables. How can I improve the read performance here?

The actual data of the posts from the network would be retrieved by PostIdKey through subsequent read queries against the column family PostsSuperColumnFamily, which would look as follows:

    PostsSuperColumnFamily: {
        PostIdKey: {
            columns: PostOwnerId, PostBody
            supercolumns: TagsForPost {list of columns of all tags for the post}, PeopleWhoLikedThisPost {list of columns of UserIdKey of all the likers}
        }
    }

Is this the best design to go with, or are there any issues to consider here?

Thanks in anticipation of your valuable comments!
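P.S. To make the read path I have in mind concrete, here is a minimal Python sketch of the two-step read: one row read for the user, then a lookup of the post bodies by the IDs found in UpdatesFromFollowiesList. It uses plain dictionaries as stand-ins for the two column families, not a real Cassandra client. The sample keys and values (user:1, post:10, and so on), the function name load_homepage, and the feed_limit parameter are all made up for illustration.

```python
# In-memory stand-ins for the two column families (plain dicts, NOT a Cassandra client).
# All keys and values below are hypothetical sample data.
user_cf = {
    "user:1": {
        "columns": {"MyName": "Alice", "MyCity": "Pune"},
        "supercolumns": {
            "MyFollowersList": {"user:2": ""},
            # Only post IDs are stored here, newest first; bodies live in posts_cf.
            "UpdatesFromFollowiesList": {"post:11": "", "post:10": ""},
        },
    },
}

posts_cf = {
    "post:10": {
        "columns": {"PostOwnerId": "user:2", "PostBody": "first post"},
        "supercolumns": {"TagsForPost": {"intro": ""}, "PeopleWhoLikedThisPost": {}},
    },
    "post:11": {
        "columns": {"PostOwnerId": "user:2", "PostBody": "second post"},
        "supercolumns": {"TagsForPost": {}, "PeopleWhoLikedThisPost": {"user:1": ""}},
    },
}


def load_homepage(user_key, feed_limit=20):
    """Sketch of the intended read path for rendering a user's homepage."""
    # Read 1: a single-row read returns the profile columns and all the lists.
    row = user_cf[user_key]
    feed_ids = list(row["supercolumns"]["UpdatesFromFollowiesList"])[:feed_limit]

    # Read 2: fetch the post rows by key (a multi-key read in a real store).
    posts = {pid: posts_cf[pid] for pid in feed_ids if pid in posts_cf}

    return {
        "profile": row["columns"],
        "feed": [posts[pid]["columns"]["PostBody"] for pid in feed_ids if pid in posts],
    }


page = load_homepage("user:1")
print(page["profile"]["MyName"], page["feed"])
```

Even with the whole user row fetched in one read, the feed still costs a second round of reads keyed by PostIdKey, so the single-row goal only covers the list of IDs and not the post content itself.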