> Actually, the person who is posting will not be restricted by any limit on
> the kind of topics he can post on. He may even post on topics beyond
> the list of what he himself is following. Thus this list cannot be defined
> in advance, and since the topic list would probably run into the hundreds,
> it would not be possible to implement this here.
> Actually, the database would know the defined and limited list of topics a
> user is following, but there are no restrictions on what he can post on.
>
> Even if you alternatively consider "putting all the topics that his followers
> are following" as categories for splitting the rows of a user's followers
> according to topics, this would be way too much denormalizing.
> Although it may reduce the pain in frequent operations, it may increase the
> pain too much somewhere else.

Yes, I was describing: for every user, create one row for every topic and populate each of those rows with that user's followers who are interested in the topic. Note that if a user has no followers interested in a topic, that row would have no columns, so the row simply won't exist.

The amount of denormalization is not excessive here. I would guess that you have to store roughly 10x as much information about followers, and followers are a small amount of data compared to posts and timelines.

> What do you think about the JSON encoded columns (that contain list of topic
> tags & corresponding postID) as I referred above, although this does put
> some pressure on reads but still sounds quite (ok?). Let me know your
> views.

The method you described is not bad, but it does have some downsides.

First, you will potentially be appending posts to the timelines of users who are not interested in the topic. As you say, you will have to resolve this at read time by checking the user's interested topics and filtering the timeline. This means "getting the last 10 posts" may take more than one read if fewer than ten posts remain after filtering (even supposing you fetch more than 10 posts from the timeline). You will also have to read the user's list of interested topics.

Second, doing a per-topic timeline becomes painful either at the time of post creation or when reading the user's timeline.

You always want to denormalize and write more data if it means you can make fewer reads elsewhere (within reasonable limits). Remember, writes are 10x faster than reads and disk space is cheap.

- Tyler
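
P.S. In case it helps, here is a rough sketch of the per-topic follower rows I described, using plain Python dictionaries as stand-ins for the column families. Everything named here (followers_by_topic, timelines, follow, publish, latest) is made up for illustration and is not from your schema; a real implementation would issue the equivalent reads and writes against the database.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for column families (illustrative names only).
followers_by_topic = defaultdict(set)  # row key (author, topic) -> follower ids
timelines = defaultdict(list)          # row key follower -> post ids, oldest first


def follow(follower, author, topics):
    """Denormalize: add the follower to one row per (author, topic) of interest."""
    for topic in topics:
        followers_by_topic[(author, topic)].add(follower)


def publish(author, post_id, topics):
    """Write the post only to timelines of followers interested in its topics."""
    recipients = set()
    for topic in topics:
        # .get() avoids creating a row: a missing row means nobody is interested.
        recipients |= followers_by_topic.get((author, topic), set())
    for follower in recipients:
        timelines[follower].append(post_id)
    return recipients


def latest(follower, n=10):
    """One read and no filtering: the timeline holds only relevant posts."""
    return timelines[follower][-n:][::-1]
```

For example, if bob follows alice for "cassandra" and "python" and carol follows her for "cooking", then a post tagged "cassandra" lands only in bob's timeline, and a post tagged "gardening" reaches nobody. The extra work happens at write time, so "getting the last 10 posts" stays a single read.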