Hi, your mail got me thinking about a general answer.
I think a good answer would be: all data that is only useful for a limited time AND can be generated without bound by a finite number of users should have a TTL. OR when the available space is very small compared to the number of users.

An example is cookies. A single user generates a handful of cookie events per day. Let's just look at the creation of a session, perhaps once a day. So even with a finite number of users and a finite amount of data per user per day, the number of stored cookies would grow day by day, without any useful purpose (under the assumption that you use a cookie system with sessions that expire).

Another example would be password reset attempts or something like that in a web app. These events should expire after a number of days and should be deleted after a longer time (so that you can still say that the attempt is "out of date" or something like that, there should be 2 different "expiration times"). Without that, the password reset attempts would just be old junk in your db. Or you would have to write MR jobs to clean the db on a regular basis.

An example could also be an aggregation service, where a user can make a list of things to be saved that are generated elsewhere (e.g. news headlines). A finite number of users would generate an infinite number of rows just by waiting. So you could make a policy where only the last 30 days are aggregated. And this could be implemented with a TTL.

A further example would be a mechanism to prevent brute force attacks where you save the last attempts, and if a user has more than N attempts in M seconds the attempt fails. This could be implemented with a column family "attempts", where the last attempts are saved. If there are more than N of them => fail. And when you set the TTL to M seconds, you are ready to go.

An example for the second use case (little space for a large number of users) would be a service that serves files for fast and easy sharing between users, paid for by ads. Thus you have a large user base, but very little space.
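To make the brute force idea above a bit more concrete, here is a rough sketch in plain Java. Please note that this is NOT the HBase client API: the class TtlAttemptLimiter and its method tryAttempt are names I made up, and the map is only an in-memory stand-in for an "attempts" column family whose cells expire after M seconds. It also assumes that the timestamps passed in never go backwards.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an "attempts" column family with a TTL.
// Plain Java for illustration only; this is NOT the HBase client API.
class TtlAttemptLimiter {
    private final int maxAttempts;   // N: attempts tolerated inside the window
    private final long ttlMillis;    // M seconds, stored as milliseconds
    private final Map<String, Deque<Long>> attemptsByUser = new HashMap<>();

    TtlAttemptLimiter(int maxAttempts, long ttlSeconds) {
        this.maxAttempts = maxAttempts;
        this.ttlMillis = ttlSeconds * 1000L;
    }

    // Records one attempt at nowMillis and reports whether it is allowed.
    // The attempt fails once more than N unexpired attempts are stored.
    // Assumes nowMillis never decreases between calls for the same user.
    boolean tryAttempt(String user, long nowMillis) {
        Deque<Long> stamps = attemptsByUser.computeIfAbsent(user, u -> new ArrayDeque<>());
        // Mimic the TTL: drop entries that are M seconds old or older.
        while (!stamps.isEmpty() && nowMillis - stamps.peekFirst() >= ttlMillis) {
            stamps.pollFirst();
        }
        stamps.addLast(nowMillis);
        return stamps.size() <= maxAttempts;
    }

    public static void main(String[] args) {
        TtlAttemptLimiter limiter = new TtlAttemptLimiter(3, 60); // N = 3, M = 60 s
        for (long t = 0; t < 5; t++) {
            System.out.println("t=" + t + "s allowed=" + limiter.tryAttempt("alice", t * 1000L));
        }
        // After the window has passed, the old attempts have expired.
        System.out.println("t=120s allowed=" + limiter.tryAttempt("alice", 120_000L));
    }
}
```

In the sketch the expired attempts have to be thrown away by hand in the while loop. With a TTL on the column family the store would do that part for you, and only the "more than N => fail" check would be left in your own code.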
For the file-sharing case, think of "one-click hosting" or something like that, where the users use the files for perhaps a week and then forget all about them. So your policy could say something like "expire 30 days after last use", which you can implement with just a TTL and without MR jobs.

All these examples come from using HBase to implement user-driven systems, web apps or something like that. However, it should be easy to find examples for more general applications of HBase. I once read a question from an HBase user who had the problem that the logging (which was stored in HBase) grew too large. He only wanted to keep the last N days and asked for help implementing an MR job that regularly removes older log messages. A TTL and he was good to go ;).

Hope this helped.

Best wishes

Wilm

On 26.09.2014 at 17:20, yonghu wrote:
> Hello,
>
> Can anyone give me a concrete use case for ttl deletions? I mean in which
> situation we should set ttl property?
>
> regards!
>
> Yong
>
