Hi all, I was recently at a conference [1] where I attended an interesting keynote about data management [2] (I believe it refers to this 2016 paper [3]).
Apart from the approaches proposed to solve the data management problem (e.g. getting rid of DBMSs!), I got interested in the discussion about how we deal with the increasing amount of data we have to manage (also because of some issues we have [4]).

In many systems only a very small subset of the data is actually used, because the information users really need refers mostly to the most recently ingested data (e.g. social networks). While that doesn't always apply to content repositories in general (e.g. if you build a CMS on top of one), I think it's worth discussing whether we can optimize our persistence layer to work better with highly used (e.g. more recent) data and spend less space/CPU on data that is used more rarely.

For example, putting this together with the incremental indexing section of the paper [3], I was thinking (though that's already a solution rather than "just" a discussion) that perhaps we could simply avoid indexing *some* content until it's needed: e.g. the first time a query falls back to traversal, index that content so that the next query over the same data is faster. But that's just an example.

What do others think?

Regards,
Tommaso

[1] : http://www.iccs-meeting.org/iccs2017/
[2] : http://www.iccs-meeting.org/iccs2017/keynote-lectures/#Ailamaki
[3] : https://infoscience.epfl.ch/record/219993/files/p12-pavlovic.pdf
[4] : https://issues.apache.org/jira/browse/OAK-5192
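P.S. To make the "index on first traversal" idea concrete, here's a minimal, purely illustrative sketch (not Oak's actual indexing API; all names are hypothetical): nothing is indexed at ingest time, the first query pays the traversal cost and populates an inverted index as it goes, and subsequent queries over the same data hit the index.

```java
import java.util.*;

// Hypothetical "index on first traversal" sketch. The Map-based content
// store stands in for the repository; none of this is real Oak code.
public class LazyIndex {
    // content store: path -> property value
    private final Map<String, String> content = new HashMap<>();
    // lazily built inverted index: property value -> matching paths
    private final Map<String, Set<String>> index = new HashMap<>();
    private boolean indexed = false;
    public int traversals = 0; // how many full traversals we paid for

    public void ingest(String path, String value) {
        content.put(path, value);
        indexed = false; // newly ingested data invalidates the lazy index
    }

    public Set<String> query(String value) {
        if (!indexed) {
            // first query after ingest: traverse everything once,
            // indexing each node as we visit it
            traversals++;
            index.clear();
            for (Map.Entry<String, String> e : content.entrySet()) {
                index.computeIfAbsent(e.getValue(), k -> new HashSet<>())
                     .add(e.getKey());
            }
            indexed = true;
        }
        // every later query is an index lookup, no traversal
        return index.getOrDefault(value, Collections.emptySet());
    }
}
```

The trade-off is exactly the one discussed above: rarely queried content costs nothing to index, at the price of one slow first query (and a policy for invalidation/partial re-indexing, which this sketch handles crudely by rebuilding everything).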
