Hello Heiner,

Looks like a very interesting project. Would love to hear how it turns out. Do let us know what you end up deploying. Here are some of my thoughts.

Between secondary indexes and/or search, you should not have a problem retrieving a "blip page" of data.

Certainly do not use the Bitcask backend, because it keeps its key index in memory. You would definitely need to use LevelDB.

Data bloat. Your 200B will triple on disk because Riak writes ~400B of data for itself per key (I am not exactly certain of the numbers in the post-1.0 branch). For this reason alone I'm not sure Riak is optimal for collecting large quantities of small values. The sweet spot is in the multi-KB range, where the overhead becomes a smaller percentage of the total data.

Ring state changes late in your collection cycle will result in a significant amount of data rebalancing, which will obviously take a while. I would overdeploy nodes early on and replace them in place over time to take advantage of newer hardware. I feel that although a lot of great work has already been done, there is still more work to be done on the rebalancing mechanics and algorithm to make this less of a pain point for larger systems.

I would consider evaluating Cassandra or HBase for this project as well.

Cheers,
Alexander

@siculars on twitter
http://siculars.posterous.com

Sent from my iRotaryPhone

On Feb 6, 2012, at 5:22, Heiner Bunjes <[email protected]> wrote:

> I need a database to log and retrieve sensor data.
>
> Is riak the right solution for this task and if, how?
> If not, which other DB system might be a better fit?
>
> I know that before the implementation of secondary indices riak probably was
> not suitable for this kind of task. I just hope it is now.
>
> The details are as follows:
>
> ######## <requirements version="3">
>
> Glossary
>
> - Node = A computer on which an instance of the database
>   is running
>
> - Blip = one data record send by a sensor
>
> - Blip page = The sorted list of all blips for a specific sensor
>   and a specific time range.
>
> The scale is as follows:
>
> (01) 10E6 sensors deliver 1 blip every 100 seconds
>      -> Insert rate = 10 kiloblip/s
>      -> Insert rate ~ 315 gigablip/Year
>
> (02) They have to be stored for ~3 years
>      -> Size of database = 1 terablip
>
> (03) Each blip has about 200 bytes
>      -> Size of database = 200TB
>
> (04) The system will start with just 10E4 sensors but will
>      soon increase upto the described volume.
>
> The main operations on the data are:
>
> (05) Add the new blips to the database
>      (written blips are never changed)!
>
> (06) Return all blips for sensor X with a timestamp
>      between timestamp_a and timestamp_b!
>      With other words: Return a blip page.
>
> (07) Return all the blips specified in (06) ordered
>      by timestamp!
>
> (08) Delete all blips older than Y!
>
> Further the following is true:
>
> (09) Each added blip is clearly (without ambiguity) identifiable by
>      sensor_id+timestamp.
>
> (10) 99.9% of the blips are inserted in
>      chronological order, the rest is not.
>
> (11) The database system MUST be free and open source.
>
> (12) The DB SHOULD be easy to administrate.
>
> (13) All data MUST still be writable and readable while less
>      then the configurable number N of nodes are down (unexpectedly).
>
> (14) The mechanisms to distribute the data to the available
>      nodes SHOULD be handled by the database.
>      This means that the database SHOULD automatically
>      redistribute the data when nodes are added or removed.
>
> ######## </requirements>
>
> Many thanks in advance
> Heiner
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
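
P.S. A rough sizing sketch for the data bloat point, in Python. It is only a back-of-envelope calculation under stated assumptions: 10^6 sensors (the count that matches the stated 10 kiloblip/s), the ~400B per-key overhead mentioned in the reply (which is itself uncertain), and a replication factor of n_val = 3, assumed here because it is Riak's default. The function and parameter names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000; leap years ignored


def estimate_storage(sensors=10**6, blip_interval_s=100, retention_years=3,
                     blip_bytes=200, overhead_bytes=400, n_val=3):
    """Return rough storage figures in bytes for the blip workload.

    overhead_bytes is the approximate per-key overhead from the reply
    (uncertain); n_val=3 is an assumption (Riak's default replication).
    """
    insert_rate = sensors // blip_interval_s               # blips per second
    total_blips = insert_rate * SECONDS_PER_YEAR * retention_years
    raw = total_blips * blip_bytes                          # payload only
    on_disk = total_blips * (blip_bytes + overhead_bytes)   # one copy, with overhead
    replicated = on_disk * n_val                            # all replicas
    return {
        "insert_rate": insert_rate,
        "total_blips": total_blips,
        "raw_bytes": raw,
        "on_disk_bytes": on_disk,
        "replicated_bytes": replicated,
    }


if __name__ == "__main__":
    est = estimate_storage()
    tb = 10**12
    print("payload only:      %.1f TB" % (est["raw_bytes"] / tb))
    print("with key overhead: %.1f TB" % (est["on_disk_bytes"] / tb))
    print("with replication:  %.1f TB" % (est["replicated_bytes"] / tb))
```

With these inputs the payload is about 189 TB, one copy with overhead is about 568 TB, and three replicas come to roughly 1.7 PB. Swapping in a multi-KB value size shows how quickly the overhead share shrinks.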
