Ryan,

I went ahead and began modeling our data as you suggested below. However, we just realized there is a problem with our compound key: we don't actually have access to the patient identifier at the level where the data collection is performed. What we do know is the bed #. We have a predetermined number of beds, so I was wondering whether there is a better way to model everything given this finite (and predetermined) set of values for the compound key.
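
To make the question concrete, here is a minimal sketch of a `<bed #><timestamp>` row key, i.e. the `<patient id><timestamp>` layout from the quoted reply with the bed number swapped in. The field widths (4-byte bed number, 8-byte millisecond timestamp) and the helper names `make_row_key`, `parse_row_key`, and `bed_scan_range` are assumptions for illustration only, not anything from HBase itself. The idea is that fixed-width big-endian fields make the byte-wise sort order of the keys match (bed, time) order.

```python
import struct
from typing import Tuple

# Hypothetical layout: 4-byte bed number followed by an 8-byte timestamp
# (milliseconds since epoch), both unsigned and big-endian, so that comparing
# the raw bytes gives the same order as comparing (bed, timestamp).
_KEY = struct.Struct(">IQ")


def make_row_key(bed: int, ts_millis: int) -> bytes:
    """Pack a bed number and timestamp into a 12-byte row key."""
    return _KEY.pack(bed, ts_millis)


def parse_row_key(key: bytes) -> Tuple[int, int]:
    """Recover (bed, ts_millis) from a key built by make_row_key."""
    return _KEY.unpack(key)


def bed_scan_range(bed: int) -> Tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) keys covering one bed.

    Assumes bed + 1 still fits in 4 bytes, which holds for a few
    hundred beds.
    """
    return make_row_key(bed, 0), make_row_key(bed + 1, 0)
```

With keys like these, all readings for one bed sit next to each other in time order inside a single table, so a range scan between the two keys from `bed_scan_range` would cover exactly one bed. That is roughly what a separate table per bed would provide, without multiplying the number of tables.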
Given this, would it be better to have a different table for each bed (and just have the row key be the timestamp)? What are the downsides to having hundreds of different tables that otherwise have the same "schema"?

Thanks!

--Andrew

--
Andrew Nguyen
[email protected]

On Apr 24, 2010, at 1:45 PM, Ryan Rawson wrote:

> Hey,
>
> So in my case, timestamp wasn't unique, so I had to put in event id.
> For timeseries systems, you of course wouldn't need to have an
> additional id. So your first thought where you have:
> <patient id><timestamp>
>
> then putting physiologic parameters in different columns (but the same
> column family) sounds great to me. This is a good example of where
> flexible schema is good, since you can store any number of parameters
> per row, but only the ones you want.
>
> As for HBase and multi-datacenter, there is work underway by my
> colleague JD to write a replication system. It's in the late stages
> and we are hoping to get it into advanced testing soon. Practically
> speaking you don't want to split your HDFS and HBase cluster across a
> datacenter.
