Hi, yonghu

I am not sure whether the following timestamp info is valuable to you, but I am posting it anyway.

> So, I wonder
> 1. If there are any predefined semantics of TS in HBase or the semantics of
> TS is application-specific?

As far as I know, the timestamp is mainly used for:
1. fetch order: from newest to oldest (largest long to smallest long)
2. versioning: if you have values at t1, t2, t3, and t4, and HBase default versioning is 3, then you can fetch only t4, t3, and t2
3. time-to-live (TTL): predicate deletion. A threshold is applied to the timestamp of each value, and internal housekeeping automatically checks whether a value has exceeded its TTL.
For more details, please refer to http://hbase.apache.org/book/versions.html

> 2. Can anyone give any rules of how to assign TS for data versions which
> belong to the same row?

I think you can refer to Facebook's Inbox search case:
http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase

FYI~

Best regards

takeshi

2013/9/28 yonghu <[email protected]>

> Hi, Ted
>
> Thanks for your response. This is also the way I use to avoid the problem.
>
> regards!
>
> Yong
>
>
> On Sat, Sep 28, 2013 at 4:31 PM, Ted Yu <[email protected]> wrote:
>
> > Can you make NetworkSpeed as column family ?
> >
> > This way you can treat individual suppliers as columns within the column
> > family.
> > So for "user Tom has a new supplier d instead of supplier c and its speed
> > is 15K":
> >
> > rk     NetworkSpeed
> >        c          d
> > Tom    {10K:1}
> > Tom               {15K:2}
> >
> > In the example above, the numbers after colon are TS. If the speed is
> > unknown, you can store a special marker in the Cell.
> > I used two rows, but as you said, the two Cells can be written using one
> > RPC call.
> >
> > This way, NetworkSupplier column is not needed.
> >
> > Cheers
> >
> >
> > On Fri, Sep 27, 2013 at 3:04 PM, yonghu <[email protected]> wrote:
> >
> > > To Ted,
> > >
> > > --"Can you tell me why readings corresponding to different timestamps
> > > would appear in the same row ?"
> > >
> > > Does that mean the data versions which belong to the same row should at
> > > least have the same timestamps?
> > >
> > > For adding a row into HBase, I can use single Put instance, for example,
> > > Put put = new Put("tom") and put.addColumn("Network:Supplier","c" ),
> > > put.addColumn("Network:Supplier","d"). And hence the data versions will
> > > have the same TS.
> > >
> > > However, I can also use multiple Put instances, each Put instance for
> > > single data version. For example, Put put1 = new Put("tom"),
> > > put1.addColumn("Network:Supplier","c" ). Put put2 = new Put("tom"),
> > > put2.addColumn("Network:Supplier","d" ). In this situation, each data
> > > version which belongs to the same row will have different TSs even if
> > > logically they should have the same TSs. This situation can happen when I
> > > first know the name of network supplier and later get the speed of
> > > supplier.
> > >
> > > To lars,
> > >
> > > --"You have a single row with two columns?"
> > >
> > > This is just an example for discussion. I had a heavy discussion with the
> > > other person about how to understand the right data representation and
> > > the semantics of TS in HBase. Your explanation is one possible scenario
> > > which means "user Tom has a new supplier d instead of supplier c and its
> > > speed is 15K".
> > > However, it is possible that "user Tom has both suppliers c and d and 15K
> > > may belong to supplier c, as the speed of supplier d is not tested yet."
> > > The second understanding is very tricky and if it happened, we need to
> > > redesign the schema of database.
> > >
> > > So, I wonder
> > > 1. If there are any predefined semantics of TS in HBase or the semantics
> > > of TS is application-specific?
> > > 2. Can anyone give any rules of how to assign TS for data versions which
> > > belong to the same row?
> > >
> > > regards!
> > >
> > > Yong
> > >
> > >
> > > On Fri, Sep 27, 2013 at 7:02 PM, lars hofhansl <[email protected]> wrote:
> > >
> > > > Not sure I follow.
> > > > You have a single row with two columns?
> > > > In your scenario you'd see that supplier c has 15k iff you query the
> > > > latest data, which seems to be what you want.
> > > > Note that you could also query as of TS 4 (c:20k), TS3 (d:20k), TS2
> > > > (d:10k)
> > > >
> > > >
> > > > -- Lars
> > > >
> > > >
> > > > ________________________________
> > > > From: yonghu <[email protected]>
> > > > To: [email protected]
> > > > Sent: Friday, September 27, 2013 7:24 AM
> > > > Subject: How to understand the TS of each data version?
> > > >
> > > >
> > > > Hello,
> > > >
> > > > In my understanding, the timestamp of each data version is generated by
> > > > Put command. The value of TS is either indicated by user or assigned by
> > > > HBase itself. If the TS is generated by HBase, it only records when (the
> > > > time point) that data version is generated (Have no meaning to the
> > > > application). However, if TS is indicated by user, it may have a specific
> > > > meaning to applications. The reason why I want to ask this question is:
> > > > How can I correctly understand the meaning of following data? Suppose I
> > > > have a table which is used to record the internet speed of different
> > > > suppliers for specific users.
> > > > For example,
> > > >
> > > > rk     Network:Supplier    Network:speed
> > > >
> > > > Tom    {d:1, c:4}          {10K:1, 20K:3, 15K:5}
> > > >
> > > > Then I can have following different data information representations:
> > > >
> > > > 1. Supplier d have speeds 10K and 20K. Supplier c have 15K.
> > > > 2. Supplier d have speeds 10K, 20K and 15K. We only insert the supplier
> > > > c but has not inserted any speed information.
> > > >
> > > > which one is the right understanding? Anyone knows whether there are any
> > > > predefined semantics of TS in HBase?
> > > >
> > > > regards!
> > > >
> > > > Yong
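
The three timestamp uses listed in takeshi's reply at the top of this thread (fetch order, versioning, TTL) can be illustrated with a minimal sketch. This is a plain-Java model of a single cell, not the HBase client API: the class name VersionedCell, its methods, and the maxVersions and ttlMillis parameters are all hypothetical names chosen for illustration, and the real store applies these rules during reads and compactions rather than in a map like this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java model of one cell's versions; NOT the HBase client API.
class VersionedCell {
    // Versions keyed by timestamp, largest (newest) timestamp first.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());
    private final int maxVersions; // assumed column-family style setting
    private final long ttlMillis;  // assumed column-family style setting

    VersionedCell(int maxVersions, long ttlMillis) {
        this.maxVersions = maxVersions;
        this.ttlMillis = ttlMillis;
    }

    // Store a value under an explicit timestamp; the same timestamp overwrites.
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    // Return visible values, newest first, skipping expired and surplus versions.
    List<String> get(long now) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            if (result.size() == maxVersions) {
                break; // versioning: keep only the newest maxVersions
            }
            if (now - e.getKey() > ttlMillis) {
                break; // TTL: this entry and all older ones have expired
            }
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3, 100L);
        cell.put(1L, "v1");
        cell.put(2L, "v2");
        cell.put(3L, "v3");
        cell.put(4L, "v4");
        System.out.println(cell.get(5L));   // [v4, v3, v2]
        System.out.println(cell.get(104L)); // [v4]
    }
}
```

With four versions at t1 to t4 and a limit of 3, only t4, t3, and t2 come back, newest first, as described in the reply. Once the clock moves far enough past the TTL, older versions drop out regardless of the version limit. None of this assigns any meaning to the timestamp beyond ordering and expiry, which is why what a timestamp means across columns of the same row is left to the application.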
