On Tue, Nov 27, 2012 at 12:21 PM, Roshan Punnoose <[email protected]> wrote:
> The <string> would most likely be a fixed set of strings that do not
> change over time.
>
> My question is if it is bad to use a reverse index timestamp in the row
> id? Will it cause problems with the tablet splitting, compaction, and
> performance if the data is always being sent to the top of the tablet? If I
> define a split as everything prefixed with <string>, then the ingest will
> go to one tablet, but then I add a reverse timestamp in the row, and that
> would mean I am always copying data to the top of the tablet. Will this
> cause performance issues? Or is it better to append to a tablet?
>

I do not think it should matter. Inserts go into a C++ STL map on the
tablet server if using the nativemap. I think the implementation of that
is a balanced binary tree, so I do not think inserting at the beginning
vs. the end would make a difference. That being said, I do not think I
have tried this, so I do not know if there would be any surprises. I
would be interested in hearing about your experiences.

>
>
> On Tue, Nov 27, 2012 at 11:51 AM, Keith Turner <[email protected]> wrote:
>
>>
>>
>> Keith
>>
>> On Tue, Nov 27, 2012 at 10:41 AM, Roshan Punnoose <[email protected]> wrote:
>>
>>> I want to have a table where the row will consist of "<string>-<reverse
>>> index timestamp>". But this means that the data is always being prefixed to
>>> the beginning of the row (or tablet if the row is large). Will this be a
>>> problem for compaction or performance?
>>
>>
>> Can you tell me more about what <string> is? For example is it a hash or
>> does it come from the set "foo1","foo2","foo3". How does it change over
>> time? I think the answer to your question depends on what <string> is.
>>
>>>
>>> I don't know if I heard this correctly, but someone once mentioned that
>>> making the row id the direct timestamp could cause performance issues
>>> because data is always going to one tablet, but also because there is
>>> trouble splitting since it always appends to the tablet. Is this true, is
>>> it similar to what could happen if I am always prefixing to a tablet?
>>>
>>
>> Yes using a timestamp for a row could cause data from many clients to
>> always go to the same tablet, which would be bad for performance on a
>> cluster.
>>
>>>
>>> Thanks!
>>> Roshan
>>>
>>
>
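
To make the "<string>-<reverse index timestamp>" row layout from this thread concrete, here is a minimal Python sketch. It is not Accumulo code: the helper name, the choice of Long.MAX_VALUE as the value to subtract from, the 19-digit zero padding, and the sample timestamps are all assumptions made for illustration. It only shows that, under lexicographic ordering, the newest entry for a given prefix sorts first, which is why each new write lands at the top of that prefix's range.

```python
# Hypothetical helper; nothing here is part of the Accumulo API.
MAX_LONG = 2**63 - 1  # Java's Long.MAX_VALUE, assumed here as the reversal base


def reverse_timestamp_row(prefix: str, millis: int) -> str:
    """Build a row id of the form "<string>-<reverse timestamp>"."""
    reversed_ts = MAX_LONG - millis
    # Zero-pad to 19 digits (the width of MAX_LONG) so that lexicographic
    # order of the strings matches numeric order of the reversed timestamps.
    return f"{prefix}-{reversed_ts:019d}"


# Three illustrative timestamps in increasing (oldest to newest) order.
timestamps = [1354036860000, 1354036861000, 1354036862000]
rows = [reverse_timestamp_row("foo1", t) for t in timestamps]

# A sorted key-value store keeps rows in lexicographic order; sorting the
# row ids the same way puts the most recent timestamp first.
sorted_rows = sorted(rows)
```

With a fixed set of prefixes such as "foo1", "foo2", "foo3", each prefix gets its own contiguous range of rows, and every new write for that prefix sorts ahead of the older ones.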
