Hi Sylvain,

Thank you very much! I still have some further questions. I didn't find how the row cache is configured; where is that set? Regarding the splitting of rows, I understand that it is not really necessary, but I am still curious whether it could be implemented in client code.

Best regards, Daniel.
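(If I remember the 0.6 configuration correctly, the row cache is enabled per column family in storage-conf.xml via the RowsCached attribute, which takes an absolute row count or a percentage. A minimal sketch, with the keyspace and column family names made up for illustration; double-check the attribute names against the sample storage-conf.xml that ships with your version:)

    <Keyspaces>
      <Keyspace Name="MyKeyspace">
        <!-- RowsCached enables the row cache for this CF only.
             KeysCached is the separate key cache. -->
        <ColumnFamily Name="Events"
                      CompareWith="LongType"
                      KeysCached="10000"
                      RowsCached="1000"/>
      </Keyspace>
    </Keyspaces>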
2010/4/28 Sylvain Lebresne <sylv...@yakaz.com>
> 2010/4/28 Даниел Симеонов <dsimeo...@gmail.com>:
> > Hi,
> > I have a question: if a row in a Column Family has many columns, are
> > all of the columns deserialized in memory when you need any one of
> > them? As I understood it, that is the case.
>
> No, it is not. Only the columns you request are deserialized in memory.
> The only caveat is that, as of now, the entire row is deserialized at
> once during compaction, so it still has to fit in memory. But depending
> on the typical size of your columns, you can easily have millions of
> columns in a row without it being a problem at all.
>
> > And if the Column Family is a super Column Family, is the (entire)
> > Super Column brought into memory?
>
> Yes, that part is true. That is the problem with the current
> implementation of super columns. While you can have lots of columns in
> one row, you probably don't want lots of columns in one super column
> (though it is no problem to have lots of super columns in one row).
>
> > What about the row cache, is it different from the memtable?
>
> Be careful with the row cache. If the row cache is enabled, then yes,
> any read in a row will read the entire row. So you typically don't want
> to use the row cache on a column family whose rows have lots of columns
> (unless you always read all the columns in the row each time, of
> course).
>
> > I have another question. Let's say there is only data to be inserted,
> > and a solution is to have columns added to rows in a Column Family.
> > Is it possible in Cassandra to split a row when a certain threshold
> > is reached, say 100 columns per row? What if there are concurrent
> > inserts?
>
> No, Cassandra can't do that for you. But you should be okay with what
> you describe below. That is, if a given row corresponds to an hour of
> data, that will limit its size. And again, the number of columns in a
> row is not really limited, as long as the overall size of the row fits
> easily in memory.
>
> > The original data model and use case is to insert timestamped data
> > and to make range queries. The original keys of CF rows were of the
> > form <id>.<timestamp>, with a single column holding the data, and OPP
> > was used. This is not an optimal solution, since some nodes are
> > hotter than others. I am thinking of changing the model to have keys
> > like <id>.<year/month/day> and then a list of columns with timestamps
> > within that range, using the RandomPartitioner, or keeping OPP but
> > preprocessing part of the key with MD5, i.e. the key is
> > MD5(<id>.<year/month/day>) + "hour of the day". The only problem is
> > how to deal with the large number of columns being inserted into a
> > particular row.
> > Thank you very much!
> > Best regards, Daniel.
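(A minimal sketch of the key scheme described above, as plain Java using only the JDK; the class and method names are made up for illustration, and the actual insert would go through whatever Thrift client you already use. It builds row keys of the form MD5(<id>.<year/month/day>) + hour of day, so writes spread over the ring via the hashed prefix while each row is bounded to one hour of data, which is also how client code would "split" rows: by bounding row size up front rather than after the fact:)

    import java.security.MessageDigest;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class BucketKeys {

        // Row key = MD5("<id>.<year/month/day>") as hex + two-digit hour.
        // The MD5 prefix spreads ids over the ring even under OPP; the
        // hour suffix bounds each row to one hour of inserts.
        public static String rowKey(String id, Date when) throws Exception {
            SimpleDateFormat day = new SimpleDateFormat("yyyy/MM/dd");
            SimpleDateFormat hour = new SimpleDateFormat("HH");
            String bucket = id + "." + day.format(when);

            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(bucket.getBytes("UTF-8"));

            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString() + hour.format(when);
        }

        // Column name = the full timestamp, so a slice query over the
        // columns of one row returns events in time order (with the CF
        // compared as LongType, for example).
        public static String columnName(Date when) {
            return String.valueOf(when.getTime());
        }

        public static void main(String[] args) throws Exception {
            Date now = new Date();
            System.out.println(rowKey("sensor42", now));
            System.out.println(columnName(now));
            // then: insert(keyspace, rowKey, columnName, value) via your client
        }
    }

(Since the MD5 prefix is fixed for a given id and day, the 24 hourly rows for that day are still contiguous under OPP, so range queries within a day remain cheap, while concurrent inserts need no coordination because the bucket a write lands in depends only on its timestamp.)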