Hi,

What if the upper bound on the number of columns in a row is only loosely defined, i.e. it is fine to have a maximum of around 100, but not exactly 100 (it might be 105 or 110)? Also, if I make a slice query that returns, say, one fifth of the columns in a row, I believe such a query will likewise not deserialize all of the row's columns in memory. Is that correct?

Best regards, Daniel.

2010/4/28 Sylvain Lebresne <sylv...@yakaz.com>

> 2010/4/28 Даниел Симеонов <dsimeo...@gmail.com>:
> > Hi Sylvain,
> > Thank you very much! I still have some further questions. I didn't
> > find how the row cache is configured?
>
> Provided you don't use trunk but something stable like 0.6.1 (which
> you should), it is in storage-conf.xml. It's one option of the
> definition of the column families (it is documented in the file).
>
> > Regarding the splitting of rows, I understand that it is not so
> > necessary, still I am curious whether it is implementable by the
> > client code.
>
> Well, I'm not sure there is any simple way to do it (at least not
> efficiently). Counting the number of columns in a row is expensive,
> plus there is no easy way to implement a counter in Cassandra (even
> though https://issues.apache.org/jira/browse/CASSANDRA-580 will make
> that better someday).
>
> > Best regards, Daniel.
> >
> > 2010/4/28 Sylvain Lebresne <sylv...@yakaz.com>
> >>
> >> 2010/4/28 Даниел Симеонов <dsimeo...@gmail.com>:
> >> > Hi,
> >> > I have a question: if a row in a Column Family has only columns,
> >> > are all of the columns deserialized in memory if you need any of
> >> > them? As I understood it, that is the case,
> >>
> >> No, it's not. Only the columns you request are deserialized in
> >> memory. The only thing is that, as of now, during compaction the
> >> entire row will be deserialized at once. So it still just has to
> >> fit in memory. But depending on the typical size of your columns,
> >> you can easily have millions of columns in a row without it being
> >> a problem at all.
> >>
> >> > and if the Column Family is a super Column Family, then only the
> >> > Super Column (entire) is brought up in memory?
> >>
> >> Yes, that part is true. That is the problem with the current
> >> implementation of super columns. While you can have lots of
> >> columns in one row, you probably don't want to have lots of
> >> columns in one super column (but it's no problem to have lots of
> >> super columns in one row).
> >>
> >> > What about row cache, is it different than memtable?
> >>
> >> Be careful with row cache. If row cache is enabled, then yes, any
> >> read in a row will read the entire row. So you typically don't
> >> want to use row cache in column families where rows have lots of
> >> columns (unless you always read all the columns in the row each
> >> time, of course).
> >>
> >> > I have another question. Let's say there is only data to be
> >> > inserted, and a solution is to have columns added to rows in a
> >> > Column Family. Is it possible in Cassandra to split the row if a
> >> > certain threshold is reached, say 100 columns per row? What if
> >> > there are concurrent inserts?
> >>
> >> No, Cassandra can't do that for you. But you should be okay with
> >> what you describe below. That is, if a given row corresponds to an
> >> hour of data, it will limit its size. And again, the number of
> >> columns in a row is not really limited as long as the overall size
> >> of the row fits easily in memory.
> >>
> >> > The original data model and use case is to insert timestamped
> >> > data and to make range queries. The original keys of CF rows
> >> > were in the form of <id>.<timestamp> and then a single column
> >> > with data; OPP was used. This is not an optimal solution, since
> >> > some nodes are hotter than others. I am thinking of changing the
> >> > model to have keys like <id>.<year/month/day> and then a list of
> >> > columns with timestamps within this range, with
> >> > RandomPartitioner, or using OPP but preprocessing part of the
> >> > key with MD5, i.e. the key is
> >> > MD5(<id>.<year/month/day>) + "hour of the day". The only problem
> >> > is how to deal with a large number of columns being inserted in
> >> > a particular row.
> >> > Thank you very much!
> >> > Best regards, Daniel.
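
For reference, the bucketed key scheme proposed in the original message, MD5(<id>.<year/month/day>) + "hour of the day", could be sketched roughly as below. This is only a plain-Python illustration of how such keys might be built, not Cassandra client code. The function names (bucketed_row_key, column_name), the two-digit hour suffix, the hex encoding of the digest, and the use of microsecond timestamps as column names are all assumptions made for the example; none of them come from the thread or from any Cassandra API.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucketed_row_key(entity_id: str, ts: datetime) -> str:
    """Build a row key of the form MD5(<id>.<year/month/day>) + hour.

    Hypothetical helper: the MD5 prefix spreads (id, day) pairs across
    the key space, while the hour suffix keeps the 24 hourly rows of one
    day adjacent under an order-preserving partitioner.
    """
    day = ts.strftime("%Y/%m/%d")
    digest = hashlib.md5(f"{entity_id}.{day}".encode("utf-8")).hexdigest()
    # Assumed layout: 32 hex characters followed by a zero-padded hour.
    return f"{digest}{ts.hour:02d}"


def column_name(ts: datetime) -> int:
    """Column name as whole microseconds since the Unix epoch.

    Integer arithmetic is used so that the ordering of column names
    matches the ordering of the timestamps exactly.
    """
    return (ts - EPOCH) // timedelta(microseconds=1)


if __name__ == "__main__":
    first = datetime(2010, 4, 28, 13, 5, tzinfo=timezone.utc)
    later = datetime(2010, 4, 28, 14, 0, tzinfo=timezone.utc)
    print(bucketed_row_key("sensor-1", first), column_name(first))
    print(bucketed_row_key("sensor-1", later), column_name(later))
```

With this layout every row holds at most one hour of data for one id, which bounds the row size in the way Sylvain describes, and no column counting or client-side row splitting is needed.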