Re: RE : Re: RE : Re: Two dimensional matrices
I'm confused: don't range queries such as the ones we've been discussing require using an ordered partitioner?

Alright, so distribution depends on your choice of token. Ah yes, I get it now: with a naive ordered partitioner, the key is associated with the node whose token is numerically closest, and that is where the master replica is located. Yes?

Now let's assume I am using super columns as {X} and columns as {timeFrame}. Over time each row will grow very large, because X can (very sparsely) go up to 2^28.
i) Does Cassandra load all columns every time it reads a row? Same question for a super column.
ii) Similarly, does it cache all columns in memory?

Now some orders of magnitude: let's say a row is about 20 KB and the cluster is running smoothly on low-end servers. There are millions of rows per node.
i) If I were to only issue gets on the key, what order of magnitude can I expect to reach: 10/s, 100/s, 1,000/s or 10,000/s?
ii) If I were to issue a slice on just the keys, does Cassandra optimize the gets, or does it run every get on the server and then concatenate the results to send to the client?
iii) Is slicing on the columns going to improve the time to get the data on the server side, or does it just cut down on network traffic?

Thanks
Philippe
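For illustration, key placement under an order-preserving partitioner can be sketched as below. This is a minimal in-memory model, not Cassandra code: `primary_node`, the ring contents and the node names are hypothetical. It assumes the convention that each node owns the keys from just after the previous node's token up to and including its own token, wrapping around at the end of the ring.

```python
import bisect

def primary_node(ring, key):
    """Return the node owning `key` in a ring of (token, node) pairs sorted by token.

    Assumed convention: a node owns the range (previous token, its own token],
    and keys greater than the largest token wrap around to the first node.
    """
    tokens = [token for token, _ in ring]
    # Index of the first token that is >= key (len(tokens) if there is none).
    index = bisect.bisect_left(tokens, key)
    return ring[index % len(ring)][1]

# Hypothetical three-node ring with string tokens, sorted by token.
ring = [("g", "node-a"), ("n", "node-b"), ("u", "node-c")]

owner = primary_node(ring, "h0012")
```

Under that assumption the owner is the node with the next token at or after the key, so `h0012` maps to node-b (token `n`) rather than to node-a, even though token `g` is closer in sort order.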
RE : Re: Two dimensional matrices
Okay, so if I switch columns and super columns in my model, I get what I want, don't I?

Super column = x
Column = time frame

Now I can get 2D range extracts from the grid, and every cell will contain all time frame data. Is this correct?

I suppose that if that becomes too much data to retrieve, I can put different time frames in different keyspaces?

Assuming this is all correct, what are the consequences of these design decisions in terms of partition tolerance and how data is balanced across nodes?

Thanks
Philippe

On 13 Apr 2010 at 03:26, Eric Evans eev...@rackspace.com wrote:
> On Tue, 2010-04-13 at 00:45 +0200, Philippe wrote:
> > However, you are also saying there is...
>
> No, what I mean is that you can perform a slice that returns either
> sub-columns, or super columns. In the former, the column names you are
> slicing on are the sub-columns (X coords), in the latter it is super
> columns (time). So:
>
> On X coords, (same as my previous mail).
>
> get_range_slice(
>     keyspaceName,
>     ColumnParent(CFname, timeFrame),
>     SlicePredicate( ...
>
> The columns attribute of the KeySlice structs returned will contain the
> sub-columns contained in timeFrame that match your predicate.
>
> On time.
>
> get_range_slice(
>     keyspaceName,
>     ColumnParent(CFname, null),
>     SlicePredicate(
>         slice_range=SliceRange(timeStart, timeEnd, false, colCount)
>     ),
>     ystart, yend, rowCount,
>     consistencyLevel,
> )
>
> The columns attribute of the KeySlice structs returned will contain the
> super columns that match the predicate. Each of these super columns will
> contain *all* of the sub-columns.
>
> --
> Eric Evans
> eev...@rackspace.com
Re: Two dimensional matrices
On Mon, 2010-04-12 at 01:31 +0200, Philippe wrote:
> I have data that is two dimensional, time varying (think of a grid). At
> each cell of this grid, I store a binary array.
> My data model will be
> - single keyspace
> - key = {Y dimension}
> - super column family = {type of data represented in each cell}
> - super column = {time = week or month}
> - column = {X dimension}
> - value = {binary}
>
> Will I be able to retrieve all values from a rectangle from this grid
> in a single call to cassandra for given SCF and SC?

If I understand what you're asking, a rectangle (identified by X and Y coordinates for a time-frame), will boil down to a single column. There are certainly no problems with retrieving a single sub-column from a super column.

> Will the result associate each value with its key and column?

The result will be a column that contains the binary value, which you obtained by using key, column family, and super column name. So, yes.

> Does it matter if it's a single call performance wise?

Yes, if for no other reason than it requires another round-trip across the network.

--
Eric Evans
eev...@rackspace.com
Re: Two dimensional matrices
Eric, Dop,

Thanks for your answers.

> If I understand what you're asking, a rectangle (identified by X and Y
> coordinates for a time-frame), will boil down to a single column. There
> are certainly no problems with retrieving a single sub-column from a
> super column.

I realize I wasn't clear enough. Getting cell {x,y} is easy, I understand that. I am interested in getting a slice of that grid: all cells {x,y} where
- x_min <= x <= x_max
- y_min <= y <= y_max

My understanding from the docs is that get_range_slices would do it, but I would like confirmation. In fact, I believe this is the same question as Dop Sun is asking.

> > Will the result associate each value with its key and column?
>
> The result will be a column that contains the binary value, which you
> obtained by using key, column family, and super column name. So, yes.

What about this new case?

Philippe
Re: Two dimensional matrices
On Mon, 2010-04-12 at 22:40 +0200, Philippe wrote:
> > If I understand what you're asking, a rectangle (identified by X and
> > Y coordinates for a time-frame), will boil down to a single column.
> > There are certainly no problems with retrieving a single sub-column
> > from a super column.
>
> I realize I wasn't clear enough. Getting cell {x,y} is easy, I
> understand that. I am interested in getting a slice of that grid: all
> cells {x,y} where
> - x_min <= x <= x_max
> - y_min <= y <= y_max
>
> My understanding from the docs is that get_range_slices would do it,
> but I would like confirmation. In fact, I believe this is the same
> question as Dop Sun is asking.

Alright, so assuming we're looking for a slice of the grid against a given time-frame, that would look something like:

get_range_slice(
    keyspaceName,
    ColumnParent(CFname, timeFrame),
    SlicePredicate(
        slice_range=SliceRange(xstart, xend, false, colCount)
    ),
    ystart, yend, rowCount,
    consistencyLevel,
)

Does that help?

> > > Will the result associate each value with its key and column?
> >
> > The result will be a column that contains the binary value, which you
> > obtained by using key, column family, and super column name. So, yes.
>
> What about this new case?

The result will be a collection of KeySlices representing the key and corresponding columns, (so yes).

--
Eric Evans
eev...@rackspace.com
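The shape of the result described above can be sketched with a small in-memory model. This is not a Cassandra client: `range_slice_columns`, the nested-dict layout and the sample keys are hypothetical stand-ins, and the sketch assumes zero-padded string names so that string order matches grid order. It only mimics the idea of a key range combined with a column range inside one super column.

```python
def range_slice_columns(grid, time_frame, x_start, x_end,
                        y_start, y_end, col_count, row_count):
    """Emulate a range slice on sub-columns of one super column.

    grid: row key (Y) -> super column (time frame) -> column (X) -> value.
    Returns a list of (key, [(column, value), ...]) pairs, loosely
    analogous to a list of KeySlice structs.
    """
    result = []
    for y in sorted(grid):
        if not (y_start <= y <= y_end):
            continue
        if len(result) >= row_count:
            break
        columns = grid[y].get(time_frame, {})
        # Keep only columns inside the X range, in sorted order, capped at col_count.
        selected = [(x, columns[x]) for x in sorted(columns)
                    if x_start <= x <= x_end][:col_count]
        # In this sketch a row with no matching columns yields an empty list.
        result.append((y, selected))
    return result

# Hypothetical sample data.
grid = {
    "y01": {"2010-W15": {"x01": b"a", "x02": b"b", "x03": b"c"}},
    "y02": {"2010-W15": {"x02": b"d"}, "2010-W16": {"x02": b"e"}},
    "y03": {"2010-W15": {"x03": b"f"}},
}

rectangle = range_slice_columns(grid, "2010-W15", "x02", "x03",
                                "y01", "y02", 100, 100)
```

Each entry of `rectangle` pairs a row key with its matching columns, so every value stays associated with both its Y key and its X column name.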
Re: Two dimensional matrices
> Alright, so assuming we're looking for a slice of the grid against a
> given time-frame, that would look something like:
>
> get_range_slice(
>     keyspaceName,
>     ColumnParent(CFname, timeFrame),
>     SlicePredicate(
>         slice_range=SliceRange(xstart, xend, false, colCount)
>     ),
>     ystart, yend, rowCount,
>     consistencyLevel,
> )
>
> Does that help?

Yes, it confirms my understanding, thanks.

However, you are also saying there is no way to also take into account the timeFrame supercolumn in the same API call? I.e., it is not possible to get back a data structure keyed by 'key,supercolumn,column', hence y, x and timeframe, which I can then process to my heart's delight?

Philippe
Re: Two dimensional matrices
On Tue, 2010-04-13 at 00:23 +0200, Philippe wrote:
> > Alright, so assuming we're looking for a slice of the grid against a
> > given time-frame, that would look something like:
> >
> > get_range_slice(
> >     keyspaceName,
> >     ColumnParent(CFname, timeFrame),
> >     SlicePredicate(
> >         slice_range=SliceRange(xstart, xend, false, colCount)
> >     ),
> >     ystart, yend, rowCount,
> >     consistencyLevel,
> > )
> >
> > Does that help?
>
> Yes, it confirms my understanding, thanks.
>
> However, you are also saying there is no way to also take into account
> the timeFrame supercolumn in the same API call? I.e., it is not
> possible to get back a data structure keyed by 'key,supercolumn,column',
> hence y, x and timeframe, which I can then process to my heart's
> delight?

If you're talking about constructing predicates to slice on both time *and* X coordinate, then no.

You can omit the super column name from the ColumnParent and return a slice of super columns (by time period) complete with all contained sub-columns, but you can't have it both ways, no.

--
Eric Evans
eev...@rackspace.com
Re: Two dimensional matrices
> > However, you are also saying there is no way to also take into
> > account the timeFrame supercolumn in the same API call? I.e., it is
> > not possible to get back a data structure keyed by
> > 'key,supercolumn,column', hence y, x and timeframe, which I can then
> > process to my heart's delight?
>
> If you're talking about constructing predicates to slice on both time
> *and* X coordinate, then no.
>
> You can omit the super column name from the ColumnParent and return a
> slice of super columns (by time period) complete with all contained
> sub-columns, but you can't have it both ways, no.

Eric, I'm trying to get my head around this...

If I omit the super column name and do the query as you mentioned in your previous email, then you are saying it will return all columns corresponding to the column range of all super columns corresponding to the key range.

This means it is possible to get a rectangular slice of the grid AND to get the third dimension, which is time in my case, the only catch being that I cannot limit the amount of data retrieved in the third dimension (timeframe).

Is this correct?

Philippe
Re: Two dimensional matrices
On Tue, 2010-04-13 at 00:45 +0200, Philippe wrote:
> > > However, you are also saying there is no way to also take into
> > > account the timeFrame supercolumn in the same API call? I.e., it is
> > > not possible to get back a data structure keyed by
> > > 'key,supercolumn,column', hence y, x and timeframe, which I can
> > > then process to my heart's delight?
> >
> > If you're talking about constructing predicates to slice on both time
> > *and* X coordinate, then no.
> >
> > You can omit the super column name from the ColumnParent and return a
> > slice of super columns (by time period) complete with all contained
> > sub-columns, but you can't have it both ways, no.
>
> Eric, I'm trying to get my head around this...
>
> If I omit the super column name and do the query as you mentioned in
> your previous email, then you are saying it will return all columns
> corresponding to the column range of all super columns corresponding to
> the key range.
>
> This means it is possible to get a rectangular slice of the grid AND to
> get the third dimension, which is time in my case, the only catch being
> that I cannot limit the amount of data retrieved in the third dimension
> (timeframe).
>
> Is this correct?

No, what I mean is that you can perform a slice that returns either sub-columns, or super columns. In the former, the column names you are slicing on are the sub-columns (X coords), in the latter it is super columns (time). So:

On X coords, (same as my previous mail).

get_range_slice(
    keyspaceName,
    ColumnParent(CFname, timeFrame),
    SlicePredicate(
        slice_range=SliceRange(xstart, xend, false, colCount)
    ),
    ystart, yend, rowCount,
    consistencyLevel,
)

The columns attribute of the KeySlice structs returned will contain the sub-columns contained in timeFrame that match your predicate.

On time.

get_range_slice(
    keyspaceName,
    ColumnParent(CFname, null),
    SlicePredicate(
        slice_range=SliceRange(timeStart, timeEnd, false, colCount)
    ),
    ystart, yend, rowCount,
    consistencyLevel,
)

The columns attribute of the KeySlice structs returned will contain the super columns that match the predicate. Each of these super columns will contain *all* of the sub-columns.

--
Eric Evans
eev...@rackspace.com
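The slice on time can be sketched in memory as below, together with one possible workaround for the "can't have it both ways" limit: trimming the X range on the client after the super columns come back. Nothing here is Cassandra API; `range_slice_super_columns`, `filter_x`, the nested-dict layout and the sample data are hypothetical, and client-side filtering is an assumption of this sketch rather than something proposed in the thread.

```python
def range_slice_super_columns(grid, t_start, t_end, y_start, y_end):
    """Emulate a range slice on super columns (no super column in the parent).

    Every matching super column comes back with *all* of its sub-columns.
    """
    result = {}
    for y in sorted(grid):
        if y_start <= y <= y_end:
            result[y] = {t: dict(columns)
                         for t, columns in sorted(grid[y].items())
                         if t_start <= t <= t_end}
    return result

def filter_x(slices, x_start, x_end):
    """Client-side step: keep only sub-columns whose name is in the X range."""
    return {y: {t: {x: value for x, value in columns.items()
                    if x_start <= x <= x_end}
                for t, columns in supers.items()}
            for y, supers in slices.items()}

# Hypothetical sample data: row key (Y) -> time frame -> X -> value.
grid = {
    "y01": {"2010-W14": {"x01": b"a", "x05": b"b"},
            "2010-W15": {"x02": b"c"},
            "2010-W20": {"x02": b"z"}},
    "y02": {"2010-W15": {"x01": b"d", "x09": b"e"}},
}

by_time = range_slice_super_columns(grid, "2010-W14", "2010-W15", "y01", "y02")
cube = filter_x(by_time, "x01", "x02")
```

The trade-off is that the full super columns still cross the network before `filter_x` runs, so this only makes sense while each super column stays reasonably small.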
RE: Two dimensional matrices
I don't know whether I'm wrong or not (I'm also new to Cassandra), but it looks like we can only query a single Super Column in a single query, since these values are specified in the ColumnParent parameter.

This means that you can only query a single week or month (as your super column).

From: Philippe [mailto:watche...@gmail.com]
Sent: Monday, April 12, 2010 7:31 AM
To: user@cassandra.apache.org
Subject: Two dimensional matrices

Hello,

I would like to know if the following is indeed possible with Cassandra. From my understanding of key column slices it is, but I am just beginning to get my head around Cassandra...

I have data that is two dimensional, time varying (think of a grid). At each cell of this grid, I store a binary array.

My data model will be
* single keyspace
* key = {Y dimension}
* super column family = {type of data represented in each cell}
* super column = {time = week or month}
* column = {X dimension}
* value = {binary}

Will I be able to retrieve all values from a rectangle from this grid in a single call to cassandra for given SCF and SC?
Will the result associate each value with its key and column?
Does it matter if it's a single call performance wise?

Thanks
Philippe
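The data model in the original question can be pictured as a three-level nested map. The sketch below is only an in-memory illustration of that layout for one super column family: `new_grid`, `put_cell`, `get_cell` and the sample names are hypothetical, and the zero-padded string names are an assumption made so that string order follows grid order.

```python
from collections import defaultdict

def new_grid():
    # row key (Y) -> super column (time frame) -> column (X) -> binary value
    return defaultdict(lambda: defaultdict(dict))

def put_cell(grid, y, time_frame, x, value):
    grid[y][time_frame][x] = value

def get_cell(grid, y, time_frame, x):
    # Single-cell lookup; returns None if the cell is absent.
    # Using .get() avoids creating empty rows as a side effect of reading.
    return grid.get(y, {}).get(time_frame, {}).get(x)

grid = new_grid()
put_cell(grid, "y0007", "2010-W15", "x0042", b"\x01\x02")

value = get_cell(grid, "y0007", "2010-W15", "x0042")
missing = get_cell(grid, "y9999", "2010-W15", "x0042")
```

Fetching one cell needs all three names (key, super column, column), which matches the single sub-column lookup discussed in the replies.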