Re: RE : Re: RE : Re: Two dimensional matrices

2010-04-14 Thread Philippe
 I'm confused : don't range queries such as the ones we've been
  discussing require using an ordered partitioner ?

 Alright, so distribution depends on your choice of token.

Ah yes, I get it now : with a naive ordered partitioner, the key is
associated with the node whose token is numerically closest, and
that is where the master replica is located. Yes ?
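
A minimal sketch of that lookup, assuming an order-preserving partitioner whose tokens are plain strings compared like keys. As far as I understand the ring, a key goes to the node with the first token greater than or equal to the key (wrapping around after the last token), which is not always the numerically closest one. The `owner` helper and the token values are made up for illustration; this is not Cassandra code.

```python
import bisect

def owner(tokens, key):
    """Return the token of the node that owns `key`.

    `tokens` must be sorted. Each node is assumed to own the range
    (previous token, its own token], with wrap-around at the end of the ring.
    """
    i = bisect.bisect_left(tokens, key)   # first token >= key
    return tokens[i % len(tokens)]        # past the last token: wrap to the first

tokens = ["g", "n", "t"]                  # hypothetical three-node ring

print(owner(tokens, "apple"))   # key before the first token
print(owner(tokens, "h"))       # "g" is nearer, but the owner is the next token up
print(owner(tokens, "zebra"))   # key past the last token wraps around
```

With this rule, how evenly rows spread across nodes depends on how the tokens split the real key distribution.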

Now let's assume I am using super columns as {X} and columns as {timeFrame}.
Over time each row will grow very large because X can (very sparsely) go up to
2^28.
i) Does Cassandra load all columns every time it reads a row ? Same question
for super columns.
ii) Similarly, does it cache all columns in memory ?

Now some orders of magnitude : let's say a row is about 20KB and the cluster
is running smoothly on low-end servers. There are millions of rows per node.
i) If I were to only issue gets on the key, what order of magnitude can I
expect to reach : 10/s, 100/s, 1,000/s or 10,000/s ?
ii) If I were to issue a slice on just the keys, does Cassandra optimize the
gets, or does it run every get on the server and then concatenate the results
to send to the client ?
iii) Is slicing on the columns going to improve the time to get the data on
the server side, or does it just cut down on network traffic ?

Thanks
Philippe


RE : Re: Two dimensional matrices

2010-04-13 Thread Philippe
Okay, so if I switch columns and super columns in my model, I get what I want,
don't I ?

Super column = x
Column = time frame
Now I can get 2D range extracts from the grid and every cell will contain
all time frame data. Is this correct ?
I suppose that if that becomes too much data to retrieve, I can put
different time frames in different keyspaces ?

Assuming this is all correct, what are the consequences of these design
decisions in terms of partition tolerance and how data is balanced across
nodes ?

Thanks
Philippe

On 13 Apr 2010 at 03:26, Eric Evans eev...@rackspace.com wrote:

On Tue, 2010-04-13 at 00:45 +0200, Philippe wrote:
  However, you are also saying there is...
No, what I mean is that you can perform a slice that returns either
sub-columns, or super columns. In the former, the column names you are
slicing on are the sub-columns (X coords), in the latter it is super
columns (time). So:

On X coords, (same as my previous mail).


get_range_slice(
keyspaceName,
ColumnParent(CFname, timeFrame),
SlicePredicate(
...
The columns attribute of the KeySlice structs returned will contain
the sub-columns contained in timeFrame that match your predicate.

On time.

get_range_slice(
   keyspaceName,
   ColumnParent(CFname, null),
   SlicePredicate(
   slice_range=SliceRange(timeStart, timeEnd, false, colCount)
   ),
   ystart,
   yend,
   rowCount,
   consistencyLevel,
)

The columns attribute of the KeySlice structs returned will contain
the super columns that match the predicate. Each of these super columns
will contain *all* of the sub-columns.


-- 
Eric Evans
eev...@rackspace.com


Re: Two dimensional matrices

2010-04-12 Thread Eric Evans
On Mon, 2010-04-12 at 01:31 +0200, Philippe wrote:
 I have data that is two dimensional, time varying (think of a grid).
 At each
 cell of this grid, I store a binary array.
 My data model will be
 
- single keyspace
- key = {Y dimension}
- super column family = {type of data represented in each cell}
- super column  = {time = week or month}
- column ={X dimension}
- value = { binary}
 
 Will I be able to retrieve all values from a rectangle from this grid
 in a single call to cassandra for given SCF and SC ?

If I understand what you're asking, a rectangle (identified by X and Y
coordinates for a time-frame), will boil down to a single column. There
are certainly no problems with retrieving a single sub-column from a
super column.
 
 Will the result associate each value with its key and column ?

The result will be a column that contains the binary value, which you
obtained by using key, column family, and super column name. So, yes.

 Does it matter if it's a single call performance wise ? 

Yes, if for no other reason than it requires another round-trip across
the network.

-- 
Eric Evans
eev...@rackspace.com



Re: Two dimensional matrices

2010-04-12 Thread Philippe
Eric, Dop,
Thanks for your answers.

If I understand what you're asking, a rectangle (identified by X and Y
 coordinates for a time-frame), will boil down to a single column. There
 are certainly no problems with retrieving a single sub-column from a
 super column.

I realize I wasn't clear enough. Getting cell {x,y} is easy, I understand
that.
I am interested in getting a slice of that grid : all cells {x,y} where

   - x_min <= x <= x_max
   - y_min <= y <= y_max

My understanding from the docs is that get_range_slices would do it but I
would like confirmation.
In fact, I believe this is the same question as Dop Sun is asking.



  Will the result associate each value with its key and column ?
 The result will be a column that contains the binary value, which you
 obtained by using key, column family, and super column name. So, yes.

What about this new case ?

Philippe


Re: Two dimensional matrices

2010-04-12 Thread Eric Evans
On Mon, 2010-04-12 at 22:40 +0200, Philippe wrote:
 If I understand what you're asking, a rectangle (identified by X and Y
  coordinates for a time-frame), will boil down to a single column.
 There
  are certainly no problems with retrieving a single sub-column from a
  super column.
 
 I realize I wasn't clear enough. Getting cell {x,y} is easy, I
 understand
 that.
 I am interested in getting a slice of that grid : all cells {x,y}
 where
 
- x_min <= x <= x_max
- y_min <= y <= y_max
 
 My understanding from the docs is that get_range_slices would do it
 but I
 would like confirmation.
 In fact, I believe this is the same question as Dop Sun is asking.

Alright, so assuming we're looking for a slice of the grid against a
given time-frame, that would look something like:

get_range_slice(
    keyspaceName,
    ColumnParent(CFname, timeFrame),
    SlicePredicate(
        slice_range=SliceRange(xstart, xend, false, colCount)
    ),
    ystart,
    yend,
    rowCount,
    consistencyLevel,
)

Does that help?

   Will the result associate each value with its key and column ?
  The result will be a column that contains the binary value, which
 you
  obtained by using key, column family, and super column name. So,
 yes.
 
 What about this new case ? 

The result will be a collection of KeySlices representing the key and
corresponding columns, (so yes).
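
To make that association concrete, here is a small sketch of how a client might flatten such a result into a map keyed by (Y, X). The `KeySlice` and `Column` classes below are simplified stand-ins written for this example, not the generated Thrift classes, and the sample data is invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Column:            # simplified stand-in for a returned sub-column
    name: str            # X coordinate
    value: bytes         # binary cell content

@dataclass
class KeySlice:          # simplified stand-in: one row key plus its matching columns
    key: str             # Y coordinate
    columns: List[Column]

def to_cells(key_slices: List[KeySlice]) -> Dict[Tuple[str, str], bytes]:
    """Flatten a list of key slices into {(y, x): value}."""
    return {(ks.key, col.name): col.value
            for ks in key_slices
            for col in ks.columns}

# Hypothetical result of a range slice over y01..y02 and x01..x02
result = [
    KeySlice("y01", [Column("x01", b"a"), Column("x02", b"b")]),
    KeySlice("y02", [Column("x02", b"d")]),
]
cells = to_cells(result)
print(cells[("y02", "x02")])
```

Cells that hold no data in the requested rectangle are simply absent from the map, which suits a sparse grid.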

-- 
Eric Evans
eev...@rackspace.com



Re: Two dimensional matrices

2010-04-12 Thread Philippe

 Alright, so assuming we're looking for a slice of the grid against a
 given time-frame, that would look something like:

 get_range_slice(
keyspaceName,
ColumnParent(CFname, timeFrame),
SlicePredicate(
slice_range=SliceRange(xstart, xend, false, colCount)
),
ystart,
yend,
rowCount,
consistencyLevel,
 )

 Does that help?

Yes it confirms my understanding, thanks.

However, you are also saying there is no way to also take into account the
timeFrame supercolumn in the same API call ? IE, it is not possible to get
back a data structure keyed by 'key,supercolumn,column' hence y,x and
timeframe which I can then process to my heart's delight ?

Philippe


Re: Two dimensional matrices

2010-04-12 Thread Eric Evans
On Tue, 2010-04-13 at 00:23 +0200, Philippe wrote:
  Alright, so assuming we're looking for a slice of the grid against a
  given time-frame, that would look something like:
 
  get_range_slice(
 keyspaceName,
 ColumnParent(CFname, timeFrame),
 SlicePredicate(
 slice_range=SliceRange(xstart, xend, false, colCount)
 ),
 ystart,
 yend,
 rowCount,
 consistencyLevel,
  )
 
  Does that help?
 
 Yes it confirms my understanding, thanks.
 
 However, you are also saying there is no way to also take into account
 the timeFrame supercolumn in the same API call ? IE, it is not
 possible to get back a data structure keyed by
 'key,supercolumn,column' hence y,x and timeframe which I can then
 process to my heart's delight ? 

If you're talking about constructing predicates to slice on both time
*and* X coordinate, then no. You can omit the super column name from the
ColumnParent and return a slice of super columns (by time period)
complete with all contained sub-columns, but you can't have it both
ways, no.
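
One possible workaround, sketched here as an assumption rather than anything the API offers: slice on time on the server (super columns returned with all their sub-columns) and then filter on X in the client. The cost is that every sub-column of each matching super column still crosses the network. The `filter_x` helper and the nested-dict result shape are hypothetical.

```python
def filter_x(key_slices, x_start, x_end):
    """Keep only sub-columns whose name lies in [x_start, x_end].

    `key_slices` is assumed to look like
    [(key, {time_frame: {x: value}}), ...], i.e. a super column slice
    that has already been fetched. Returns {(key, time_frame, x): value}.
    """
    cells = {}
    for key, super_columns in key_slices:
        for time_frame, sub_columns in super_columns.items():
            for x, value in sub_columns.items():
                if x_start <= x <= x_end:
                    cells[(key, time_frame, x)] = value
    return cells

# Invented data standing in for a slice on time over keys y01..y02
key_slices = [
    ("y01", {"2010-W14": {"x01": b"a", "x09": b"b"},
             "2010-W15": {"x02": b"c"}}),
    ("y02", {"2010-W14": {"x03": b"d"}}),
]
cells = filter_x(key_slices, "x01", "x03")
for coords in sorted(cells):
    print(coords, cells[coords])
```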

-- 
Eric Evans
eev...@rackspace.com



Re: Two dimensional matrices

2010-04-12 Thread Philippe

  However, you are also saying there is no way to also take into account
  the timeFrame supercolumn in the same API call ? IE, it is not
  possible to get back a data structure keyed by
  'key,supercolumn,column' hence y,x and timeframe which I can then
  process to my heart's delight ?

 If you're talking about constructing predicates to slice on both time
 *and* X coordinate, then no. You can omit the super column name from the
 ColumnParent and return a slice of super columns (by time period)
 complete with all contained sub-columns, but you can't have it both
 ways, no.

Eric, I'm trying to get my head around this...

If I omit the super column name and do the query as you mentioned in your
previous email, then you are saying it will return all columns corresponding
to the column range of all super columns corresponding to the key range.
This means it is possible to get a rectangular slice of the grid AND to get
the third dimension which is time in my case, the only catch being that I
cannot limit the amount of data retrieved in the 3rd dimension (timeframe).

Is this correct ?

Philippe


Re: Two dimensional matrices

2010-04-12 Thread Eric Evans
On Tue, 2010-04-13 at 00:45 +0200, Philippe wrote:
  However, you are also saying there is no way to also take
 into account
  the timeFrame supercolumn in the same API call ? IE, it is
 not
  possible to get back a data structure keyed by
  'key,supercolumn,column' hence y,x and timeframe which I can
 then
  process to my heart's delight ?
 
 
 If you're talking about constructing predicates to slice on
 both time
 *and* X coordinate, then no. You can omit the super column
 name from the
 ColumnParent and return a slice of super columns (by time
 period)
 complete with all contained sub-columns, but you can't have it
 both
 ways, no.
 Eric, I'm trying to get my head around this...
 
 
 If I omit the super column name and do the query as you mentioned
 in your previous email, then you are saying it will return all columns
 corresponding to the column range of all super columns corresponding
 to the key range.
 This means it is possible to get a rectangular slice of the grid AND
 to get the third dimension which is time in my case, the only catch
 being that I cannot limit the amount of data retrieved in the 3rd
 dimension (timeframe).
 
 
 Is this correct ?

No, what I mean is that you can perform a slice that returns either
sub-columns, or super columns. In the former, the column names you are
slicing on are the sub-columns (X coords), in the latter it is super
columns (time). So:

On X coords, (same as my previous mail).

get_range_slice(
    keyspaceName,
    ColumnParent(CFname, timeFrame),
    SlicePredicate(
        slice_range=SliceRange(xstart, xend, false, colCount)
    ),
    ystart,
    yend,
    rowCount,
    consistencyLevel,
)

The columns attribute of the KeySlice structs returned will contain
the sub-columns contained in timeFrame that match your predicate.

On time.

get_range_slice(
    keyspaceName,
    ColumnParent(CFname, null),
    SlicePredicate(
        slice_range=SliceRange(timeStart, timeEnd, false, colCount)
    ),
    ystart,
    yend,
    rowCount,
    consistencyLevel,
)

The columns attribute of the KeySlice structs returned will contain
the super columns that match the predicate. Each of these super columns
will contain *all* of the sub-columns.
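
The difference between the two calls can be mimicked with a toy in-memory model. This is only a sketch: the nested dict `grid[y][timeFrame][x]`, the string-ordered names and the `toy_range_slice` function are assumptions made for illustration, and none of it talks to Cassandra.

```python
# grid[row key (Y)][super column (time frame)][sub-column (X)] = value
grid = {
    "y01": {"2010-W14": {"x01": b"a", "x02": b"b"}, "2010-W15": {"x01": b"c"}},
    "y02": {"2010-W14": {"x02": b"d"}, "2010-W16": {"x07": b"e"}},
}

def toy_range_slice(grid, super_column, start, end, y_start, y_end):
    """Imitate the two slicing modes over rows y_start..y_end.

    super_column given -> slice the sub-columns (X) inside that super column.
    super_column None  -> slice the super columns (time); each one comes
                          back whole, with all of its sub-columns.
    """
    out = []
    for key in sorted(k for k in grid if y_start <= k <= y_end):
        row = grid[key]
        if super_column is not None:
            source = row.get(super_column, {})
            columns = {x: v for x, v in source.items() if start <= x <= end}
        else:
            columns = {t: dict(subs) for t, subs in row.items() if start <= t <= end}
        out.append((key, columns))
    return out

on_x = toy_range_slice(grid, "2010-W14", "x01", "x01", "y01", "y02")
on_time = toy_range_slice(grid, None, "2010-W14", "2010-W15", "y01", "y02")
print(on_x)
print(on_time)
```

In the second call the X range cannot be narrowed: each returned super column carries every sub-column it holds.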

-- 
Eric Evans
eev...@rackspace.com



RE: Two dimensional matrices

2010-04-11 Thread Dop Sun
I don't know whether I'm wrong or not (I'm also new to Cassandra), but it looks
like we can only query one super column per query, since that value is
specified in the ColumnParent parameter. This means that you can only query
a single week or month (as your super column).

 

From: Philippe [mailto:watche...@gmail.com] 
Sent: Monday, April 12, 2010 7:31 AM
To: user@cassandra.apache.org
Subject: Two dimensional matrices

 

Hello,

 

I would like to know if the following is indeed possible with Cassandra.
From my understanding of key & column slices it is, but I am just beginning
to get my head around Cassandra...

 

 

I have data that is two dimensional, time varying (think of a grid). At each
cell of this grid, I store a binary array.

My data model will be

*   single keyspace
*   key = {Y dimension}
*   super column family = {type of data represented in each cell}
*   super column  = {time = week or month}
*   column ={X dimension}
*   value = { binary}

Will I be able to retrieve all values from a rectangle from this grid in a
single call to cassandra for given SCF and SC ? Will the result associate
each value with its key and column ?

Does it matter if it's a single call performance wise ?

 

Thanks

Philippe