Re: Data Modelling Suggestions

2012-08-26 aaron morton
 I'm finding that only the first component is used... is this understanding
 correct?
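The result is correct. The end of your range,
(end)component1=timestamp3,component2=123, is less than the column
Timestamp3: 777. Composite column names are compared one component at a time,
so when the first components are equal the second one decides, and 123 sorts
before 777.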

Example:

CREATE COLUMN FAMILY Foo
  WITH key_validation_class = UTF8Type
  AND comparator = 'CompositeType(IntegerType, IntegerType)'
  AND default_validation_class = UTF8Type;


set Foo['bar']['1:1'] = 'baz1';
set Foo['bar']['2:2'] = 'baz2';
set Foo['bar']['3:3'] = 'baz3';
set Foo['bar']['4:4'] = 'baz4';


aarons-MBP-2011:pycassa aaron$ ./pycassaShell -k dev
In [2]: FOO.get('bar')
Out[2]: OrderedDict([((1, 1), u'baz1'), ((2, 2), u'baz2'), ((3, 3), u'baz3'), ((4, 4), u'baz4')])

In [6]: FOO.get('bar', column_start=(2,2))
Out[6]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3'), ((4, 4), u'baz4')])

In [8]: FOO.get('bar', column_start=(2,2), column_finish=(3,3))
Out[8]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])

In [9]: FOO.get('bar', column_start=(2,2), column_finish=(3,1))
Out[9]: OrderedDict([((2, 2), u'baz2')])

In [10]: FOO.get('bar', column_start=(2,), column_finish=(3,))
Out[10]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])
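The same behaviour can be mirrored without a cluster. The sketch below is a plain-Python model, not pycassa or Cassandra code: it assumes composite names compare like Python tuples (component by component) and that a partial bound such as (3,) covers every column whose leading components match it, which is what Out[10] shows. The function name `composite_slice` is hypothetical.

```python
from typing import Dict, Optional, Tuple

# A row is modelled as {composite column name (a tuple): value}.
# Assumption: this only imitates the slices in the transcript above.
Name = Tuple[int, ...]
Row = Dict[Name, str]


def composite_slice(row: Row,
                    start: Optional[Name] = None,
                    finish: Optional[Name] = None) -> Row:
    """Return the single contiguous run of columns between start and finish.

    Both bounds are inclusive. A bound shorter than the column name is
    compared against the same number of leading components, so (3,) as a
    finish includes every (3, x) column.
    """
    result: Row = {}
    for name in sorted(row):  # tuple ordering = component-by-component
        if start is not None and name[:len(start)] < start:
            continue  # still before the start of the range
        if finish is not None and name[:len(finish)] > finish:
            break  # past the end: a slice never resumes after this
        result[name] = row[name]
    return result


row: Row = {(1, 1): "baz1", (2, 2): "baz2", (3, 3): "baz3", (4, 4): "baz4"}
print(list(composite_slice(row, (2, 2), (3, 1))))  # [(2, 2)]
print(list(composite_slice(row, (2,), (3,))))      # [(2, 2), (3, 3)]
```

Because the loop stops at the first column past the finish bound, it can only ever return one contiguous run of columns, which is the same restriction a real slice has.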

 We see a lot of examples about Timeseries modelling ...

Sorry, I do not understand this question.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 11:17 PM, Roshni Rajagopal roshni.rajago...@wal-mart.com 
wrote:


Re: Data Modelling Suggestions

2012-08-24 aaron morton
 I was trying to find Hector examples where we search by the second column in
 a composite column, but I couldn't find any good ones. I'm not sure if it's
 possible... if you do have an example, please share.
It's not. When slicing columns you can only return one contiguous range. 
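To see why, consider a hypothetical row whose composite names are (timestamp, item_id), with integers standing in for TimeUUIDs. The plain-Python sketch below (not Hector code; both helpers are illustrative) finds where one item's columns sit in the sorted row: they are interleaved with other items' columns, so no single start/finish pair selects only them, and the client would have to read the row and filter it.

```python
from typing import List, Tuple

# Hypothetical row: composite names (timestamp, item_id), sorted by the
# first component first, as the composite comparator would store them.
columns: List[Tuple[int, str]] = sorted([
    (100, "A"), (105, "B"), (110, "A"), (120, "C"), (130, "A"),
])


def positions_for_item(cols: List[Tuple[int, str]], item_id: str) -> List[int]:
    """Indexes of the columns whose second component equals item_id."""
    return [i for i, (_, item) in enumerate(cols) if item == item_id]


def is_contiguous(positions: List[int]) -> bool:
    """True if the positions form one unbroken run, i.e. one slice could return them."""
    return all(b == a + 1 for a, b in zip(positions, positions[1:]))


hits = positions_for_item(columns, "A")
print(hits, is_contiguous(hits))  # [0, 2, 4] False
```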

 Anyway I would prefer storing the item-ids as column names in the main column 
 family and having a second CF for the order-by-date query only with the pair 
 timestamp_itemid. That way you can add later other query strategies without 
 messing with how you store the item 
+1
Have the orders somewhere, and build a time ordered custom index to show them 
in order. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 6:28 AM, Guillermo Winkler gwink...@inconcertcc.com wrote:




Re: Data Modelling Suggestions

2012-08-24 Roshni Rajagopal
Thank you Aaron and Guillermo,

I find composite columns very confusing :(
To reconfirm:

 1.  We can only search for a range of columns using the first component of
the composite column.
 2.  After specifying a range for the first component, we cannot further filter
on the second component. I found this link
http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/
which seems to suggest that filtering by the second component in addition to
the first is possible, and I tried the same example but couldn't get it to
work. Does anyone have an example? Suppose I have data like this in my column
names:

Timestamp1: 123, Timestamp2: 456, Timestamp3: 777, Timestamp4: 654. Getting the
range of columns from (start)component1=timestamp1, component2=123 to
(end)component1=timestamp3, component2=123 should give me only one column.
I'm finding that only the first component is used... is this understanding
correct?
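A minimal sketch of the slice in the question, assuming timestamp1..timestamp4 sort like the integers 1..4. It is plain Python, with `bisect` standing in for an inclusive slice over sorted composite names; `slice_columns` is a hypothetical helper. The second component only matters where the first components tie, which is at the ends of the range, so Timestamp2: 456 falls inside the range even though 456 is not 123.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Name = Tuple[int, int]

# Assumption: integers 1..4 stand in for timestamp1..timestamp4; the
# second component is the number given in the question.
columns: List[Name] = sorted([(1, 123), (2, 456), (3, 777), (4, 654)])


def slice_columns(cols: List[Name], start: Name, finish: Name) -> List[Name]:
    """Inclusive slice over sorted composite names, compared component by component."""
    lo = bisect_left(cols, start)     # first name >= start
    hi = bisect_right(cols, finish)   # one past the last name <= finish
    return cols[lo:hi]


print(slice_columns(columns, (1, 123), (3, 123)))  # [(1, 123), (2, 456)]
print(slice_columns(columns, (1, 123), (3, 777)))  # [(1, 123), (2, 456), (3, 777)]
```

So the slice returns two columns rather than one, and raising the end bound's second component to 777 pulls in Timestamp3 as well.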


We see a lot of examples of time series modelling with TimeUUIDs as column
names. But how are columns updated or deleted here: how do you find the columns
you need to delete or modify? Does one always need a separate column family to
handle updates and deletions for time series, or is it usually handled by
setting a TTL on data outside the archival period, or does time series
modelling usually not involve any manipulation of past records?

Regards,
Roshni




Re: Data Modelling Suggestions

2012-08-23 Guillermo Winkler
I think you need another CF as an index:

user_itemid -> timestamped column_name

Otherwise you can't know which timestamp to use in the column name.

Anyway, I would prefer storing the item-ids as column names in the main
column family and having a second CF only for the order-by-date query, with
the pair timestamp_itemid. That way you can later add other query
strategies without messing with how you store the item information.
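One way to read this design, sketched with in-memory dicts standing in for the three column families. This is plain Python, not Hector code; the names `items`, `by_time`, `lookup` and the helper functions are hypothetical, and only the `user_itemid` and `timestamp_itemid` shapes come from the message above. Quantity lives only in the main CF, so changing it never touches the time-ordered index, and the lookup CF supplies the timestamp needed to find the column to delete.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for three column families.
items: Dict[str, Dict[str, int]] = {}                # main CF: user -> {item_id: qty}
by_time: Dict[str, Dict[Tuple[int, str], str]] = {}  # order-by-date CF: user -> {(ts, item_id): ""}
lookup: Dict[str, int] = {}                          # index CF: "user_itemid" -> ts


def add_item(user: str, item_id: str, qty: int, ts: int) -> None:
    """Add an item, or re-add it and move it to the new timestamp."""
    key = f"{user}_{item_id}"
    old_ts = lookup.get(key)
    if old_ts is not None:
        # The lookup tells us which timestamped column to remove.
        del by_time[user][(old_ts, item_id)]
    items.setdefault(user, {})[item_id] = qty
    by_time.setdefault(user, {})[(ts, item_id)] = ""
    lookup[key] = ts


def update_qty(user: str, item_id: str, qty: int) -> None:
    """Quantity lives in the main CF only, so the time index is untouched."""
    items[user][item_id] = qty


def delete_item(user: str, item_id: str) -> None:
    """Use the lookup to find the timestamped column, then remove all three entries."""
    ts = lookup.pop(f"{user}_{item_id}")
    del by_time[user][(ts, item_id)]
    del items[user][item_id]


def items_by_time(user: str) -> List[Tuple[str, int]]:
    """Read the time-ordered index, then fetch each item's quantity."""
    return [(item_id, items[user][item_id]) for _, item_id in sorted(by_time.get(user, {}))]


add_item("u1", "apple", 2, 100)
add_item("u1", "pear", 1, 105)
update_qty("u1", "apple", 5)
add_item("u1", "plum", 3, 110)
delete_item("u1", "pear")
print(items_by_time("u1"))  # [('apple', 5), ('plum', 3)]
```

The cost is extra writes on add and delete, in exchange for never having to scan the time index to find an item.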

Maybe you can solve it with a secondary index by timestamp too.

Guille


On Thu, Aug 23, 2012 at 7:26 AM, Roshni Rajagopal 
roshni.rajago...@wal-mart.com wrote:

 Hi,

 Need some help on a data modelling question. We're using Hector and DataStax
 Enterprise 2.1.


 I want to associate a list of items with a user. It should be sorted by the
 time added. Items can be updated (the quantity of an item can change),
 and items can be deleted.
 I can model it like this so that it's denormalized and I get all my
 information in one go from one row, sorted by time added. I can use
 composite columns.

 Row key: User Id
 Column Name: TimeUUID:item ID: Item Name: Item Description: Item Price:
 Item Qty
 Column Value : Null

 Now, how do I handle these manipulations?

  1.  Add a new item: easy, just a new column.
  2.  Add an existing item or modify its qty: I want to get to the correct
 column to update. Can I search by the second column in the composite column
 (equals condition) and update the column name itself to reflect the new
 TimeUUID and qty? Or would it be better to just add it as a new column,
 always use the latest column for an item in the application code, and delete
 duplicates in the background?
  3.  Delete item: can I search by the second column in the composite column
 to find the correct column to delete?
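The second option in point 2 (append a new column and let the application use the latest one per item) can be sketched in plain Python. It assumes the composite name is (TimeUUID, item_id, qty), with integers standing in for TimeUUIDs so the row sorts by time; `latest_per_item` and `stale_columns` are hypothetical helpers, not Hector calls.

```python
from typing import Dict, List, Tuple

# Hypothetical composite column name: (time, item_id, qty).
# Integers stand in for TimeUUIDs so that sorting orders by time.
Column = Tuple[int, str, int]


def latest_per_item(row: List[Column]) -> Dict[str, Column]:
    """Walk the row in time order; a later column for an item replaces the earlier one."""
    latest: Dict[str, Column] = {}
    for col in sorted(row):
        latest[col[1]] = col
    return latest


def stale_columns(row: List[Column]) -> List[Column]:
    """Columns superseded by a newer one for the same item; a background job could delete these."""
    keep = set(latest_per_item(row).values())
    return [col for col in sorted(row) if col not in keep]


row = [(100, "apple", 2), (105, "pear", 1), (110, "apple", 5)]
print(latest_per_item(row))  # {'apple': (110, 'apple', 5), 'pear': (105, 'pear', 1)}
print(stale_columns(row))    # [(100, 'apple', 2)]
```

The trade-off is that every read walks the whole row, and the row keeps growing until the background job deletes the stale columns.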

 I was trying to find Hector examples where we search by the second column in
 a composite column, but I couldn't find any good ones. I'm not sure if it's
 possible... if you do have an example, please share.

 Regards,
 Roshni

