Re: Data Modelling Suggestions
I'm finding that only the first component is used... is this understanding correct?

The result is correct. The end of the range (component1=timestamp3, component2=123) is less than the column Timestamp3: 777, so that column falls outside the slice.

Example:

    CREATE COLUMN FAMILY Foo
    WITH key_validation_class = UTF8Type
    AND comparator = 'CompositeType(IntegerType, IntegerType)'
    AND default_validation_class = UTF8Type;

    set Foo['bar']['1:1'] = 'baz1';
    set Foo['bar']['2:2'] = 'baz2';
    set Foo['bar']['3:3'] = 'baz3';
    set Foo['bar']['4:4'] = 'baz4';

    aarons-MBP-2011:pycassa aaron$ ./pycassaShell -k dev

    In [2]: FOO.get('bar')
    Out[2]: OrderedDict([((1, 1), u'baz1'), ((2, 2), u'baz2'), ((3, 3), u'baz3'), ((4, 4), u'baz4')])

    In [6]: FOO.get('bar', column_start=(2,2))
    Out[6]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3'), ((4, 4), u'baz4')])

    In [8]: FOO.get('bar', column_start=(2,2), column_finish=(3,3))
    Out[8]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])

    In [9]: FOO.get('bar', column_start=(2,2), column_finish=(3,1))
    Out[9]: OrderedDict([((2, 2), u'baz2')])

    In [10]: FOO.get('bar', column_start=(2,), column_finish=(3,))
    Out[10]: OrderedDict([((2, 2), u'baz2'), ((3, 3), u'baz3')])

We see a lot of examples about Timeseries modelling ...

Sorry, I do not understand this question.

Cheers
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 11:17 PM, Roshni Rajagopal roshni.rajago...@wal-mart.com wrote:

Thank you Aaron and Guillermo,

I find composite columns very confusing :(

To reconfirm:
1. We can only search for a column range with the first component of the composite column.
2. After specifying a range for the first component, we cannot further filter on the second component.

I found this link http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/ which seems to suggest that filtering by the second component is possible in addition to the first. I tried the same example but I couldn't get it to work.
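The slicing behaviour shown in the pycassa session earlier in this message can be reproduced without a cluster. The sketch below is only an in-memory imitation, not Cassandra or pycassa code: `slice_columns` is a hypothetical helper, and its handling of partial bounds such as `(3,)` is an assumption modelled on the `In [10]` output. It treats composite column names as tuples that sort component by component and returns the single contiguous run between the two bounds.

```python
def slice_columns(columns, start=(), finish=()):
    """Mimic a composite column slice over an in-memory list of tuples.

    Assumption: a bound with fewer components than the column name is
    compared against the matching prefix only, so finish=(3,) still
    includes (3, 3), as in the pycassa session.
    """
    ordered = sorted(columns)  # composite names sort component by component
    return [
        name for name in ordered
        if name[:len(start)] >= start and name[:len(finish)] <= finish
    ]


# The four columns from the Foo['bar'] row in the example.
aaron_cols = [(1, 1), (2, 2), (3, 3), (4, 4)]
print(slice_columns(aaron_cols, start=(2, 2), finish=(3, 1)))

# Roshni's data, with the timestamps replaced by 1..4 for illustration.
roshni_cols = [(1, 123), (2, 456), (3, 777), (4, 654)]
print(slice_columns(roshni_cols, start=(1, 123), finish=(3, 123)))
```

The first call prints `[(2, 2)]`, matching `Out[9]`. The second prints `[(1, 123), (2, 456)]`: the end bound (3, 123) sorts before (3, 777), so that column is cut off, but (2, 456) is still returned because the second component is part of the sort order rather than a filter applied to every column.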
Re: Data Modelling Suggestions
I was trying to find hector examples where we search for second column in a composite column, but I couldn't find any good one. I'm not sure if it's possible... if you do have any example please share.

It's not. When slicing columns you can only return one contiguous range.

Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only with the pair timestamp_itemid. That way you can add later other query strategies without messing with how you store the item

+1
Have the orders somewhere, and build a time-ordered custom index to show them in order.

Cheers
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/08/2012, at 6:28 AM, Guillermo Winkler gwink...@inconcertcc.com wrote:

I think you need another CF as index.

user_itemid - timestamped column_name

Otherwise you can't guess what timestamp to use in the column name.

Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only with the pair timestamp_itemid. That way you can add other query strategies later without messing with how you store the item information.

Maybe you can solve it with a secondary index by timestamp too.

Guille

On Thu, Aug 23, 2012 at 7:26 AM, Roshni Rajagopal roshni.rajago...@wal-mart.com wrote:

Hi,

Need some help on a data modelling question. We're using Hector and Datastax Enterprise 2.1.

I want to associate a list of items for a user. It should be sorted on the time added. And items can be updated (quantity of the item can be changed), and items can be deleted.

I can model it like this so that it's denormalized and I get all my information in one go from one row, sorted by time added. I can use composite columns.

Row key: User Id
Column Name: TimeUUID:item ID: Item Name: Item Description: Item Price: Item Qty
Column Value: Null

Now, how do I handle manipulations?

1. Add new item: Easy, just a new column.
2. Add existing item or modify qty: I want to get to the correct column to update. Can I search by second column in the composite column (equals condition) and update the column name itself to reflect the new TimeUUID and qty? Or would it be better to just add it as a new column, always use the latest column for an item in the application code, and delete duplicates in the background?
3. Delete item: Can I search by second column in the composite column to find the correct column to delete?

I was trying to find hector examples where we search for second column in a composite column, but I couldn't find any good one. I'm not sure if it's possible... if you do have any example please share.

Regards,
Roshni
Re: Data Modelling Suggestions
Thank you Aaron and Guillermo,

I find composite columns very confusing :(

To reconfirm:
1. We can only search for a column range with the first component of the composite column.
2. After specifying a range for the first component, we cannot further filter on the second component.

I found this link http://doanduyhai.wordpress.com/2012/07/05/apache-cassandra-tricks-and-traps/ which seems to suggest that filtering by the second component is possible in addition to the first. I tried the same example but I couldn't get it to work.

Does anyone have an example? Suppose I have data like this in my column names:

Timestamp1: 123, Timestamp2: 456, Timestamp3: 777, Timestamp4: 654

Get the range of columns from (start) component1=timestamp1, component2=123 to (end) component1=timestamp3, component2=123. This should give me only one column.

I'm finding that only the first component is used... is this understanding correct?

We see a lot of examples about Timeseries modelling with TimeUUID as column names. But how does updating or deleting columns work here? How are the columns found, to know which ones to delete or modify? Does one always need a separate column family to handle updating/deletion for time series, or is it usually handled by setting a TTL for data outside the archival period, or does time series modelling usually not involve any manipulation of past records?

Regards,
Roshni

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Data Modelling Suggestions

I was trying to find hector examples where we search for second column in a composite column, but I couldn't find any good one. I'm not sure if it's possible... if you do have any example please share.

It's not.
Re: Data Modelling Suggestions
I think you need another CF as index.

user_itemid - timestamped column_name

Otherwise you can't guess what timestamp to use in the column name.

Anyway I would prefer storing the item-ids as column names in the main column family and having a second CF for the order-by-date query only with the pair timestamp_itemid. That way you can add other query strategies later without messing with how you store the item information.

Maybe you can solve it with a secondary index by timestamp too.

Guille

On Thu, Aug 23, 2012 at 7:26 AM, Roshni Rajagopal roshni.rajago...@wal-mart.com wrote:

Hi,

Need some help on a data modelling question. We're using Hector and Datastax Enterprise 2.1.

I want to associate a list of items for a user. It should be sorted on the time added. And items can be updated (quantity of the item can be changed), and items can be deleted.

I can model it like this so that it's denormalized and I get all my information in one go from one row, sorted by time added. I can use composite columns.

Row key: User Id
Column Name: TimeUUID:item ID: Item Name: Item Description: Item Price: Item Qty
Column Value: Null

Now, how do I handle manipulations?

1. Add new item: Easy, just a new column.
2. Add existing item or modify qty: I want to get to the correct column to update. Can I search by second column in the composite column (equals condition) and update the column name itself to reflect the new TimeUUID and qty? Or would it be better to just add it as a new column, always use the latest column for an item in the application code, and delete duplicates in the background?
3. Delete item: Can I search by second column in the composite column to find the correct column to delete?

I was trying to find hector examples where we search for second column in a composite column, but I couldn't find any good one. I'm not sure if it's possible... if you do have any example please share.
Regards,
Roshni
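The two-CF layout suggested in this thread (item ids as column names in the main column family, plus a second CF holding timestamp_itemid pairs for the order-by-date query) can be sketched in plain Python. This is an in-memory illustration only, not Hector or Cassandra code: the `UserItems` class, its method names, and the integer timestamps are all hypothetical. It also assumes that an update moves the item to its new timestamp, which is one possible reading of the question, not something the thread settles.

```python
class UserItems:
    """In-memory stand-in for a main CF plus a time-ordered index CF."""

    def __init__(self):
        # Main CF: row key = user id, column name = item id.
        self.items = {}
        # Index CF: row key = user id, column name = (timestamp, item id).
        self.by_time = {}

    def add_or_update(self, user_id, item_id, qty, ts):
        row = self.items.setdefault(user_id, {})
        index = self.by_time.setdefault(user_id, {})
        old = row.get(item_id)
        if old is not None:
            # The main CF remembers the timestamp, so the stale index
            # column can be found without guessing its name.
            index.pop((old["added"], item_id), None)
        row[item_id] = {"qty": qty, "added": ts}
        index[(ts, item_id)] = ""

    def delete(self, user_id, item_id):
        old = self.items.get(user_id, {}).pop(item_id, None)
        if old is not None:
            self.by_time[user_id].pop((old["added"], item_id), None)

    def list_by_time(self, user_id):
        row = self.items.get(user_id, {})
        ordered = sorted(self.by_time.get(user_id, {}))
        return [(item_id, row[item_id]["qty"]) for _, item_id in ordered]


store = UserItems()
store.add_or_update("user1", "apple", qty=1, ts=100)
store.add_or_update("user1", "pear", qty=2, ts=200)
store.add_or_update("user1", "apple", qty=5, ts=300)  # modify qty
print(store.list_by_time("user1"))
```

Updates and deletes go through the item id in the main CF, so nothing ever has to search on the second component of a composite column. The price is that every write touches two column families, and in a real cluster those two writes are not atomic.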