Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?
It's possible that I am misunderstanding the question in some way. The row keys can be TimeUUIDs, and if you use those row keys as column names, you can use the TimeUUIDType comparator to have them sorted by time automatically.

On Fri, Jan 14, 2011 at 9:18 AM, Aaron Morton aa...@thelastpickle.com wrote:
You could make the time a fixed-width integer and prefix your row keys with it, then set the comparator to ascii or UTF8.
Some issues:
- Will you have time collisions?
- Not sure what you are storing in the super columns, but there are limitations: http://wiki.apache.org/cassandra/CassandraLimitations
- If you are using Cassandra 0.7, have you looked at the secondary indexes? http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes
If you provide some more info on the problem you're trying to solve, we may be able to help some more.
Cheers
Aaron

On 14 Jan, 2011, at 04:27 PM, Aklin_81 asdk...@gmail.com wrote:
I would like to keep the references to other rows as super column names and sort those super columns according to time. Is there any way I could implement that? Thanks in advance!

--
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani http://twitter.com/roshandawrani
Skype: roshandawrani
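To illustrate why the comparator matters for Roshan's suggestion, here is a minimal Python sketch using only the standard `uuid` module. A version-1 (time-based) UUID stores the low 32 bits of its timestamp first, so plain byte order does not follow time order. `make_timeuuid` and `timeuuid_sort_key` are hypothetical helpers; the sort key only approximates what Cassandra's TimeUUIDType comparator does and is not Cassandra code.

```python
import uuid

def make_timeuuid(ts_100ns: int, clock_seq: int = 0, node: int = 0x0123456789AB) -> uuid.UUID:
    """Build a version-1 UUID from an explicit 60-bit timestamp (hypothetical helper).

    Fixed inputs keep the example deterministic; uuid.uuid1() would read the clock.
    """
    time_low = ts_100ns & 0xFFFFFFFF
    time_mid = (ts_100ns >> 32) & 0xFFFF
    time_hi_version = ((ts_100ns >> 48) & 0x0FFF) | 0x1000   # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant bits
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

def timeuuid_sort_key(u: uuid.UUID):
    # Embedded timestamp first, raw bytes as a tie-breaker: roughly how a
    # time-aware comparator orders TimeUUID column names.
    return (u.time, u.bytes)

earlier = make_timeuuid(0xFFFFFFFF)  # time_low all ones, time_mid 0
later = make_timeuuid(1 << 32)       # time_low 0, time_mid 1
names = [later, earlier]

by_bytes = sorted(names, key=lambda u: u.bytes)  # plain byte-wise ordering
by_time = sorted(names, key=timeuuid_sort_key)

print([u.time for u in by_bytes])  # [4294967296, 4294967295]
print([u.time for u in by_time])   # [4294967295, 4294967296]
```

With byte-wise ordering the later UUID sorts first, which is exactly what a time-aware comparator avoids.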
Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?
@Roshan Yes, I thought about that, but then I wouldn't be able to use the Random Partitioner.

@Aaron Do you mean like this: 'timeUUID + row_key' as the super column names? Then, when retrieving the row_key from this column name, will I be required to parse the name? How do I do that exactly?

Some issues:
- Will you have time collisions?
No, I mostly won't have time collisions. If they happen in 1% of cases, I don't mind.
- Not sure what you are storing in the super columns, but there are limitations.
I would be storing a maximum of 5 sub-columns inside and would be retrieving them all together.
- If you are using Cassandra 0.7, have you looked at the secondary indexes?
Yes, I did, but I think they are not helpful in my case.

This is what I am trying to do:

** This is from an older post that I made earlier on the mailing list:
I am working on a Questions/Answers forum project that allows a user to follow questions on certain topics from his followees. I want to build a user's news feed that comprises only those questions that have been posted by his followees and tagged with the topics that he is following.

A simple news-feed design that shows all the posts from a user's network would be easy to build with Cassandra by doing fast writes of each post to all followers of the posting user. But for my application there is an additional filter of 'followed topics' (i.e., the user receives posts created by his followees on topics the user is following).

I was thinking of implementing it this way: at write time, write the postID of each post to all followers in the poster's network, by adding a super column to each follower's row in the News-feed super column family, with the super column name as a timestamp (for sorting by time) and up to 5 sub-columns containing the topic tags of that post. At read time, compare the sub-column values with the topics the user is following; if they match, show the post. (I would need to fetch the user's list of followed topics at read time; should I therefore store the topic list as a super column in this News-feed super column family itself?)

An important point to note is that posts will often have zero sub-columns, which means the post has to be shown without checking against the user's list of followed topics.

There is another view that allows users to see all the posts from their followees (without topic filters). In this case, no checking of sub-columns for topics will be performed.

I got good insights from Tyler on this, but the approach he recommended, although beneficial for read performance, involves too much denormalization, something like 70-80x. I am currently wary of that approach and would like to test this one first.
**

Any comments or feedback are greatly appreciated. Thanks so much!
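The read-time check described above can be sketched as a small in-memory Python model. This is hypothetical, not Cassandra client code: `FeedEntry` stands in for one super column in a follower's News-feed row (timestamp name, topic-tag sub-columns), and `visible_posts` applies the rules from the message. Treating "match" as "at least one tag is a followed topic" is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class FeedEntry:
    """Hypothetical stand-in for one super column in a follower's News-feed row."""
    timestamp: int                                        # super column name
    post_id: str
    topic_tags: list[str] = field(default_factory=list)  # sub-columns, up to 5

def visible_posts(feed: list[FeedEntry], followed_topics: set[str],
                  apply_topic_filter: bool = True) -> list[str]:
    """Return post IDs newest-first, applying the read-time topic check."""
    shown = []
    for entry in sorted(feed, key=lambda e: e.timestamp, reverse=True):
        if (not apply_topic_filter                     # unfiltered "all posts" view
                or not entry.topic_tags                # untagged posts are always shown
                or followed_topics.intersection(entry.topic_tags)):  # any tag matches
            shown.append(entry.post_id)
    return shown

feed = [
    FeedEntry(100, "p1", ["cassandra"]),
    FeedEntry(200, "p2"),                  # no topic tags
    FeedEntry(300, "p3", ["java", "jvm"]),
]
print(visible_posts(feed, {"cassandra"}))                            # ['p2', 'p1']
print(visible_posts(feed, {"cassandra"}, apply_topic_filter=False))  # ['p3', 'p2', 'p1']
```

In this design the filtering cost is paid on every read, in exchange for writing each post only once per follower.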
Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?
On Fri, Jan 14, 2011 at 7:15 PM, Aklin_81 asdk...@gmail.com wrote:
@Roshan Yes, I thought about that, but then I wouldn't be able to use the Random Partitioner.

Can you please expand a bit on this? What is this restriction? Can you point me to some relevant documentation on this? Thanks.
Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?
I am not sure, but I guess it is because all the rows in a certain time range would go to just one node and would not be evenly distributed, since TimeUUIDs are not random but sequential in time... I am not sure anyway...
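This concern does hold for an order-preserving partitioner, where the row key itself acts as the token and each node owns a contiguous key range. A toy Python model shows sequential time-based keys all landing on one node; the four node tokens, the key format, and `owner_order_preserving` are hypothetical, and this is not Cassandra's implementation.

```python
from bisect import bisect_left

# Hypothetical 4-node ring for an order-preserving partitioner: each token is the
# inclusive upper bound of the key range its node owns, in sorted order.
NODE_TOKENS = ["0500000000000", "1000000000000", "1500000000000", "9999999999999"]

def owner_order_preserving(row_key: str, tokens=NODE_TOKENS) -> int:
    """Index of the node owning row_key when keys are compared as strings."""
    # The first token >= key owns it; keys past the last token wrap to node 0.
    return bisect_left(tokens, row_key) % len(tokens)

# 1000 sequential keys: zero-padded millisecond timestamps one second apart.
start_ms = 1294992000000  # hypothetical starting time
keys = [f"{start_ms + i * 1000:013d}" for i in range(1000)]

owners = {owner_order_preserving(k) for k in keys}
print(owners)  # {2}: every key falls into the same range, i.e. one hot node
```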
Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?
I too believed so! But I'm not totally sure.
Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?
I am not clear what you guys are trying to do and say :-) So, let's take some specifics...

Say you want to create rows in some column family (say CF_A), and as you create them, you want to store their row keys as column names in some other column family (say CF_B), possibly for filtering keys based on time later, etc.

Now, your rows in CF_A may be keyed on a TimeUUID, and if you store these keys as column names in CF_B, which has TimeUUID as its comparator, then you get your column names time-sorted automatically.

Now, CF_A may be split across nodes; is that of any concern to you? Are you expecting any storage relationship between the column names of CF_B and the rows of CF_A?

rgds,
Roshan
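The CF_A/CF_B layout described above can be modelled in a few lines of Python. This is an in-memory sketch with plain dicts standing in for the two column families, not a Cassandra client; `insert_row` and `latest_keys` are hypothetical names, and the sort key `(u.time, u.bytes)` only approximates TimeUUIDType ordering.

```python
import uuid

# Plain dicts standing in for the two column families (hypothetical model).
cf_a = {}  # CF_A: row key (TimeUUID) -> row columns
cf_b = {}  # CF_B: index row name -> {CF_A row key used as column name: value}

def insert_row(index_row: str, columns: dict[str, str]) -> uuid.UUID:
    """Create a CF_A row keyed by a new TimeUUID and record its key in a CF_B row."""
    key = uuid.uuid1()                         # time-based (version 1) UUID
    cf_a[key] = columns
    cf_b.setdefault(index_row, {})[key] = b""  # only the column name matters here
    return key

def latest_keys(index_row: str, count: int) -> list[uuid.UUID]:
    """Newest-first CF_A keys, ordered by their embedded time."""
    names = cf_b.get(index_row, {})
    return sorted(names, key=lambda u: (u.time, u.bytes), reverse=True)[:count]

for title in ["first", "second", "third"]:
    insert_row("feed:user42", {"title": title})

newest = latest_keys("feed:user42", 2)
print([cf_a[k]["title"] for k in newest])
```

Because the ordering lives in the column names of a single CF_B row, it does not depend on which nodes hold the CF_A rows.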
Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?
I just read that Cassandra (with the RandomPartitioner) internally computes an MD5 hash of the row key, which is used to distribute the load by sending each row to the node responsible for the range in which that hash falls. So even when we create sequential keys, their MD5 hashes are not sequential, hence they are not all sent to the same node. This was my misunderstanding of the concept; sorry for creating confusion!

So... with this, I think I will be able to use a TimeUUID as the row key!?

Aaron, could you kindly share your views on my responses to your questions above?
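This corrected understanding can be checked with a small Python model of RandomPartitioner-style placement. It is a sketch under stated assumptions: four hypothetical nodes with evenly spaced tokens, and a token computed as the absolute value of the key's MD5 digest read as a signed 128-bit integer (an approximation of how RandomPartitioner derives tokens, not Cassandra code). Sequential keys still spread across all nodes:

```python
import hashlib
from bisect import bisect_left
from collections import Counter

NUM_NODES = 4
RING_SIZE = 2 ** 127  # tokens fall in [0, 2**127] under this model

# Hypothetical evenly spaced node tokens; node i owns (tokens[i-1], tokens[i]].
NODE_TOKENS = [(i + 1) * RING_SIZE // NUM_NODES for i in range(NUM_NODES)]

def md5_token(row_key: bytes) -> int:
    """abs() of the MD5 digest read as a signed big-endian 128-bit integer."""
    digest = hashlib.md5(row_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

def owner(token: int) -> int:
    # The first node token >= the row token owns the row.
    return bisect_left(NODE_TOKENS, token) % NUM_NODES

# The same kind of sequential, time-based keys as in the hot-spot concern.
keys = [f"{1294992000000 + i * 1000:013d}".encode() for i in range(1000)]
counts = Counter(owner(md5_token(k)) for k in keys)
print(sorted(counts.items()))  # roughly 250 keys per node
```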
Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?
No, you do not need to shut up, please! :) You may be clearing up more of my misconceptions on the topic!

Anyway, the link between the 1st and 2nd paragraphs was this: since the distribution of rows among nodes is determined not by the key itself (as you rightly said) but by the MD5 hash of the key, I can use just about any key, including a TimeUUIDType key (which would be helpful in my case), with the RandomPartitioner.

On 1/14/11, Roshan Dawrani roshandawr...@gmail.com wrote:
On Fri, Jan 14, 2011 at 8:51 PM, Aklin_81 asdk...@gmail.com wrote:
I just read that Cassandra internally creates an MD5 hash that is used for distributing the load by sending each row to the node responsible for the range within which that MD5 hash falls, so even when we create sequential keys, their MD5 hashes are not sequential, hence they are not sent to the same node. This was my misunderstanding of the concept. Sorry for creating confusion!
So... with this I think I will be able to use timeUUID as row key!?

Now, what really is the link between your corrected understanding and the conclusion in the 2nd para? :-) I am missing the link you use to get from para 1 to para 2.

Just because you use a TimeUUID as the row key, there is no storage guarantee because of that. Distribution of rows and ordering across nodes is based only on which partitioner you are using; it is not (only) related to the type of the key.

Maybe I should just shut up now, as I don't seem to be understanding your requirement :-)
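Roshan's point that a TimeUUID row key gives no ordering guarantee across rows under a hashing partitioner can be illustrated the same way: once keys are hashed, iterating rows in token order no longer follows time order, which is why the time sort has to come from column names inside a row. A minimal sketch, reusing the same approximated MD5 token function (an assumption, not Cassandra code) and hypothetical keys:

```python
import hashlib

def md5_token(row_key: bytes) -> int:
    # Approximation of a RandomPartitioner token: abs(MD5 digest as a signed 128-bit int).
    return abs(int.from_bytes(hashlib.md5(row_key).digest(), "big", signed=True))

# Hypothetical time-ordered row keys (zero-padded millisecond timestamps).
keys = [f"{1294992000000 + i * 1000:013d}".encode() for i in range(1000)]

in_key_order = sorted(keys)                    # time order
in_token_order = sorted(keys, key=md5_token)   # order rows take around the ring

# Same rows, different order: hashing scatters time-adjacent keys.
print(in_key_order[:3])
print(in_token_order[:3])
```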
Is there any way I could use keys of other rows as column names that could be sorted according to time ?
I would like to keep the references to other rows as super column names and sort those super columns according to time. Is there any way I could implement that? Thanks in advance!
Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?
You could make the time a fixed-width integer and prefix your row keys with it, then set the comparator to ascii or UTF8.
Some issues:
- Will you have time collisions?
- Not sure what you are storing in the super columns, but there are limitations: http://wiki.apache.org/cassandra/CassandraLimitations
- If you are using Cassandra 0.7, have you looked at the secondary indexes? http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes
If you provide some more info on the problem you're trying to solve, we may be able to help some more.
Cheers
Aaron
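Aaron's fixed-width prefix suggestion, together with Aklin's question about parsing the row key back out, can be sketched in Python. Zero-padding to a fixed width makes plain string order match numeric time order, which is what an ascii or UTF8 comparator would give. The width, separator, and helper names here are hypothetical choices, not anything Cassandra prescribes.

```python
WIDTH = 20  # hypothetical: 20 digits hold any non-negative 64-bit integer
SEP = ":"   # hypothetical separator between the time prefix and the row key

def make_column_name(timestamp_ms: int, row_key: str) -> str:
    """Prefix row_key with a zero-padded, fixed-width timestamp."""
    if not 0 <= timestamp_ms < 10 ** WIDTH:
        raise ValueError("timestamp does not fit in the fixed width")
    return f"{timestamp_ms:0{WIDTH}d}{SEP}{row_key}"

def parse_column_name(name: str) -> tuple[int, str]:
    """Split a column name back into (timestamp, row_key) by position."""
    if len(name) <= WIDTH or name[WIDTH] != SEP:
        raise ValueError(f"malformed column name: {name!r}")
    return int(name[:WIDTH]), name[WIDTH + 1:]

names = [make_column_name(t, k) for t, k in [(1000, "post-b"), (999, "post-a"), (10000, "post-c")]]
ordered = sorted(names)  # plain string order, as an ascii/UTF8 comparator would give
print([parse_column_name(n) for n in ordered])
# [(999, 'post-a'), (1000, 'post-b'), (10000, 'post-c')]
```

Because the prefix has a fixed width, parsing is a positional slice, so row keys may themselves contain the separator. Without the zero-padding, string order would put 10000 before 999.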