Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-14 Thread Roshan Dawrani
It's possible that I am misunderstanding the question in some way.

The row keys can be Time UUIDs, and with those row keys as column names, you
can use the TimeUUIDType comparator to have them sorted by time automatically.
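Here is a minimal local sketch of Roshan's suggestion. It uses Python's standard `uuid` module in place of a Cassandra client. Version-1 (time-based) UUIDs carry a 60-bit timestamp, and sorting on that timestamp approximates how a TimeUUIDType comparator orders column names within a row. The `time_order_key` helper is hypothetical. Using the raw bytes as a tie-breaker is an assumption, not Cassandra's exact rule.

```python
import uuid


def time_order_key(u: uuid.UUID):
    """Approximate TimeUUIDType ordering: embedded timestamp first, raw bytes as a tie-breaker."""
    # u.time is the 60-bit timestamp (100 ns intervals since 1582-10-15) of a version-1 UUID.
    return (u.time, u.bytes)


# Hypothetical row keys for rows in another column family, created one after another.
row_keys = [uuid.uuid1() for _ in range(5)]

# Store them as "column names" in arbitrary order, then read them back in time order,
# which is roughly what a TimeUUIDType comparator gives you within a single row.
column_names = sorted(reversed(row_keys), key=time_order_key)

for name in column_names:
    print(name.time, name)
```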

On Fri, Jan 14, 2011 at 9:18 AM, Aaron Morton aa...@thelastpickle.com wrote:

 You could make the time a fixed-width integer and prefix your row keys
 with it, then set the comparator to ASCII or UTF8.

 Some issues:
 - Will you have time collisions?
 - Not sure what you are storing in the super columns, but there are
 limitations: http://wiki.apache.org/cassandra/CassandraLimitations
 - If you are using Cassandra 0.7, have you looked at the secondary indexes?
 http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes

 If you provide some more info on the problem you're trying to solve, we may be
 able to help some more.

 Cheers
 Aaron


 On 14 Jan, 2011, at 04:27 PM, Aklin_81 asdk...@gmail.com wrote:

 I would like to keep the reference of other rows as names of super
 column and sort those super columns according to time.
 Is there any way I could implement that ?

 Thanks in advance!




-- 
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani http://twitter.com/roshandawrani
Skype: roshandawrani



Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-14 Thread Aklin_81
@Roshan
Yes, I thought about that, but then I wouldn't be able to use the
Random Partitioner.

@Aaron

Do you mean something like 'timeUUID + row_key' as the super column names?
Then, when retrieving the row_key from the column name, would I have to
parse the name? How exactly would I do that?
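Here is one way to read Aaron's suggestion, sketched in Python. The time is written as a zero-padded, fixed-width number of epoch milliseconds, followed by a separator and the row key. Plain string (ASCII/UTF8) ordering then matches time ordering, and parsing the name back is just slicing at the fixed width. `TS_WIDTH`, `SEP`, `make_column_name` and `parse_column_name` are hypothetical names, not part of any Cassandra API.

```python
import time
from typing import Optional, Tuple

TS_WIDTH = 13  # hypothetical: 13 digits of epoch milliseconds last until about the year 2286
SEP = ":"      # hypothetical separator; the fixed width alone is enough to split the name


def make_column_name(row_key: str, ts_ms: Optional[int] = None) -> str:
    """Prefix a row key with a zero-padded timestamp so that string order equals time order."""
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    return f"{ts_ms:0{TS_WIDTH}d}{SEP}{row_key}"


def parse_column_name(name: str) -> Tuple[int, str]:
    """Recover (timestamp, row_key) from a name built by make_column_name."""
    ts_part = name[:TS_WIDTH]
    row_key = name[TS_WIDTH + len(SEP):]
    return int(ts_part), row_key


names = sorted([
    make_column_name("post-42", 1294990000000),
    make_column_name("post-7", 1294980000000),
])
print(names)                        # oldest first
print(parse_column_name(names[0]))  # (1294980000000, 'post-7')
```

A TimeUUID's string form would not work as this prefix under an ASCII or UTF8 comparator, because its leading characters hold the low bits of the timestamp. If two posts share a millisecond, the row key suffix still keeps their column names distinct.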


Some issues:
- Will you have time collisions?
No, I mostly won't have time collisions. If they happen in 1% of cases,
I don't mind.

- Not sure what you are storing in the super columns, but there are
limitations.
I would be storing at most 5 subcolumns in each super column and would be
retrieving them all together.

- If you are using Cassandra 0.7, have you looked at the secondary indexes?

Yes, I did, but I don't think they help in my case.

This is what I am trying to do:
**
This is from a post that I made earlier on the mailing list:
I am working on a question/answer forum project that allows a
user to follow questions on certain topics from his followees.
I want to build a user's news feed that consists of only those
questions that have been posted by his followees and tagged with the
topics that he is following.
A simple news-feed design that shows all the posts from a user's network
would be easy to build with Cassandra, by doing fast writes of each post
to all of the poster's followers. But for my application,
there is an additional filter of 'followed topics' (i.e., the user
receives posts created by his followees on topics the user is
following).

I was thinking of implementing it this way:
Initially, write the postID of each post to all followers in the
poster's network, by adding a super column to each follower's row in the
News-feed super column family, with the super column name as a timestamp
(to sort by time) and up to 5 subcolumns containing the topic tags of that
post.
At read time, compare the subcolumn values with the topics the user is
following, and if they match, show the post. (I would need to
fetch the list of the user's followed topics at read time, so
should I store the topic list as a super column in this News-feed
super column family itself?)

An important point to note: posts will often have zero
subcolumns, which means the post has to be shown without
checking it against the user's list of followed topics.

There is another view that allows users to see all the
posts from their followees (without topic filters). In this case, no
checking of subcolumns for topics will be performed.

I got good insights from Tyler on this, but he was recommending an
approach that, although beneficial for read performance, requires
far too much denormalization (around 70-80x). I am currently wary of that
approach and would like to test this one first.
**
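The read-time filtering described above can be sketched as follows. It assumes each feed entry has already been fetched from the follower's News-feed row. `FeedEntry` and `visible_posts` are hypothetical names. The rules modeled are the ones stated in the post:

- Untagged posts are always shown.
- Tagged posts are shown only if one of their tags is followed.
- The unfiltered view skips the check.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FeedEntry:
    post_id: str
    timestamp: int                                       # the super column name in the proposed design
    topic_tags: List[str] = field(default_factory=list)  # up to 5 subcolumns; often empty


def visible_posts(feed: List[FeedEntry], followed_topics: Set[str],
                  filter_by_topic: bool = True) -> List[str]:
    """Return post IDs newest first, applying the read-time topic filter."""
    result = []
    for entry in sorted(feed, key=lambda e: e.timestamp, reverse=True):
        untagged = not entry.topic_tags
        matches = bool(followed_topics.intersection(entry.topic_tags))
        if not filter_by_topic or untagged or matches:
            result.append(entry.post_id)
    return result


feed = [
    FeedEntry("p1", 100, ["java"]),
    FeedEntry("p2", 200),
    FeedEntry("p3", 300, ["python", "go"]),
]
print(visible_posts(feed, {"python"}))                         # ['p3', 'p2']
print(visible_posts(feed, {"python"}, filter_by_topic=False))  # ['p3', 'p2', 'p1']
```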
Any comments or feedback greatly appreciated.

Thanks so much!




Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-14 Thread Roshan Dawrani
On Fri, Jan 14, 2011 at 7:15 PM, Aklin_81 asdk...@gmail.com wrote:

 @Roshan
 Yes, I thought about that, but then I wouldn't be able to use the
 Random Partitioner.


Can you please expand a bit on this? What is this restriction? Can you point
me to some relevant documentation on this?

Thanks.


Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-14 Thread Rajkumar Gupta
I am not sure, but I guess it's because all the rows from a certain time range
will go to just one node and will not be evenly distributed, since TimeUUIDs are
not random but sequential in time... I am not sure, anyway.




Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-14 Thread Aklin_81
I believed so too, but I'm not totally sure.





Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-14 Thread Roshan Dawrani
I am not clear what you guys are trying to do and say :-)

So, let's take some specifics...

Say you want to create rows in some column family (say CF_A), and as you
create them, you want to store their row key in column names in some other
column family (say CF_B) - possibly for filtering keys based on time later,
etc, etc...

Now your rows in CF_A may be keyed on a TimeUUID and if you store these keys
as column names in CF_B that has comparator as TimeUUID, then you get your
column names time sorted automatically.
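Here is a small in-memory model of Roshan's CF_A/CF_B example. It is purely illustrative and is not Cassandra's API:

- A dict stands in for CF_A.
- A hypothetical `IndexRow` class stands in for one CF_B row. Its column names (the CF_A keys) stay sorted by their embedded time, so a time-range slice is a pair of binary searches.
- `make_time_uuid` is a hypothetical test helper that builds version-1 UUIDs with chosen timestamps.

```python
import bisect
import uuid


def make_time_uuid(ts_100ns: int, clock_seq: int = 0, node: int = 0x010203040506) -> uuid.UUID:
    """Build a version-1 UUID with a chosen 60-bit timestamp (hypothetical test helper)."""
    return uuid.UUID(fields=(
        ts_100ns & 0xFFFFFFFF,                  # time_low
        (ts_100ns >> 32) & 0xFFFF,              # time_mid
        ((ts_100ns >> 48) & 0x0FFF) | 0x1000,   # time_hi with version 1
        0x80 | ((clock_seq >> 8) & 0x3F),       # RFC 4122 variant bits
        clock_seq & 0xFF,
        node,
    ))


class IndexRow:
    """Toy stand-in for one CF_B row: column names kept sorted by embedded time."""

    def __init__(self):
        self._keys = []   # sort keys (timestamp, bytes), parallel to _names
        self._names = []

    def insert(self, u: uuid.UUID) -> None:
        k = (u.time, u.bytes)
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            return  # same column name: an overwrite, not a new column
        self._keys.insert(i, k)
        self._names.insert(i, u)

    def slice(self, start_ts: int, end_ts: int):
        """Column names whose timestamp lies in [start_ts, end_ts], oldest first."""
        lo = bisect.bisect_left(self._keys, (start_ts, b""))
        hi = bisect.bisect_right(self._keys, (end_ts, b"\xff" * 17))
        return self._names[lo:hi]


# CF_A: rows keyed by TimeUUID; CF_B: one index row whose column names are those keys.
cf_a = {}
cf_b_row = IndexRow()
for ts in (300, 100, 200):          # inserted out of order on purpose
    key = make_time_uuid(ts)
    cf_a[key] = {"body": f"created at t={ts}"}
    cf_b_row.insert(key)

recent = cf_b_row.slice(150, 300)
print([u.time for u in recent])     # [200, 300]
print([cf_a[u]["body"] for u in recent])
```

The model reflects Roshan's point: CF_A rows can live on any node, while the time ordering comes entirely from how CF_B's column names are compared within its row.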

Now CF_A may be split across nodes - is that of any concern to you?

Are you expecting any storage relationship between column names of CF_B and
rows of CF_A?

rgds,
Roshan



Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-14 Thread Aklin_81
I just read that Cassandra internally computes an MD5 hash of each row key and
uses it to distribute load, sending the row to the node responsible for the
range in which that hash falls. So even when we create
sequential keys, their MD5 hashes are not sequential, and hence the rows are not
sent to the same node. This was my misunderstanding of the concept.
Sorry for the confusion!
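Here is a rough Python approximation of how RandomPartitioner places keys, to illustrate the corrected understanding. The token is derived from the MD5 digest of the key. Here it is assumed to be the absolute value of the digest read as a signed 128-bit integer. Each node owns the range ending at its token. The 4-node ring with evenly spaced tokens and the key format are hypothetical.

```python
import hashlib
from typing import List

RING_MAX = 2 ** 127  # RandomPartitioner tokens lie in 0 .. 2**127


def random_partitioner_token(key: bytes) -> int:
    """Approximate RandomPartitioner token: |MD5 digest read as a signed 128-bit big-endian integer|."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owning_node(token: int, node_tokens: List[int]) -> int:
    """Index of the node owning `token`: the first node whose token is >= it, wrapping to node 0."""
    for i, node_token in enumerate(node_tokens):
        if token <= node_token:
            return i
    return 0


# Hypothetical 4-node ring with evenly spaced tokens (sorted ascending).
NODE_TOKENS = [RING_MAX * (i + 1) // 4 for i in range(4)]

# 100 sequential, time-ordered keys (zero-padded epoch milliseconds).
keys = [f"{1294990000000 + i:013d}".encode() for i in range(100)]
placements = [owning_node(random_partitioner_token(k), NODE_TOKENS) for k in keys]

# Count how many of the sequential keys each node would own.
counts = {n: placements.count(n) for n in range(len(NODE_TOKENS))}
print(counts)
```

Sequential keys end up scattered across the ring. The flip side, as Roshan notes in the thread, is that RandomPartitioner provides no key ordering across nodes. Time ordering has to come from the column comparator within a row.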

So... with this, I think I will be able to use a TimeUUID as the row key!?

Aaron, could you kindly share your views on my responses to your
questions above?







Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-14 Thread Aklin_81
No, you do not need to shut up, please! :)
You may be clearing up further misconceptions of mine on the topic!

Anyway, the link between the 1st and 2nd paragraphs was that since the
distribution of rows among nodes is determined not by the key itself (as you
rightly said) but by the MD5 hash of the key, I can use any key, including a
TimeUUIDType key (which would be helpful in my case), with the
RandomPartitioner.



On 1/14/11, Roshan Dawrani roshandawr...@gmail.com wrote:
 On Fri, Jan 14, 2011 at 8:51 PM, Aklin_81 asdk...@gmail.com wrote:

 I just read that Cassandra internally creates an MD5 hash that is used
 for distributing the load by sending it to a node responsible for the
 range within which that MD5 hash falls, so even when we create
 sequential keys, their MD5 hash is not the same, hence they are not
 sent to the same node. This was my misunderstanding of this concept.
 Sorry for creating confusion!

 So... with this I think I will be able to use a TimeUUID as row key!?


 Now, what really is the link between your corrected understanding and the
 conclusion in the 2nd para? :-)

 I don't see the link you are using to get from para 1 to para 2.

 Just because you use a time UUID as the row key, you don't get any storage
 guarantee from that. Distribution and ordering of rows across nodes is based
 on which partitioner you are using - it is not (only) related to the type
 of the key.

 Maybe I should just shut up now, as I don't seem to be understanding your
 requirement :-)










Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-13 Thread Aklin_81
I would like to keep the reference of other rows as names of super
column and sort those super columns according to time.
Is there any way I could implement that ?

Thanks in advance!


Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-13 Thread Aaron Morton
You could make the time a fixed-width integer and prefix your row keys with it, then set the comparator to ASCII or UTF8.

Some issues:
- Will you have time collisions?
- Not sure what you are storing in the super columns, but there are limitations: http://wiki.apache.org/cassandra/CassandraLimitations
- If you are using Cassandra 0.7, have you looked at the secondary indexes? http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes

If you provide some more info on the problem you're trying to solve, we may be able to help some more.

Cheers
Aaron

On 14 Jan, 2011, at 04:27 PM, Aklin_81 asdk...@gmail.com wrote:

 I would like to keep the reference of other rows as names of super
 column and sort those super columns according to time.
 Is there any way I could implement that ?

 Thanks in advance!