Re: schema design question

2010-03-10 Thread Matteo Caprari
Well, I don't like clunky and I'm java friendly. I'll go for the abstract class. Thanks for the help. On Tue, Mar 9, 2010 at 7:33 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Mar 9, 2010 at 7:30 AM, Matteo Caprari matteo.capr...@gmail.com wrote: On Tue, Mar 9, 2010 at 1:23 PM,

Re: schema design question

2010-03-10 Thread Jonathan Ellis
if you want to select stuff out w/ one query, then a single CF is the only sane choice; if not, then 2 CFs may be more performant. On Wed, Mar 10, 2010 at 4:42 AM, Matteo Caprari matteo.capr...@gmail.com wrote: I can't quite decide whether to go with a flat schema, with keys repeated in different CFs

Re: schema design question

2010-03-09 Thread Matteo Caprari
Thanks Jonathan. Correct me if I'm wrong: you are suggesting that each time we receive a new row (item, [users]) we do 2 operations: 1) insert (or merge) this row 'as it is' (item, [users]); 2) for each user in [users]: insert (user, [item]). Each incoming item is liked by 100 users, so it would be
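The two-step write described above (store the item row as received, then fan out one (user, [item]) write per liking user) can be sketched as follows. This is a minimal illustration, not Cassandra client code: the two column families are simulated with hypothetical in-memory dicts named items_cf and users_cf, and ingest() is a hypothetical helper that also counts the row writes it performs.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families (not a Cassandra API):
#   items_cf: item id -> set of user ids who liked the item
#   users_cf: user id -> set of item ids the user liked
items_cf: dict[str, set[str]] = defaultdict(set)
users_cf: dict[str, set[str]] = defaultdict(set)


def ingest(item_id: str, liked_by: list[str]) -> int:
    """Apply one incoming (item, [users]) row and return the number of row writes."""
    # 1) insert (or merge) the row as it is: (item, [users])
    items_cf[item_id].update(liked_by)
    writes = 1
    # 2) for each user in [users]: insert (user, [item])
    for user in liked_by:
        users_cf[user].add(item_id)
        writes += 1
    return writes
```

With ~100 unique users per item, each call performs 1 + 100 row writes, which is the write amplification the denormalized layout trades for cheap reads in both directions.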

Re: schema design question

2010-03-09 Thread Matteo Caprari
On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis jbel...@gmail.com wrote: One quad-core node can handle ~14000 inserts per second so you are in good shape. Well, yeah! instead of 'all users that liked N items'? That's true.  So you'd want to use a custom comparator where first 64 bits is the
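A rough back-of-envelope check of the "~14000 inserts per second" figure against ~100 liking users per item. This assumes, hypothetically, that each incoming item costs exactly one insert for the item row plus one insert per user row and nothing else; real throughput also depends on batching, column counts, and replication, so treat the result as an order-of-magnitude ceiling only.

```python
def max_items_per_second(node_inserts_per_sec: float, users_per_item: int) -> float:
    """Rough ceiling on incoming items/sec for one node.

    Assumption (not from the thread): one insert for the item row plus one
    insert per liking user, with no other write cost.
    """
    inserts_per_item = 1 + users_per_item
    return node_inserts_per_sec / inserts_per_item


# Figures quoted in the thread: ~14000 inserts/s per quad-core node, ~100 users per item.
print(round(max_items_per_second(14000, 100), 1))  # prints 138.6
```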

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 3:53 AM, Matteo Caprari matteo.capr...@gmail.com wrote: Thanks Jonathan. Correct me if I'm wrong: you are suggesting that each time we receive a new row (item, [users]) we do 2 operations: 1) insert (or merge) this row 'as it is' (item, [users]); 2) for each user in

Re: schema design question

2010-03-09 Thread Jonathan Ellis
On Tue, Mar 9, 2010 at 7:30 AM, Matteo Caprari matteo.capr...@gmail.com wrote: On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis jbel...@gmail.com wrote: That's true.  So you'd want to use a custom comparator where first 64 bits is the Long and the rest is the userid, for instance. (Long +
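One way to lay out a column name whose first 64 bits are the Long and whose remainder is the user id, sketched under stated assumptions: counts are non-negative, user ids are UTF-8 strings, and the helpers composite_name() and decode_name() are hypothetical. Encoding the count as 8-byte big-endian unsigned makes plain byte-wise comparison agree with numeric order, so sorting the raw names orders by count first and user id second. Because the user id is part of the name, two users with the same count no longer collapse into a single column.

```python
from __future__ import annotations

import struct


def composite_name(like_count: int, user_id: str) -> bytes:
    """Hypothetical column name: 8-byte big-endian count, then the UTF-8 user id."""
    return struct.pack(">Q", like_count) + user_id.encode("utf-8")


def decode_name(name: bytes) -> tuple[int, str]:
    """Split a composite name back into (count, user id)."""
    (count,) = struct.unpack(">Q", name[:8])
    return count, name[8:].decode("utf-8")


# One "top users" row, simulated as a dict of column name -> value.
row: dict[bytes, bytes] = {}
for user, count in [("user_2", 7), ("user_3", 7), ("user_9", 42), ("user_5", 3)]:
    row[composite_name(count, user)] = b""  # value unused in this sketch

# user_2 and user_3 share count 7 but keep distinct columns; read highest count first.
top = [decode_name(n) for n in sorted(row, reverse=True)]
print(top)  # [(42, 'user_9'), (7, 'user_3'), (7, 'user_2'), (3, 'user_5')]
```

Whether the stock byte-order comparator is enough or a custom comparator is preferable (for example, to decode the name for logging) is a separate design choice; the fixed-width prefix is what makes the ordering work.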

schema design question

2010-03-08 Thread Matteo Caprari
Hi. We have a collection operation that generates documents like this: item: { id: unique item id, title: ..., liked_by: [user_2, user_3, ...] } The liked_by list contains on average 100 unique users. Users may also appear in other items. Our database contains a few million entries and is

Re: schema design question

2010-03-08 Thread Jonathan Ellis
On Mon, Mar 8, 2010 at 6:18 AM, Matteo Caprari matteo.capr...@gmail.com wrote: The 'key' queries are: These map straightforwardly to one CF per query. - list all the items a user liked: row key is user id, column names are timeuuid of when the like-ing occurred, column value is either item
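A sketch of the "items a user liked" row: the row key is the user id and the column names are time-based (version 1) UUIDs. This sketch assumes the column value is the item id, and timeuuid_at() and items_in_like_order() are hypothetical helpers; timeuuid_at() builds a UUID for an explicit timestamp only so the example is deterministic (real code would normally call uuid.uuid1()). It also shows why columns must be ordered by the embedded timestamp, roughly what a time-UUID comparator does, rather than by raw UUID bytes, whose leading field is time_low.

```python
from __future__ import annotations

import uuid


def timeuuid_at(ts_100ns: int, clock_seq: int = 0, node: int = 0x0123456789AB) -> uuid.UUID:
    """Build a version-1 UUID carrying an explicit 60-bit timestamp (100 ns units)."""
    return uuid.UUID(fields=(
        ts_100ns & 0xFFFFFFFF,                 # time_low
        (ts_100ns >> 32) & 0xFFFF,             # time_mid
        ((ts_100ns >> 48) & 0x0FFF) | 0x1000,  # time_hi_version (version 1)
        ((clock_seq >> 8) & 0x3F) | 0x80,      # clock_seq_hi_variant (RFC 4122 variant)
        clock_seq & 0xFF,                      # clock_seq_low
        node,                                  # 48-bit node id
    ))


# Hypothetical row for one user: column name = timeuuid of the like, value = item id.
user_likes: dict[uuid.UUID, str] = {
    timeuuid_at(1 << 32): "item_b",  # liked later
    timeuuid_at(5): "item_a",        # liked earlier
}


def items_in_like_order(row: dict[uuid.UUID, str]) -> list[str]:
    """Return item ids ordered by the timestamp embedded in each column name."""
    return [row[col] for col in sorted(row, key=lambda u: u.time)]


print(items_in_like_order(user_likes))  # ['item_a', 'item_b']
```

Sorting the same UUIDs by their default (integer/byte) order would put item_b first, because its time_low field is 0.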

Re: schema design question

2010-03-08 Thread Keith Thornhill
jonathan, wouldn't using Long values as the column names for the 3rd CF cause potential conflicts if 2 users liked the same # of items? (only saving one user for any given value) was thinking about this same problem (sorted lists of top N user activity) and thought that was a roadblock for that