your suggestion works for a fixed supercolumn name. the blog example now becomes:

{
  blog-id {name, title, ...}
  blog-id-comments {time:commenter}
}
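
a minimal sketch of that flattening, assuming a plain Python dict stands in for the column family. the `blog-id-comments` key layout is the one from the example above; the `store`, `put`, `flat_key` and `get_sorted` names and the sample values are hypothetical, and nothing here is the actual Cassandra API.

```python
# Hypothetical in-memory stand-in for a column family:
# row key -> {column name -> value}.
store = {}


def put(row_key, column, value):
    """Write one normal column under the given row key."""
    store.setdefault(row_key, {})[column] = value


def flat_key(row_key, supercolumn):
    """Fold a fixed supercolumn name into the row key, e.g. 'blog-42-comments'."""
    return f"{row_key}-{supercolumn}"


def get_sorted(row_key):
    """Return the row's columns ordered by column name, mimicking a comparator."""
    return sorted(store.get(row_key, {}).items())


# Blog details live in the row named after the blog id.
put("blog-42", "name", "aj")
put("blog-42", "title", "supercolumns")

# Comments live in a second row whose key carries the old supercolumn name.
put(flat_key("blog-42", "comments"), "2010-05-10T16:45", "mike")
put(flat_key("blog-42", "comments"), "2010-05-10T16:31", "william")

comments = get_sorted("blog-42-comments")
```

because the comments row holds only normal columns keyed by time, reading it back gives the comments in time order, and each comment can be inserted or deleted on its own.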

what about supercolumn names that are not fixed? for example, I want to store comment details with the blog like this:

{
  blog-id {
    blog {name, title, ...}
    comments {comment-id:commenter}
    comment-id {commenter, time, text, ...}
  }
}

a comment-id is generated on-the-fly when the comment is made. how do you flatten the comment-id supercolumn to a normal column?

just for brain exercise, not meant to pick on you.

thanks,
-aj

On Mon, May 10, 2010 at 4:39 PM, William Ashley <wash...@gmail.com> wrote:

> If you're storing your super column under a fixed name, you could just
> concatenate that name with the row key and use normal columns. Then you get
> your paging and sorting the way you want it.
>
>
> On May 10, 2010, at 4:31 PM, AJ Chen wrote:
>
> supercolumn is good for modeling profile type of data. simple example is
> blog:
> blog { blog {author, title, ...}
>        comments {time: commenter}  //sort by TimeUUID
> }
> when retrieving a blog, you get all the comments sorted by time already.
> without supercolumn, you would need to concatenate multiple comment times
> together as you suggested.
>
> requiring user to concatenate data fields together is not only an extra
> burden on user but also a less clean design. there will be cases where the
> list property of a profile data is a long list (say a million items). in
> such cases, user wants to be able to directly insert/delete an item in that
> list because it's more efficient. Retrieving the whole list, updating it,
> concatenating again, and then putting it back to datastore is awkward and
> less efficient.
>
> -aj
>
>
> On Mon, May 10, 2010 at 2:20 PM, Mike Malone <m...@simplegeo.com> wrote:
>
>> On Mon, May 10, 2010 at 1:38 PM, AJ Chen <ajc...@web2express.org> wrote:
>>
>>> Could someone confirm this discussion is not about abandoning supercolumn
>>> family? I have found modeling data with supercolumn family is actually an
>>> advantage of Cassandra compared to relational database.
>>> Hope you are not going to drop this important concept. How it's
>>> implemented internally is a different matter.
>>>
>>
>> SuperColumns are useful as a convenience mechanism. That's pretty much it.
>> There's _nothing_ (as far as I can tell) that you can do with SuperColumns
>> that you can't do by manually concatenating key names with a separator on
>> the client side and implementing a custom comparator on the server (as ugly
>> as that is).
>>
>> This discussion is about getting rid of SuperColumns and adding a more
>> generic mechanism that will actually be useful and interesting and will
>> continue to be convenient for the types of use cases for which people use
>> SuperColumns.
>>
>> If there's a particular use case that you feel you can only implement with
>> SuperColumns, please share! I honestly can't think of any.
>>
>> Mike
>>
>>
>>> On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook <jsh...@gmail.com> wrote:
>>>
>>>> Agreed
>>>>
>>>> On Mon, May 10, 2010 at 12:01 PM, Mike Malone <m...@simplegeo.com> wrote:
>>>> > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <jsh...@gmail.com> wrote:
>>>> >>
>>>> >> I have to disagree about the naming of things. The name of something
>>>> >> isn't just a literal identifier. It affects the way people think about
>>>> >> it. For new users, the whole naming thing has been a persistent
>>>> >> barrier.
>>>> >
>>>> > I'm saying we shouldn't be worried too much about coming up with names and
>>>> > analogies until we've decided what it is we're naming.
>>>> >
>>>> >>
>>>> >> As for your suggestions, I'm all for simplifying or generalizing the
>>>> >> "how it works" part down to a more generalized set of operations. I'm
>>>> >> not sure it's a good idea to require users to think in terms of building
>>>> >> up a fluffy query structure just to thread it through a needle of an
>>>> >> API, even for the simplest of queries.
>>>> >> At some point, the level of
>>>> >> generic boilerplate takes away from the semantic hand rails that
>>>> >> developers like. So I guess I'm suggesting that "how it works" and
>>>> >> "how we use it" are not always exactly the same. At least they should
>>>> >> both hinge on a common conceptual model, which is where the naming
>>>> >> becomes an important anchoring point.
>>>> >
>>>> > If things are done properly, client libraries could expose simplified query
>>>> > interfaces without much effort. Most ORMs these days work by building a
>>>> > propositional directed acyclic graph that's serialized to SQL. This would
>>>> > work the same way, but it wouldn't be converted into a 4GL.
>>>> > Mike
>>>> >
>>>> >>
>>>> >> Jonathan
>>>> >>
>>>> >> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <m...@simplegeo.com> wrote:
>>>> >> > Maybe... but honestly, it doesn't affect the architecture or interface
>>>> >> > at all. I'm more interested in thinking about how the system should work
>>>> >> > than what things are called. Naming things is important, but that can
>>>> >> > happen later.
>>>> >> > Does anyone have any thoughts or comments on the architecture I
>>>> >> > suggested earlier?
>>>> >> >
>>>> >> > Mike
>>>> >> >
>>>> >> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zson...@gmail.com> wrote:
>>>> >> >>
>>>> >> >> Yes, the "column" here is not appropriate.
>>>> >> >> Maybe we need not create new terms; in Google's Bigtable, the term
>>>> >> >> "qualifier" is a good one.
>>>> >> >>
>>>> >> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>>> >> >>>
>>>> >> >>> That would be a good time to get rid of the confusing "column" term,
>>>> >> >>> which incorrectly suggests a two-dimensional tabular structure.
>>>> >> >>>
>>>> >> >>> Suggestions:
>>>> >> >>>
>>>> >> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key" and
>>>> >> >>> "column" with "1st dimension", "2nd dimension", etc.
>>>> >> >>>
>>>> >> >>> 2. A file system: replace "key" and "column" with "directory" and
>>>> >> >>> "subdirectory"
>>>> >> >>>
>>>> >> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose value
>>>> >> >>> is the set of keys, whose value is the set of supercolumns of the key, whose
>>>> >> >>> value is the set of columns for the supercolumn, etc.
>>>> >> >>>
>>>> >> >>> 4. Etc.
>>>> >> >>>
>>>> >> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <m...@simplegeo.com> wrote:
>>>> >> >>>>
>>>> >> >>>> Nice, Ed, we're doing something very similar but less generic.
>>>> >> >>>> Now replace all of the various methods for querying with a simple query
>>>> >> >>>> interface that takes a Predicate, allow the user to specify (in
>>>> >> >>>> storage-conf) which levels of the nested Columns should be indexed, and
>>>> >> >>>> completely remove Comparators and have people subclass Column / implement
>>>> >> >>>> IColumn and we'd really be on to something ;).
>>>> >> >>>> Mock storage-conf.xml:
>>>> >> >>>>   <Column Name="ThingThatsNowKey" Indexed="True"
>>>> >> >>>>           ClusterPartitioned="True" Type="UTF8">
>>>> >> >>>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True"
>>>> >> >>>>             Type="UTF8">
>>>> >> >>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>>>> >> >>>>         <Column Name="ThingThatsNowColumnName" Indexed="True"
>>>> >> >>>>                 Type="ASCII">
>>>> >> >>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>>>> >> >>>>         </Column>
>>>> >> >>>>       </Column>
>>>> >> >>>>     </Column>
>>>> >> >>>>   </Column>
>>>> >> >>>> Thrift:
>>>> >> >>>>   struct NamePredicate {
>>>> >> >>>>     1: required list<binary> column_names,
>>>> >> >>>>   }
>>>> >> >>>>   struct SlicePredicate {
>>>> >> >>>>     1: required binary start,
>>>> >> >>>>     2: required binary end,
>>>> >> >>>>   }
>>>> >> >>>>   struct CountPredicate {
>>>> >> >>>>     1: required struct predicate,
>>>> >> >>>>     2: required i32 count=100,
>>>> >> >>>>   }
>>>> >> >>>>   struct AndPredicate {
>>>> >> >>>>     1: required Predicate left,
>>>> >> >>>>     2: required Predicate right,
>>>> >> >>>>   }
>>>> >> >>>>   struct SubColumnsPredicate {
>>>> >> >>>>     1: required Predicate columns,
>>>> >> >>>>     2: required Predicate subcolumns,
>>>> >> >>>>   }
>>>> >> >>>>   ... OrPredicate, OtherUsefulPredicates ...
>>>> >> >>>>   query(predicate, count, consistency_level) # Count here would be total
>>>> >> >>>>   count of leaf values returned, whereas CountPredicate specifies a column
>>>> >> >>>>   count for a particular sub-slice.
>>>> >> >>>> Not fully baked... but I think this could really simplify stuff and make
>>>> >> >>>> it more flexible. Downside is it may give people enough rope to hang
>>>> >> >>>> themselves, but at least the predicate stuff is easily distributable.
>>>> >> >>>> I'm thinking I'll play around with implementing some of this stuff
>>>> >> >>>> myself if I have any free time in the near future.
>>>> >> >>>> Mike
>>>> >> >>>>
>>>> >> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>> >> >>>>>
>>>> >> >>>>> Very interesting, thanks!
>>>> >> >>>>>
>>>> >> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <e...@anuff.com> wrote:
>>>> >> >>>>> > Follow-up from last week's discussion, I've been playing around with a
>>>> >> >>>>> > simple column comparator for composite column names that I put up on
>>>> >> >>>>> > github. I'd be interested to hear what people think of this approach.
>>>> >> >>>>> >
>>>> >> >>>>> > http://github.com/edanuff/CassandraCompositeType
>>>> >> >>>>> >
>>>> >> >>>>> > Ed
>>>> >> >>>>> >
>>>> >> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <e...@anuff.com> wrote:
>>>> >> >>>>> >>
>>>> >> >>>>> >> It might make sense to create a CompositeType subclass of AbstractType
>>>> >> >>>>> >> for the purpose of constructing and comparing these types of "composite"
>>>> >> >>>>> >> column names so that you could more easily do that sort of thing rather
>>>> >> >>>>> >> than having to concatenate into one big string.
>>>> >> >>>>> >>
>>>> >> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone <m...@simplegeo.com> wrote:
>>>> >> >>>>> >>>
>>>> >> >>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed out to
>>>> >> >>>>> >>> me at the Cassandra meetup - I think it was Eric Florenzano) is that you
>>>> >> >>>>> >>> can use different comparator types for the Super/SubColumns, I guess..?
>>>> >> >>>>> >>> But you should be able to do the same thing by creating your own Column
>>>> >> >>>>> >>> comparator.
>>>> >> >>>>> >>> I guess my point is that SuperColumns are mostly a convenience
>>>> >> >>>>> >>> mechanism, as far as I can tell.
>>>> >> >>>>> >>> Mike
>>>> >> >>>>>
>>>> >> >>>>> --
>>>> >> >>>>> Jonathan Ellis
>>>> >> >>>>> Project Chair, Apache Cassandra
>>>> >> >>>>> co-founder of Riptano, the source for professional Cassandra support
>>>> >> >>>>> http://riptano.com
>>>
>>> --
>>> AJ Chen, PhD
>>> Chair, Semantic Web SIG, sdforum.org
>>> http://web2express.org
>>> twitter @web2express
>>> Palo Alto, CA, USA
>
> --
> AJ Chen, PhD
> Chair, Semantic Web SIG, sdforum.org
> http://web2express.org
> twitter @web2express
> Palo Alto, CA, USA

--
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA