Re: Is SuperColumn necessary?

William Ashley Mon, 10 May 2010 17:37:06 -0700

I'm having a difficult time understanding your syntax. Could you provide an 
example with actual data?


On May 10, 2010, at 5:25 PM, AJ Chen wrote:

> your suggestion works for fixed supercolumn name. the blog example now 
> becomes:
> { blog-id {name, title, ...}
>   blog-id-comments {time:commenter}
> }
> 
> what about supercolumn names that are not fixed? for example, I want to store 
> comment's details with the blog like this:
> { blog-id { blog { name, title, ...}
>               comments {comment-id:commenter}
>               comment-id {commenter, time, text, ...}
> }
> 
> a comment-id is generated on-the-fly when the comment is made.  how do you 
> flatten the comment-id supercolumn to normal column?  just for brain 
> exercise, not meant to pick on you.
> 
> thanks,
> -aj
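One possible answer to the question above, as a rough sketch only: keep a single row per blog and fold the dynamically generated comment-id into a composite column name, in the spirit of the CompositeType idea quoted further down the thread. This uses a plain Python dict rather than the Cassandra API; the `put` and `slice_prefix` helpers, the tuple layout, and the sample data are all hypothetical.

```python
from bisect import bisect_left

# Stand-in for one blog row: composite column name (a tuple) -> value.
# Tuples compare component by component, like a composite comparator would.
row = {}

def put(name, value):
    row[name] = value

def slice_prefix(prefix):
    """Return all (name, value) pairs whose composite name starts with `prefix`."""
    names = sorted(row)
    start = bisect_left(names, prefix)
    out = []
    for name in names[start:]:
        if name[:len(prefix)] != prefix:
            break
        out.append((name, row[name]))
    return out

# Former "blog" super column.
put(("blog", "name"), "my blog")
put(("blog", "title"), "Is SuperColumn necessary?")
# Former "comments" super column: comment-id -> commenter.
put(("comments", "c-001"), "alice")
put(("comments", "c-002"), "bob")
# Former per-comment super columns, keyed by the generated comment-id.
put(("comment", "c-001", "commenter"), "alice")
put(("comment", "c-001", "text"), "first!")
put(("comment", "c-002", "commenter"), "bob")
put(("comment", "c-002", "text"), "nice post")

details = slice_prefix(("comment", "c-001"))
index = slice_prefix(("comments",))
```

Because the comment-id is just one component of the name, a new comment only adds columns; nothing has to be read back and re-concatenated.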
>   
> 
> 
> On Mon, May 10, 2010 at 4:39 PM, William Ashley <[email protected]> wrote:
> If you're storing your super column under a fixed name, you could just 
> concatenate that name with the row key and use normal columns. Then you get 
> your paging and sorting the way you want it.
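A minimal sketch of this suggestion, with a Python dict standing in for a column family (this is not the Cassandra API). The `store`, `put`, and `page` names, the `-comments` row-key suffix, and the sample data are hypothetical, and ISO-style timestamp strings are assumed in place of TimeUUIDs so that plain string order matches time order.

```python
from bisect import bisect_right

# Stand-in for a column family: row key -> {column name: value}.
store = {}

def put(row_key, column, value):
    store.setdefault(row_key, {})[column] = value

def page(row_key, start_after=None, count=2):
    """Return up to `count` (name, value) pairs in column-name order,
    starting after `start_after` (or from the beginning if it is None)."""
    row = store.get(row_key, {})
    names = sorted(row)
    begin = 0 if start_after is None else bisect_right(names, start_after)
    return [(n, row[n]) for n in names[begin:begin + count]]

# The fixed super column name "comments" becomes a row-key suffix.
put("blog-42", "author", "aj")
put("blog-42", "title", "Is SuperColumn necessary?")
put("blog-42-comments", "2010-05-10T16:31", "alice")
put("blog-42-comments", "2010-05-10T16:39", "bob")
put("blog-42-comments", "2010-05-10T17:25", "carol")

first = page("blog-42-comments")
rest = page("blog-42-comments", start_after=first[-1][0])
```

Each former super column is now an ordinary row, so paging and sorting apply to its columns directly.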
> 
> 
> On May 10, 2010, at 4:31 PM, AJ Chen wrote:
> 
>> supercolumn is good for modeling profile type of data. simple example is 
>> blog:
>> blog { blog {author,  title, ...}
>>          comments   {time: commenter}  //sort by TimeUUID
>> }
>> when retrieving a blog, you get all the comments sorted by time already.
>> without supercolumn, you would need to concatenate multiple comment times 
>> together as you suggested. 
>> 
>> requiring the user to concatenate data fields together is not only an extra 
>> burden on the user but also a less clean design.  there will be cases where the 
>> list property of a profile data is a long list (say a million items). in 
>> such cases, user wants to be able to directly insert/delete an item in that 
>> list because it's more efficient.  Retrieving the whole list, updating it, 
>> concatenating again, and then putting it back to datastore is awkward and 
>> less efficient.
>> 
>> -aj
>> 
>> 
>> On Mon, May 10, 2010 at 2:20 PM, Mike Malone <[email protected]> wrote:
>> On Mon, May 10, 2010 at 1:38 PM, AJ Chen <[email protected]> wrote:
>> Could someone confirm this discussion is not about abandoning supercolumn 
>> family? I have found modeling data with supercolumn family is actually an 
>> advantage of Cassandra compared to relational database. Hope you are not going to 
>> drop this important concept.  How it's implemented internally is a different 
>> matter.
>> 
>> SuperColumns are useful as a convenience mechanism. That's pretty much it. 
>> There's _nothing_ (as far as I can tell) that you can do with SuperColumns 
>> that you can't do by manually concatenating key names with a separator on 
>> the client side and implementing a custom comparator on the server (as ugly 
>> as that is).
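To illustrate why the custom comparator matters, here is a small client-side sketch in Python. It is not Cassandra code: a real server-side comparator would be written in Java, and the `:` separator, the `join` and `compare` helpers, and the integer-then-text layout are assumptions made for the example.

```python
from functools import cmp_to_key

SEP = ":"  # assumed separator; it must never occur inside the first component

def join(super_name, sub_name):
    """Build a flat column name from an integer super name and a text sub name."""
    return f"{super_name}{SEP}{sub_name}"

def compare(a, b):
    """Compare two flat names: first component as an integer, second as text."""
    a_super, a_sub = a.split(SEP, 1)
    b_super, b_sub = b.split(SEP, 1)
    left = (int(a_super), a_sub)
    right = (int(b_super), b_sub)
    return (left > right) - (left < right)

names = [join(10, "text"), join(9, "text"), join(9, "author")]

# Plain string order puts "10:..." before "9:...", which breaks numeric order.
by_bytes = sorted(names)

# The type-aware comparator restores the order a Long super column would give.
by_types = sorted(names, key=cmp_to_key(compare))
```

With plain string comparison the name starting with 10 sorts before those starting with 9; the component-wise comparator is what recovers the per-level ordering that SuperColumns provide.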
>> 
>> This discussion is about getting rid of SuperColumns and adding a more 
>> generic mechanism that will actually be useful and interesting and will 
>> continue to be convenient for the types of use cases for which people use 
>> SuperColumns.
>> 
>> If there's a particular use case that you feel you can only implement with 
>> SuperColumns, please share! I honestly can't think of any.
>> 
>> Mike
>> 
>> 
>> On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook <[email protected]> wrote:
>> Agreed
>> 
>> On Mon, May 10, 2010 at 12:01 PM, Mike Malone <[email protected]> wrote:
>> > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <[email protected]> wrote:
>> >>
>> >> I have to disagree about the naming of things. The name of something
>> >> isn't just a literal identifier. It affects the way people think about
>> >> it. For new users, the whole naming thing has been a persistent
>> >> barrier.
>> >
>> > I'm saying we shouldn't be worried too much about coming up with names and
>> > analogies until we've decided what it is we're naming.
>> >
>> >>
>> >> As for your suggestions, I'm all for simplifying or generalizing the
>> >> "how it works" part down to a more generalized set of operations. I'm
>> >> not sure it's a good idea to require users to think in terms of building
>> >> up a fluffy query structure just to thread it through a needle of an
>> >> API, even for the simplest of queries. At some point, the level of
>> >> generic boilerplate takes away from the semantic hand rails that
>> >> developers like. So I guess I'm suggesting that "how it works" and
>> >> "how we use it" are not always exactly the same. At least they should
>> >> both hinge on a common conceptual model, which is where the naming
>> >> becomes an important anchoring point.
>> >
>> > If things are done properly, client libraries could expose simplified query
>> > interfaces without much effort. Most ORMs these days work by building a
>> > propositional directed acyclic graph that's serialized to SQL. This would
>> > work the same way, but it wouldn't be converted into a 4GL.
>> > Mike
>> >
>> >>
>> >> Jonathan
>> >>
>> >> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <[email protected]> wrote:
>> >> > Maybe... but honestly, it doesn't affect the architecture or interface
>> >> > at
>> >> > all. I'm more interested in thinking about how the system should work
>> >> > than
>> >> > what things are called. Naming things is important, but that can happen
>> >> > later.
>> >> > Does anyone have any thoughts or comments on the architecture I
>> >> > suggested
>> >> > earlier?
>> >> >
>> >> > Mike
>> >> >
>> >> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <[email protected]>
>> >> > wrote:
>> >> >>
>> >> >> Yes, the "column" here is not appropriate.
>> >> >> Maybe we need not create new terms; in Google's Bigtable, the term
>> >> >> "qualifier" is a good one.
>> >> >>
>> >> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <[email protected]>
>> >> >> wrote:
>> >> >>>
>> >> >>> That would be a good time to get rid of the confusing "column" term,
>> >> >>> which incorrectly suggests a two-dimensional tabular structure.
>> >> >>>
>> >> >>> Suggestions:
>> >> >>>
>> >> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key"
>> >> >>> and
>> >> >>> "column" with "1st dimension", "2nd dimension", etc.
>> >> >>>
>> >> >>> 2. A file system: replace "key" and "column" with "directory" and
>> >> >>> "subdirectory"
>> >> >>>
>> >> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose
>> >> >>> value
>> >> >>> is the set of keys, whose value is the set of supercolumns of the key,
>> >> >>> whose
>> >> >>> value is the set of columns for the supercolumn, etc.
>> >> >>>
>> >> >>> 4. Etc.
>> >> >>>
>> >> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <[email protected]>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> Nice, Ed, we're doing something very similar but less generic.
>> >> >>>> Now replace all of the various methods for querying with a simple
>> >> >>>> query
>> >> >>>> interface that takes a Predicate, allow the user to specify (in
>> >> >>>> storage-conf) which levels of the nested Columns should be indexed,
>> >> >>>> and
>> >> >>>> completely remove Comparators and have people subclass Column /
>> >> >>>> implement
>> >> >>>> IColumn and we'd really be on to something ;).
>> >> >>>> Mock storage-conf.xml:
>> >> >>>>   <Column Name="ThingThatsNowKey" Indexed="True"
>> >> >>>> ClusterPartitioned="True" Type="UTF8">
>> >> >>>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True"
>> >> >>>> Type="UTF8">
>> >> >>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>> >> >>>>         <Column Name="ThingThatsNowColumnName" Indexed="True"
>> >> >>>> Type="ASCII">
>> >> >>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>> >> >>>>         </Column>
>> >> >>>>       </Column>
>> >> >>>>     </Column>
>> >> >>>>   </Column>
>> >> >>>> Thrift:
>> >> >>>>   struct NamePredicate {
>> >> >>>>     1: required list<binary> column_names,
>> >> >>>>   }
>> >> >>>>   struct SlicePredicate {
>> >> >>>>     1: required binary start,
>> >> >>>>     2: required binary end,
>> >> >>>>   }
>> >> >>>>   struct CountPredicate {
>> >> >>>>     1: required Predicate predicate,
>> >> >>>>     2: required i32 count=100,
>> >> >>>>   }
>> >> >>>>   struct AndPredicate {
>> >> >>>>     1: required Predicate left,
>> >> >>>>     2: required Predicate right,
>> >> >>>>   }
>> >> >>>>   struct SubColumnsPredicate {
>> >> >>>>     1: required Predicate columns,
>> >> >>>>     2: required Predicate subcolumns,
>> >> >>>>   }
>> >> >>>>   ... OrPredicate, OtherUsefulPredicates ...
>> >> >>>>   query(predicate, count, consistency_level) # Count here would be
>> >> >>>> total
>> >> >>>> count of leaf values returned, whereas CountPredicate specifies a
>> >> >>>> column
>> >> >>>> count for a particular sub-slice.
>> >> >>>> Not fully baked... but I think this could really simplify stuff and
>> >> >>>> make
>> >> >>>> it more flexible. Downside is it may give people enough rope to hang
>> >> >>>> themselves, but at least the predicate stuff is easily distributable.
>> >> >>>> I'm thinking I'll play around with implementing some of this stuff
>> >> >>>> myself if I have any free time in the near future.
>> >> >>>> Mike
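The predicate idea above can be sketched in Python over nested dictionaries. This only illustrates how composable predicates might select a subtree; it is not the proposed Thrift interface, and the `query` function, the inclusive slice bounds, and the sample data are assumptions.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class NamePredicate:
    column_names: List[str]

@dataclass
class SlicePredicate:
    start: str
    end: str

@dataclass
class SubColumnsPredicate:
    columns: "Predicate"
    subcolumns: "Predicate"

Predicate = Union[NamePredicate, SlicePredicate, SubColumnsPredicate]

def query(columns, predicate):
    """Apply `predicate` to a nested dict of columns and return the matching subtree."""
    if isinstance(predicate, NamePredicate):
        return {n: columns[n] for n in predicate.column_names if n in columns}
    if isinstance(predicate, SlicePredicate):
        # Assumption: both slice bounds are inclusive.
        return {n: v for n, v in sorted(columns.items())
                if predicate.start <= n <= predicate.end}
    if isinstance(predicate, SubColumnsPredicate):
        selected = query(columns, predicate.columns)
        return {n: query(sub, predicate.subcolumns) for n, sub in selected.items()}
    raise TypeError(f"unsupported predicate: {predicate!r}")

data = {
    "blog-1": {"author": "aj", "title": "hello"},
    "blog-2": {"author": "mike", "title": "world"},
    "blog-3": {"author": "ed", "title": "composite"},
}

# Slice the outer level, then pick named columns one level down.
result = query(data, SubColumnsPredicate(
    columns=SlicePredicate("blog-1", "blog-2"),
    subcolumns=NamePredicate(["author"]),
))
```

Nesting `SubColumnsPredicate` values would extend the same query to deeper levels without adding new API methods.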
>> >> >>>>
>> >> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <[email protected]>
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> Very interesting, thanks!
>> >> >>>>>
>> >> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <[email protected]> wrote:
>> >> >>>>> > Follow-up from last week's discussion, I've been playing around
>> >> >>>>> > with a
>> >> >>>>> > simple
>> >> >>>>> > column comparator for composite column names that I put up on
>> >> >>>>> > github.  I'd
>> >> >>>>> > be interested to hear what people think of this approach.
>> >> >>>>> >
>> >> >>>>> > http://github.com/edanuff/CassandraCompositeType
>> >> >>>>> >
>> >> >>>>> > Ed
>> >> >>>>> >
>> >> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <[email protected]> wrote:
>> >> >>>>> >>
>> >> >>>>> >> It might make sense to create a CompositeType subclass of
>> >> >>>>> >> AbstractType for
>> >> >>>>> >> the purpose of constructing and comparing these types of
>> >> >>>>> >> "composite"
>> >> >>>>> >> column
>> >> >>>>> >> names so that you could more easily do that sort of thing
>> >> >>>>> >> rather
>> >> >>>>> >> than
>> >> >>>>> >> having to concatenate into one big string.
>> >> >>>>> >>
>> >> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone
>> >> >>>>> >> <[email protected]>
>> >> >>>>> >> wrote:
>> >> >>>>> >>>
>> >> >>>>> >>> The only thing SuperColumns appear to buy you (as someone
>> >> >>>>> >>> pointed
>> >> >>>>> >>> out to
>> >> >>>>> >>> me at the Cassandra meetup - I think it was Eric Florenzano) is
>> >> >>>>> >>> that you can
>> >> >>>>> >>> use different comparator types for the Super/SubColumns, I
>> >> >>>>> >>> guess..?
>> >> >>>>> >>> But you
>> >> >>>>> >>> should be able to do the same thing by creating your own Column
>> >> >>>>> >>> comparator.
>> >> >>>>> >>> I guess my point is that SuperColumns are mostly a convenience
>> >> >>>>> >>> mechanism, as
>> >> >>>>> >>> far as I can tell.
>> >> >>>>> >>> Mike
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Jonathan Ellis
>> >> >>>>> Project Chair, Apache Cassandra
>> >> >>>>> co-founder of Riptano, the source for professional Cassandra support
>> >> >>>>> http://riptano.com
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >> >
>> >
>> >
>> 
>> 
>> 
>> -- 
>> AJ Chen, PhD
>> Chair, Semantic Web SIG, sdforum.org
>> http://web2express.org
>> twitter @web2express
>> Palo Alto, CA, USA
