In your implementation, are the comments still sorted by time? Will UTF8Type sort <TimeUUID>:author column names by time?

Thanks,
-aj
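To make the question concrete, here is a minimal sketch in plain Python, not Cassandra client code. It assumes the TimeUUID is written into the column name in its canonical hex string form, and it approximates UTF8Type by comparing the UTF-8 bytes of the names. The helpers time_uuid, utf8_sorted and slice_columns are hypothetical names made up for this illustration. It builds the row layout quoted below ("~post:..." columns plus "<TimeUUID>:author" columns), takes the two slices, and checks whether the comment slice comes back in time order.

```python
import bisect
import uuid


def time_uuid(timestamp, node=0x123456789ABC, clock_seq=0):
    """Hypothetical helper: build a version-1 UUID from a 60-bit timestamp."""
    return uuid.UUID(fields=(
        timestamp & 0xFFFFFFFF,                 # time_low
        (timestamp >> 32) & 0xFFFF,             # time_mid
        ((timestamp >> 48) & 0x0FFF) | 0x1000,  # time_hi plus version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,       # clock_seq_hi plus RFC 4122 variant
        clock_seq & 0xFF,                       # clock_seq_low
        node,
    ))


def utf8_sorted(names):
    """Order column names by their UTF-8 bytes (stand-in for UTF8Type)."""
    return sorted(names, key=lambda name: name.encode("utf-8"))


def slice_columns(sorted_names, start, end):
    """Return the names in [start, end]; an empty bound means unbounded."""
    keys = [name.encode("utf-8") for name in sorted_names]
    lo = bisect.bisect_left(keys, start.encode("utf-8")) if start else 0
    hi = bisect.bisect_right(keys, end.encode("utf-8")) if end else len(keys)
    return sorted_names[lo:hi]


# Two comment timestamps chosen so that the low 32 bits wrap around.
earlier = time_uuid(0x1FFFFFFFF)  # time_low = ffffffff, time_mid = 0001
later = time_uuid(0x200000000)    # time_low = 00000000, time_mid = 0002

row = utf8_sorted([
    "~post:author",
    "~post:title",
    str(earlier) + ":author",
    str(later) + ":author",
])

comments = slice_columns(row, "", "~")  # comments only
post = slice_columns(row, "~", "")      # post only

in_time_order = comments == [str(earlier) + ":author", str(later) + ":author"]
print(comments)
print(post)
print("comments in time order:", in_time_order)
```

Under these assumptions the two slices separate the post from the comments as described, but the comment with the later timestamp sorts first, because the canonical string form of a version-1 UUID starts with the low 32 bits of the timestamp. That is the case I am asking about.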
On Mon, May 10, 2010 at 5:02 PM, Mike Malone <m...@simplegeo.com> wrote:

> On Mon, May 10, 2010 at 4:31 PM, AJ Chen <ajc...@web2express.org> wrote:
>
>> supercolumn is good for modeling profile type of data. simple example is
>> blog:
>> blog { blog {author, title, ...}
>>        comments {time: commenter} //sort by TimeUUID
>> }
>> when retrieving a blog, you get all the comments sorted by time already.
>> without supercolumn, you would need to concatenate multiple comment times
>> together as you suggested.
>>
>> requiring users to concatenate data fields together is not only an extra
>> burden on the user but also a less clean design. there will be cases where
>> the list property of a profile is a long list (say a million items). in
>> such cases, the user wants to be able to directly insert/delete an item in
>> that list because it's more efficient. Retrieving the whole list, updating
>> it, concatenating again, and then putting it back to the datastore is
>> awkward and less efficient.
>>
>
> There's nothing you said here that can't be implemented efficiently using
> columns. You can slice rows and get a subset of Columns. In fact, this
> example is particularly easy to implement. If you have a Blog with Entries
> and Comments you'd do:
>
> <ColumnFamily Name="Blog" CompareWith="UTF8Type" />
>
> Insert blog post:
> batch_mutate(key=<blog post id>, [{name="~post:author", value=<author>},
>              {name="~post:title", value=<title>}, ...])
> Insert comment:
> batch_mutate(key=<blog post id>, [{name=<TimeUUID> + ":author", ... }])
>
> Then you can get the Post only (slice for ["~", ""]), the comments only
> (slice for ["", "~"]), or the post _and_ comments (slice for ["", ""]).
> Inserting a comment does _not_ require a get/concatenate/insert.
>
> Yes, concatenating the names on the client side is hacky, clunky, and
> inconvenient. That's why we _should_ build an interface that doesn't require
> the client to concatenate names. But SuperColumns aren't the right way to do
> it.
> They add no value. They could be implemented in client libraries, for
> example, and nobody would know the difference.
>
> To really understand the problem with SuperColumns, though, you need to
> look at the Cassandra source. Removing SuperColumns would make the code-base
> much cleaner and tighter, and would probably reduce SLOC by 20%. I think a
> replacement that assumed nested Columns (or Entries, or Thingies) would be
> much cleaner. That's what Stu is working on.
>
> Mike
>
> On Mon, May 10, 2010 at 2:20 PM, Mike Malone <m...@simplegeo.com> wrote:
>>
>>> On Mon, May 10, 2010 at 1:38 PM, AJ Chen <ajc...@web2express.org> wrote:
>>>
>>>> Could someone confirm this discussion is not about abandoning
>>>> supercolumn family? I have found modeling data with supercolumn family is
>>>> actually an advantage of Cassandra compared to relational databases. Hope
>>>> you are not going to drop this important concept. How it's implemented
>>>> internally is a different matter.
>>>>
>>>
>>> SuperColumns are useful as a convenience mechanism. That's pretty much
>>> it. There's _nothing_ (as far as I can tell) that you can do with
>>> SuperColumns that you can't do by manually concatenating key names with a
>>> separator on the client side and implementing a custom comparator on the
>>> server (as ugly as that is).
>>>
>>> This discussion is about getting rid of SuperColumns and adding a more
>>> generic mechanism that will actually be useful and interesting and will
>>> continue to be convenient for the types of use cases for which people use
>>> SuperColumns.
>>>
>>> If there's a particular use case that you feel you can only implement
>>> with SuperColumns, please share! I honestly can't think of any.
>>>
>>> Mike
>>>
>>>
>>>> On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook <jsh...@gmail.com> wrote:
>>>>
>>>>> Agreed
>>>>>
>>>>> On Mon, May 10, 2010 at 12:01 PM, Mike Malone <m...@simplegeo.com> wrote:
>>>>> > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <jsh...@gmail.com> wrote:
>>>>> >>
>>>>> >> I have to disagree about the naming of things. The name of something
>>>>> >> isn't just a literal identifier. It affects the way people think about
>>>>> >> it. For new users, the whole naming thing has been a persistent
>>>>> >> barrier.
>>>>> >
>>>>> > I'm saying we shouldn't be worried too much about coming up with names
>>>>> > and analogies until we've decided what it is we're naming.
>>>>> >
>>>>> >>
>>>>> >> As for your suggestions, I'm all for simplifying or generalizing the
>>>>> >> "how it works" part down to a more generalized set of operations. I'm
>>>>> >> not sure it's a good idea to require users to think in terms of building
>>>>> >> up a fluffy query structure just to thread it through a needle of an
>>>>> >> API, even for the simplest of queries. At some point, the level of
>>>>> >> generic boilerplate takes away from the semantic hand rails that
>>>>> >> developers like. So I guess I'm suggesting that "how it works" and
>>>>> >> "how we use it" are not always exactly the same. At least they should
>>>>> >> both hinge on a common conceptual model, which is where the naming
>>>>> >> becomes an important anchoring point.
>>>>> >
>>>>> > If things are done properly, client libraries could expose simplified
>>>>> > query interfaces without much effort. Most ORMs these days work by
>>>>> > building a propositional directed acyclic graph that's serialized to
>>>>> > SQL. This would work the same way, but it wouldn't be converted into
>>>>> > a 4GL.
>>>>> > Mike
>>>>> >
>>>>> >>
>>>>> >> Jonathan
>>>>> >>
>>>>> >> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <m...@simplegeo.com> wrote:
>>>>> >> > Maybe... but honestly, it doesn't affect the architecture or
>>>>> >> > interface at all. I'm more interested in thinking about how the
>>>>> >> > system should work than what things are called. Naming things is
>>>>> >> > important, but that can happen later.
>>>>> >> > Does anyone have any thoughts or comments on the architecture I
>>>>> >> > suggested earlier?
>>>>> >> >
>>>>> >> > Mike
>>>>> >> >
>>>>> >> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zson...@gmail.com> wrote:
>>>>> >> >>
>>>>> >> >> Yes, the term "column" is not appropriate here.
>>>>> >> >> Maybe we need not create new terms; in Google's Bigtable, the term
>>>>> >> >> "qualifier" is a good one.
>>>>> >> >>
>>>>> >> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>>>> >> >>>
>>>>> >> >>> That would be a good time to get rid of the confusing "column" term,
>>>>> >> >>> which incorrectly suggests a two-dimensional tabular structure.
>>>>> >> >>>
>>>>> >> >>> Suggestions:
>>>>> >> >>>
>>>>> >> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key"
>>>>> >> >>> and "column" with "1st dimension", "2nd dimension", etc.
>>>>> >> >>>
>>>>> >> >>> 2. A file system: replace "key" and "column" with "directory" and
>>>>> >> >>> "subdirectory"
>>>>> >> >>>
>>>>> >> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose
>>>>> >> >>> value is the set of keys, whose value is the set of supercolumns of
>>>>> >> >>> the key, whose value is the set of columns for the supercolumn, etc.
>>>>> >> >>>
>>>>> >> >>> 4. Etc.
>>>>> >> >>>
>>>>> >> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <m...@simplegeo.com> wrote:
>>>>> >> >>>>
>>>>> >> >>>> Nice, Ed, we're doing something very similar but less generic.
>>>>> >> >>>> Now replace all of the various methods for querying with a simple
>>>>> >> >>>> query interface that takes a Predicate, allow the user to specify (in
>>>>> >> >>>> storage-conf) which levels of the nested Columns should be indexed,
>>>>> >> >>>> and completely remove Comparators and have people subclass Column /
>>>>> >> >>>> implement IColumn and we'd really be on to something ;).
>>>>> >> >>>> Mock storage-conf.xml:
>>>>> >> >>>> <Column Name="ThingThatsNowKey" Indexed="True"
>>>>> >> >>>>         ClusterPartitioned="True" Type="UTF8">
>>>>> >> >>>>   <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True"
>>>>> >> >>>>           Type="UTF8">
>>>>> >> >>>>     <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>>>>> >> >>>>       <Column Name="ThingThatsNowColumnName" Indexed="True"
>>>>> >> >>>>               Type="ASCII">
>>>>> >> >>>>         <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>>>>> >> >>>>       </Column>
>>>>> >> >>>>     </Column>
>>>>> >> >>>>   </Column>
>>>>> >> >>>> </Column>
>>>>> >> >>>> Thrift:
>>>>> >> >>>> struct NamePredicate {
>>>>> >> >>>>   1: required list<binary> column_names,
>>>>> >> >>>> }
>>>>> >> >>>> struct SlicePredicate {
>>>>> >> >>>>   1: required binary start,
>>>>> >> >>>>   2: required binary end,
>>>>> >> >>>> }
>>>>> >> >>>> struct CountPredicate {
>>>>> >> >>>>   1: required struct predicate,
>>>>> >> >>>>   2: required i32 count=100,
>>>>> >> >>>> }
>>>>> >> >>>> struct AndPredicate {
>>>>> >> >>>>   1: required Predicate left,
>>>>> >> >>>>   2: required Predicate right,
>>>>> >> >>>> }
>>>>> >> >>>> struct SubColumnsPredicate {
>>>>> >> >>>>   1: required Predicate columns,
>>>>> >> >>>>   2: required Predicate subcolumns,
>>>>> >> >>>> }
>>>>> >> >>>> ... OrPredicate, OtherUsefulPredicates ...
>>>>> >> >>>> query(predicate, count, consistency_level) # Count here would be
>>>>> >> >>>> total count of leaf values returned, whereas CountPredicate specifies
>>>>> >> >>>> a column count for a particular sub-slice.
>>>>> >> >>>> Not fully baked... but I think this could really simplify stuff and
>>>>> >> >>>> make it more flexible. Downside is it may give people enough rope to
>>>>> >> >>>> hang themselves, but at least the predicate stuff is easily
>>>>> >> >>>> distributable.
>>>>> >> >>>> I'm thinking I'll play around with implementing some of this stuff
>>>>> >> >>>> myself if I have any free time in the near future.
>>>>> >> >>>> Mike
>>>>> >> >>>>
>>>>> >> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>>>> >> >>>>>
>>>>> >> >>>>> Very interesting, thanks!
>>>>> >> >>>>>
>>>>> >> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <e...@anuff.com> wrote:
>>>>> >> >>>>> > Follow-up from last week's discussion, I've been playing around
>>>>> >> >>>>> > with a simple column comparator for composite column names that I
>>>>> >> >>>>> > put up on github. I'd be interested to hear what people think of
>>>>> >> >>>>> > this approach.
>>>>> >> >>>>> >
>>>>> >> >>>>> > http://github.com/edanuff/CassandraCompositeType
>>>>> >> >>>>> >
>>>>> >> >>>>> > Ed
>>>>> >> >>>>> >
>>>>> >> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <e...@anuff.com> wrote:
>>>>> >> >>>>> >>
>>>>> >> >>>>> >> It might make sense to create a CompositeType subclass of
>>>>> >> >>>>> >> AbstractType for the purpose of constructing and comparing these
>>>>> >> >>>>> >> types of "composite" column names so that you could more easily
>>>>> >> >>>>> >> do that sort of thing rather than having to concatenate into one
>>>>> >> >>>>> >> big string.
>>>>> >> >>>>> >>
>>>>> >> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone <m...@simplegeo.com> wrote:
>>>>> >> >>>>> >>>
>>>>> >> >>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed
>>>>> >> >>>>> >>> out to me at the Cassandra meetup - I think it was Eric
>>>>> >> >>>>> >>> Florenzano) is that you can use different comparator types for
>>>>> >> >>>>> >>> the Super/SubColumns, I guess..? But you should be able to do the
>>>>> >> >>>>> >>> same thing by creating your own Column comparator.
>>>>> >> >>>>> >>> I guess my point is that SuperColumns are mostly a convenience
>>>>> >> >>>>> >>> mechanism, as far as I can tell.
>>>>> >> >>>>> >>> Mike
>>>>> >> >>>>> >
>>>>> >> >>>>>
>>>>> >> >>>>> --
>>>>> >> >>>>> Jonathan Ellis
>>>>> >> >>>>> Project Chair, Apache Cassandra
>>>>> >> >>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>> >> >>>>> http://riptano.com
>>>>>
>>>>
>>>> --
>>>> AJ Chen, PhD
>>>> Chair, Semantic Web SIG, sdforum.org
>>>> http://web2express.org
>>>> twitter @web2express
>>>> Palo Alto, CA, USA
>>>>
>>
>> --
>> AJ Chen, PhD
>> Chair, Semantic Web SIG, sdforum.org
>> http://web2express.org
>> twitter @web2express
>> Palo Alto, CA, USA
>>
>

--
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA