Note, as I mentioned mid-post, Thrift also supports async nowadays (there was a recent discussion on cassandra dev, and the decision was not to move to it).
I think the binary protocol is the way forward; CQL3 needs some new features, or there need to be some other types of requests you can make over the binary protocol.

On Jun 13, 2014, at 5:51 PM, Peter Lin <wool...@gmail.com> wrote:

> without a doubt there are nice features of CQL3 like notifications and async. I
> want to see CQL3 mature and handle all the use cases that Thrift handles
> easily today. It's to everyone's benefit to work together and improve CQL3.
>
> Another benefit of Thrift drivers today is being able to use an object API with
> generics. For tool builders, this is especially useful. Not everyone wants to
> write tools, but I do, so it matters to me.
>
> On Fri, Jun 13, 2014 at 6:39 PM, Laing, Michael <michael.la...@nytimes.com>
> wrote:
> Just to add 2 more cents... :)
>
> The CQL3 protocol is asynchronous. This can provide a substantial throughput
> increase, according to my benchmarking, when one uses non-blocking techniques.
>
> It is also peer-to-peer. Hence the server can generate events to send to the
> client, e.g. schema changes - in general, 'triggers' become possible.
>
> ml
>
> On Fri, Jun 13, 2014 at 6:21 PM, graham sanderson <gra...@vast.com> wrote:
> My 2 cents…
>
> A motivation for CQL3 AFAIK was to make Cassandra more familiar to SQL users.
> This is a valid goal, and works well in many cases.
> Equally there are use cases (that some might find ugly) where Cassandra is
> chosen explicitly because of the sorts of things you can do at the thrift
> level, which aren't (currently) exposed via CQL3.
>
> To Robert's point earlier - "Rational people should presume that Thrift
> support must eventually disappear"… he is probably right (though frankly I'd
> rather the non-blocking thrift version was added instead). However, if we do
> get rid of the thrift interface, then it needs to be at a time when CQLn is
> capable of expressing all the things you could do via the thrift API.
> Note, I need to go look and see if the non-blocking thrift version also
> requires materializing the entire thrift object in memory.
>
> On Jun 13, 2014, at 4:55 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> There are always pros and cons with a query language.
>>
>> But as far as I can see, the advantages of Thrift over CQL3 are:
>>
>> 1) Thrift requires a little bit less decoding server-side (a difference
>> of around 10% in CPU usage).
>>
>> 2) Thrift uses more "compact" storage because CQL3 needs to add extra
>> "marker" columns to guarantee the existence of the primary key. It gets
>> worse when you use clustering columns, because for each distinct clustering
>> group you have a related "marker" column.
>>
>> That being said, point 1) is not really an issue since most of the time
>> nodes are more I/O bound than CPU bound. Only in extreme cases, where you
>> have an incredible read rate with data that fits entirely in memory, may
>> you notice the difference.
>>
>> For point 2) this is a small trade-off to have access to a query language
>> and to be able to do slice queries using the WHERE clause. Some like it,
>> others hate it, it's just a question of taste. Please note that the "waste"
>> in disk space is somewhat mitigated by compression.
>>
>> Long story short, I think Thrift may have appropriate usage but only in very
>> few use cases. Recently a lot of improvements and features have been added to
>> CQL3, so it should be considered the first choice for most users, and
>> if they fall into those few use cases then they can switch back to Thrift.
>>
>> My 2 cents
>>
>> On Fri, Jun 13, 2014 at 11:43 PM, Peter Lin <wool...@gmail.com> wrote:
>>
>> With a text based query approach like CQL, you lose the type with dynamic
>> columns. Yes, we're storing it as bytes, but it is simpler and easier with
>> Thrift to do these types of things.
>>
>> I like CQL3 and what it does, but text based query languages make certain
>> dynamic schema use cases painful. Having used and built ORMs, they are
>> poorly suited to dynamic schemas. If you've never had to write an ORM to
>> handle dynamic user defined schemas at runtime, it's tough to see where the
>> problems arise and how that makes life painful.
>>
>> Just to be clear, I'm not saying "don't use CQL3" or "CQL3 is bad". I'm
>> saying CQL3 is good for certain kinds of use cases and Thrift is good at
>> certain use cases. People need to look at what and how they're storing data
>> and do what makes the most sense to them. Slavishly following CQL3 doesn't
>> make any sense to me.
>>
>> On Fri, Jun 13, 2014 at 5:30 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>> "the validation type is set to bytes, and my code is type safe, so it knows
>> which serializers to use. Those dynamic columns are driven off the types in
>> Java." --> Correct. However, you are still bound by the column comparator
>> type, which should be fixed (unless again you set it to bytes, in which case
>> you lose the ordering and sorting feature).
>>
>> Basically what you are doing is telling Cassandra to save data in the cells
>> as raw bytes; the serialization is taken care of client side using the
>> appropriate serializer. This is a perfectly valid strategy.
>>
>> But how is it different from using CQL3, setting the value to "blob"
>> (equivalent to bytes) and taking care of the serialization client-side also?
>> You can even imagine saving the value in JSON format and setting the type to
>> "text".
>>
>> Really, I don't see why CQL3 cannot achieve the scenario you describe.
>>
>> For the record, when you create a table in CQL3 as follows:
>>
>> CREATE TABLE user (
>>     id bigint PRIMARY KEY,
>>     firstname text,
>>     lastname text,
>>     last_connection timestamp,
>>     ....);
>>
>> C* will create a column family with validation type = bytes to accommodate
>> the timestamp and text types for the firstname, lastname and last_connection
>> columns. Basically the CQL3 engine is doing the serialization server-side
>> for you.
>>
>> On Fri, Jun 13, 2014 at 11:19 PM, Peter Lin <wool...@gmail.com> wrote:
>>
>> the validation type is set to bytes, and my code is type safe, so it knows
>> which serializers to use. Those dynamic columns are driven off the types in
>> Java.
>>
>> Having said that, CQL3 does have a new custom type feature, but the
>> documentation is basically non-existent on how that actually works. One
>> could also modify CQL such that insert statements give Cassandra hints
>> about what type it is, but I'm not aware of anyone enhancing CQL3 to do that.
>>
>> I realize my kind of use case is a bit unique, but I do know of others that
>> are doing similar kinds of things.
>>
>> On Fri, Jun 13, 2014 at 5:11 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>> In thrift, when creating a column family, you need to define
>>
>> 1) the row/partition key type
>> 2) the column comparator type
>> 3) the validation type for the actual value (cell in CQL3 terminology)
>>
>> Unless you use the "dynamic composites" feature, which does not exist (and
>> probably won't) in CQL3, I don't see how you can have columns with
>> "different types" on the same row/partition.
>>
>> On Fri, Jun 13, 2014 at 11:06 PM, Peter Lin <wool...@gmail.com> wrote:
>>
>> when I say dynamic column, I mean non-static columns of different types
>> within the same row. Some could be an object or one of the defined datatypes.
>>
>> with thrift I use the appropriate serializer to handle these dynamic columns.
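
The "store it as blob and serialize client-side" approach discussed in the messages above can be sketched roughly as follows. This is only an illustration under my own assumptions: the one-byte type tags, the set of supported types and the function names encode/decode are made up for the example and are not part of any Cassandra driver or of Peter's actual code. The resulting bytes are what a client would write into a blob column.

```python
import struct
from datetime import datetime, timedelta, timezone
from typing import Union

Value = Union[int, float, str, datetime]

# One-byte type tags. The numeric values are arbitrary choices for this sketch.
TAG_INT, TAG_FLOAT, TAG_TEXT, TAG_TIMESTAMP = 1, 2, 3, 4

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def encode(value: Value) -> bytes:
    """Prefix the serialized value with a tag so the reader can recover its type."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it to keep the tags unambiguous.
        raise TypeError("bool is not supported in this sketch")
    if isinstance(value, int):
        return bytes([TAG_INT]) + struct.pack(">q", value)  # 64-bit big-endian
    if isinstance(value, float):
        return bytes([TAG_FLOAT]) + struct.pack(">d", value)
    if isinstance(value, str):
        return bytes([TAG_TEXT]) + value.encode("utf-8")
    if isinstance(value, datetime):
        # Assumes a timezone-aware datetime; stored as milliseconds since the epoch.
        millis = (value - EPOCH) // ONE_MS
        return bytes([TAG_TIMESTAMP]) + struct.pack(">q", millis)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(blob: bytes) -> Value:
    """Inverse of encode(): read the tag, then deserialize the payload."""
    tag, payload = blob[0], blob[1:]
    if tag == TAG_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TAG_FLOAT:
        return struct.unpack(">d", payload)[0]
    if tag == TAG_TEXT:
        return payload.decode("utf-8")
    if tag == TAG_TIMESTAMP:
        return EPOCH + struct.unpack(">q", payload)[0] * ONE_MS
    raise ValueError(f"unknown type tag: {tag}")
```

Whether the type information lives in a tag like this or in the client's own schema knowledge (as in Peter's type-safe Java code) is a design choice; either way the server only sees opaque bytes, which is the point DuyHai is making.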
>>
>> On Fri, Jun 13, 2014 at 4:55 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>> Well, before discussing "dynamic columns", we should first
>> define the term clearly. What do people mean by "dynamic columns" exactly? Is
>> it the ability to add many columns "of same type" to an existing physical row?
>> If yes, then CQL3 does support it with clustering columns.
>>
>> On Fri, Jun 13, 2014 at 10:36 PM, Mark Greene <green...@gmail.com> wrote:
>> Yeah I don't anticipate more than 1000 properties, well under in fact. I
>> guess the trade off of using the clustered columns is that I'd have a table
>> that would be tall and skinny, which also has its challenges w/r/t memory.
>>
>> I'll look into your suggestion a bit more and consider some others around a
>> hybrid of CQL and Thrift (where necessary). But from a newb's perspective, I
>> sense the community is unsettled around this concept of truly dynamic
>> columns. Coming from an HBase background, it's a consideration I didn't
>> anticipate having to evaluate.
>>
>> --
>> about.me
>>
>> On Fri, Jun 13, 2014 at 4:19 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>> Hi Mark
>>
>> I believe that in your table you want to have some "common" fields that
>> will be there whatever the customer is, and other fields that are entirely
>> customer-dependent, isn't it?
>>
>> In this case, creating a table with static columns for the common fields
>> and a clustering column representing all custom fields defined by a customer
>> could be a solution (see here for static columns:
>> https://issues.apache.org/jira/browse/CASSANDRA-6561 )
>>
>> CREATE TABLE user_data (
>>     user_id bigint,
>>     user_firstname text static,
>>     user_lastname text static,
>>     ...
>>     custom_property_name text,
>>     custom_property_value text,
>>     PRIMARY KEY(user_id, custom_property_name, custom_property_value));
>>
>> Please note that with this solution you need to have "at least one" custom
>> property per customer to make it work.
>>
>> The only thing to take care of is the type of custom_property_value. You
>> need to define it once and for all. To accommodate dynamic types, you can
>> either save the value as blob or text (as JSON) and take care of the
>> serialization/deserialization yourself at the client side.
>>
>> As an alternative you can save custom properties in a map, provided that
>> their number is not too large. But considering the business case of CRM, I
>> believe that it's quite rare that a user has more than 1000 custom
>> properties, isn't it?
>>
>> On Fri, Jun 13, 2014 at 10:03 PM, Mark Greene <green...@gmail.com> wrote:
>> My use case requires the support of arbitrary columns much like a CRM. My
>> users can define 'custom' fields within the application. Ideally I wouldn't
>> have to change the schema at all, which is why I like the old thrift
>> approach rather than the CQL approach.
>>
>> Having said all that, I'd be willing to adapt my API to make explicit schema
>> changes to Cassandra whenever my user makes a change to their custom fields
>> if that's an accepted practice.
>>
>> Ultimately, I'm trying to figure out if the Cassandra community intends to
>> support true schemaless use cases in the future.
>>
>> --
>> about.me
>>
>> On Fri, Jun 13, 2014 at 3:47 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>> This strikes me as bad practice in the world of multi tenant systems. I
>> don't want to create a table per customer. So I'm wondering if dynamically
>> modifying the table is an accepted practice? --> Can you give some details
>> about your use case? How would you "alter" a table structure to adapt it to
>> a new customer?
>>
>> Wouldn't it be better to model your table so that it supports
>> addition/removal of customers?
>>
>> On Fri, Jun 13, 2014 at 9:00 PM, Mark Greene <green...@gmail.com> wrote:
>> Thanks DuyHai,
>>
>> I have a follow up question to #2. You mentioned ideally I would create a
>> new table instead of mutating an existing one.
>>
>> This strikes me as bad practice in the world of multi tenant systems. I
>> don't want to create a table per customer. So I'm wondering if dynamically
>> modifying the table is an accepted practice?
>>
>> --
>> about.me
>>
>> On Fri, Jun 13, 2014 at 2:54 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>> Hello Mark
>>
>> Dynamic columns, as you said, are perfectly supported by CQL3 via
>> clustering columns. And no, using collections for storing dynamic data is a
>> very bad idea if the cardinality is very high (>> 1000 elements)
>>
>> 1) Is using Thrift a valid approach in the era of CQL? --> Less and less.
>> Unless you are looking for extreme performance, you'd be better off choosing
>> CQL3. The ease of programming and querying with CQL3 is worth the small
>> overhead in CPU.
>>
>> 2) If CQL is the best practice, should I alter the schema at runtime when I
>> detect I need to do a schema mutation? --> Ideally you should not alter
>> the schema but create a new table to adapt to your changing requirements.
>>
>> 3) If I utilize CQL collections, will Cassandra page the entire thing into
>> the heap? --> Of course. All collections and maps in Cassandra are eagerly
>> loaded entirely in memory on the server side. That's why it is recommended to
>> limit their cardinality to ~ 1000 elements.
>>
>> On Fri, Jun 13, 2014 at 8:33 PM, Mark Greene <green...@gmail.com> wrote:
>> I'm looking for some best practices w/r/t supporting arbitrary columns. It
>> seems from the docs I've read around CQL that they are supported in some
>> capacity via collections but you can't exceed 64K in size. For my
>> requirements that would cause problems.
>>
>> So my questions are:
>>
>> 1) Is using Thrift a valid approach in the era of CQL?
>>
>> 2) If CQL is the best practice, should I alter the schema at runtime when I
>> detect I need to do a schema mutation?
>>
>> 3) If I utilize CQL collections, will Cassandra page the entire thing into
>> the heap?
>>
>> My data model is akin to a CRM, arbitrary column definitions per customer.
>>
>> Cheers,
>> Mark
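
For what it's worth, the clustering-column layout DuyHai proposes in the quoted thread (one row per custom property under a user_id partition, values stored as JSON text) could be fed from application code along these lines. This is a minimal sketch, not anyone's actual implementation: the helper names to_rows/from_rows are hypothetical, and the tuples are simply the bind values one would pass to an INSERT into the user_data table from the thread, with no driver involved.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

# (user_id, custom_property_name, custom_property_value), matching the
# columns of the user_data table quoted in the thread.
Row = Tuple[int, str, str]


def to_rows(user_id: int, props: Dict[str, Any]) -> List[Row]:
    """Turn one user's custom fields into one row per property.

    Values are JSON-encoded so a single text column can carry any type.
    Names are sorted only to make the output deterministic here.
    """
    return [(user_id, name, json.dumps(props[name])) for name in sorted(props)]


def from_rows(rows: Iterable[Row]) -> Dict[str, Any]:
    """Rebuild the custom-field dictionary from the rows of one partition."""
    return {name: json.loads(value) for _, name, value in rows}


# Usage: adding a new custom field is just another row, with no schema change.
rows = to_rows(7, {"tier": "gold", "seats": 25})
restored = from_rows(rows)
```

The trade-off Mark raises still applies: the table becomes tall and skinny, and the application, not Cassandra, knows the real type of each value.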