Re: Proposal: freeze Thrift starting with 2.1.0

Edward Capriolo Wed, 12 Mar 2014 09:11:28 -0700

Great points about the CQL driver and the supposed spec. It shows how a
driver living outside the project poses a problem to open source
development. How could custom types have been implemented without a spec?
In the apache world the saying is "If it did not happen on the list, it did
not happen." Did that happen here?


I still do not understand how an open source Apache Java database can rely
on third-party client software to connect to said database. However, the
committers seem comfortable with this arrangement, to the point that they are
willing to remove support for the other way to connect to the database.

Again, I am glad that the project has officially ended support for Thrift
with this clear decree. For years the project kept saying "Thrift is not
going anywhere". That was obviously meant literally: the project would do
the absolute minimum to support it until they could make the case to remove
it completely.




On Wed, Mar 12, 2014 at 11:20 AM, Theo Hultberg <t...@iconara.net> wrote:

> Speaking as a CQL driver maintainer (Ruby) I'm +1 for end-of-lining Thrift.
>
> I agree with Edward that it's unfortunate that there are no official
> drivers being maintained by the Cassandra maintainers -- even though the
> current state with the Datastax drivers is in practice very close (it is
> not the same thing though).
>
> However, I don't agree that not having drivers in the same repo/project is
> a problem. Whether or not there's a Java driver in the Cassandra source or
> not doesn't matter at all to us non-Java developers, and I don't see any
> difference between the situation where there's no driver in the source or
> just a Java driver. I might have misunderstood Edward's point about this,
> though.
>
> The CQL protocol is the key, as others have mentioned. As long as that is
> maintained and respected, I think it's absolutely fine not having any
> drivers shipped as part of Cassandra. However, I feel that this has not been
> the case lately. I'm thinking particularly about the UDT feature of 2.1,
> which is not a part of the CQL spec. There is no documentation on how
> drivers should handle them or what a user should be able to expect from a
> driver; they're completely implemented as custom types.
>
> I hope this will be fixed before 2.1 is released (and there have been good
> discussions on the mailing lists about how a driver should handle UDTs),
> but it shows a problem with the the-spec-is-the-truth argument. I think
> we'll be fine as long as the spec is the truth, but that requires the spec
> to be the truth and new features to not be bolted on outside of the spec.
>
> T#
>
>
> On Wed, Mar 12, 2014 at 3:23 PM, Peter Lin <wool...@gmail.com> wrote:
>
>> I'm enjoying the discussion also.
>>
>> @Brian
>> I've been looking at spark/shark along with other recent developments the
>> last few years. Berkeley has been doing some interesting stuff. One reason
>> I like Thrift is for type safety and the benefits for query validation and
>> query optimization. One could do similar things with CQL, but it's just
>> more work, especially with dynamic columns. I know others are mixing static
>> with dynamic columns, so I'm not alone. I have no clue how long it will
>> take to get there, but having tools like query explanation is a big time
>> saver. Writing business reports is hard enough, so every bit of help the
>> tool can provide makes it less painful.
>>
>>
>> On Wed, Mar 12, 2014 at 10:12 AM, Brian O'Neill <b...@alumni.brown.edu> wrote:
>>
>>>
>>> just when you thought the thread died...
>>>
>>>
>>> First, let me say we are *WAY* off topic.  But that is a good thing.
>>> I love this community because there are a ton of passionate, smart
>>> people. (often with differing perspectives ;)
>>>
>>> RE: Reporting against C* (@Peter Lin)
>>> We've had the same experience.  Pig + Hadoop is painful.  We are
>>> experimenting with Spark/Shark, operating directly against the data.
>>>
>>> http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html
>>>
>>> The Shark layer gives you SQL and caching capabilities that make it easy
>>> to use and fast (for smaller data sets).  In front of this, we are going to
>>> add dimensional aggregations so we can operate at larger scales.  (then the
>>> Hive reports will run against the aggregations)
>>>
>>> RE: REST Server (@Russel Bradbury)
>>> We had moderate success with Virgil, a REST server that we built
>>> directly on top of Thrift so that one day it could be easily embedded in
>>> the C* server itself. It could be deployed separately, or run an embedded
>>> C*. More often than not, we ended up running it separately to separate
>>> the layers (just like Titan and Rexster). I've started on a rewrite of
>>> Virgil called Memnon that rides on top of CQL. (I'd love some help)
>>> https://github.com/boneill42/memnon
>>>
>>> RE: CQL vs. Thrift
>>> We've hitched our wagons to CQL.  CQL != Relational.
>>> We've had success translating our "native" schemas into CQL, including
>>> all the NoSQL goodness of wide-rows, etc.  You just need a good
>>> understanding of how things translate into storage and underlying CFs.  If
>>> anything, I think we could add some DESCRIBE information, which would help
>>> users with this, along the lines of:
>>> (https://issues.apache.org/jira/browse/CASSANDRA-6676)
>>>
>>> CQL does open up the *opportunity* for users to articulate more complex
>>> queries using more familiar syntax.  (including future things such as
>>> joins, grouping, etc.)   To me, that is exciting, and again -- one of the
>>> reasons we are leaning on it.
>>>
>>> my two cents,
>>> brian
>>>
>>> ---
>>>
>>> Brian O'Neill
>>>
>>> Chief Technology Officer
>>>
>>>
>>> *Health Market Science*
>>>
>>> *The Science of Better Results*
>>>
>>> 2700 Horizon Drive * King of Prussia, PA * 19406
>>>
>>> M: 215.588.6024 * @boneill42 <http://www.twitter.com/boneill42>  *
>>>
>>> healthmarketscience.com
>>>
>>>
>>>
>>>
>>>
>>>
>>> From: Peter Lin <wool...@gmail.com>
>>> Reply-To: <user@cassandra.apache.org>
>>> Date: Wednesday, March 12, 2014 at 8:44 AM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Subject: Re: Proposal: freeze Thrift starting with 2.1.0
>>>
>>>
>>> Yes, I was looking at Intravert last night.
>>>
>>> For the kinds of reports my customers ask us to do, joins and subqueries
>>> are important. Having tried to do a simple join in PIG, the level of pain
>>> is high. I'm a masochist, so I don't mind breaking a simple join into
>>> multiple MR tasks, though I do find myself asking "why the hell does it
>>> need to be so painful in PIG?" Many of my friends say "what is this crap!"
>>> or "this is better than writing sql queries to run reports?"
>>>
>>> Plus, using ETL techniques to extract summaries only works for cases
>>> where the data is small enough. Once it gets beyond a certain size, it's
>>> not practical, which means we're back to crappy reporting languages that
>>> make life painful. Lots of big healthcare companies have thousands of MOLAP
>>> cubes on dozens of mainframes. The old OLTP -> DW/OLAP creates its own set
>>> of management headaches.
>>>
>>> Being able to report directly on the raw data avoids many of the issues,
>>> but that's my biased perspective.
>>>
>>>
>>>
>>>
>>> On Wed, Mar 12, 2014 at 8:15 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>
>>>> "I would love to see Cassandra get to the point where users can define
>>>> complex queries with subqueries, like, group by and joins" --> Did you have
>>>> a look at Intravert? I think it does union & intersection on the server
>>>> side for you. Not sure about joins, though.
>>>>
>>>>
>>>> On Wed, Mar 12, 2014 at 12:44 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi Ed,
>>>>>
>>>>> I agree Solr is deeply integrated into DSE. I've looked at Solandra in
>>>>> the past and studied the code.
>>>>>
>>>>> My understanding is DSE uses Cassandra for storage and the user has
>>>>> both APIs available. I do think it can be integrated further to make
>>>>> moderate to complex queries easier and probably faster. That's why we
>>>>> built our own JPA-like object query API. I would love to see Cassandra
>>>>> get to the point where users can define complex queries with subqueries,
>>>>> like, group by and joins. Clearly lots of people want these features and
>>>>> even Google built their own tools to do these types of queries.
>>>>>
>>>>> I see lots of people trying to improve this with Presto, Impala,
>>>>> Drill, etc. To me, it's a natural progression as NoSql databases mature.
>>>>> For most people, at some point you want to be able to report/analyze the
>>>>> data. Today some people use MapReduce to summarize the data and ETL it
>>>>> into a relational database or OLAP database for reporting. Even though I
>>>>> don't need CAS or atomic batch for what I do in Cassandra today, I'm sure
>>>>> in the future it will be handy. From my experience in the financial and
>>>>> insurance sector, features like CAS and "select for update" are important
>>>>> for the kinds of transactions they handle. I'm biased, but these kinds of
>>>>> features are useful and a good addition to Cassandra.
>>>>>
>>>>> These are interesting times in database land!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 11, 2014 at 10:57 PM, Edward Capriolo <
>>>>> edlinuxg...@gmail.com> wrote:
>>>>>
>>>>>> Peter,
>>>>>> Solr is deeply integrated into DSE. Seemingly this cannot
>>>>>> efficiently be done client side (CQL/Thrift whatever), but the Solandra
>>>>>> approach was to embed Solr in Cassandra. I think that is actually the
>>>>>> future of client dev: allowing users to embed custom server-side logic
>>>>>> into their own API.
>>>>>>
>>>>>> Things like this take a while. Back in the day no one wanted
>>>>>> Cassandra to be heavyweight, and ideas like read-before-write
>>>>>> operations were rejected. The common advice was "do them client side".
>>>>>> Now, in the case of collections, sometimes they do read-before-write and
>>>>>> it is the "stuff users want".
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 11, 2014 at 10:07 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> I'll give you a concrete example.
>>>>>>>
>>>>>>> One of the things we often need to do is a keyword search on
>>>>>>> unstructured text. What we did in our tooling is combine Solr with
>>>>>>> Cassandra, but we put an Object API in front of it. The API is inspired
>>>>>>> by JPA, but designed specifically to fit our needs.
>>>>>>>
>>>>>>> The user can do queries with like %blah%, and behind the scenes we
>>>>>>> issue a query to Solr to find the keys and then query Cassandra for the
>>>>>>> records.
>>>>>>>
>>>>>>> With plain Cassandra, the developer has to manually do all of this
>>>>>>> stuff and integrate Solr. Then they have to know which system to query
>>>>>>> and in what order. Our tooling lets the user define the schema in a
>>>>>>> modeler. Once the model is done, it compiles the classes, configuration
>>>>>>> files, data access objects and unit tests.
>>>>>>>
>>>>>>> When the application makes a call, our query classes handle the
>>>>>>> details behind the scenes. I know lots of people would like to see Solr
>>>>>>> integrated more deeply into Cassandra and CQL. I hope it happens in the
>>>>>>> future. If DataStax accepts my talk, we will be showing our temporal
>>>>>>> database and modeler in September.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 11, 2014 at 9:54 PM, Steven A Robenalt <
>>>>>>> srobe...@stanford.edu> wrote:
>>>>>>>
>>>>>>>> I should add that I'm not trying to ignite a flame war. Just trying
>>>>>>>> to understand your intentions.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 11, 2014 at 6:50 PM, Steven A Robenalt <
>>>>>>>> srobe...@stanford.edu> wrote:
>>>>>>>>
>>>>>>>>> Okay, I'm officially lost on this thread. If you plan on forking
>>>>>>>>> Cassandra to preserve and continue to enhance the Thrift interface,
>>>>>>>>> you would also want to add a bunch of relational features to CQL as
>>>>>>>>> part of that same fork?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo <
>>>>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> "one of the things I'd like to see happen is for Cassandra to
>>>>>>>>>> support queries with disjunction, exist, subqueries, joins and like.
>>>>>>>>>> In theory CQL could support these features in the future. Cassandra
>>>>>>>>>> would need a new query compiler and query planner. I don't see how the
>>>>>>>>>> current design could do these things without a significant
>>>>>>>>>> redesign/enhancement. In a past life, I implemented an inference rule
>>>>>>>>>> engine, so I've spent over a decade studying and implementing query
>>>>>>>>>> optimizers. All of these things can be done, it's just a matter of
>>>>>>>>>> people finding the time to do it."
>>>>>>>>>>
>>>>>>>>>> I see what you're saying. CQL started as a way to make slice easier,
>>>>>>>>>> but it is not even a query language; retrofitting these things is
>>>>>>>>>> going to be very hard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 11, 2014 at 7:45 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I have no problem maintaining my own fork :) or joining others
>>>>>>>>>>> forking Cassandra.
>>>>>>>>>>>
>>>>>>>>>>> I'd be happy to work with you or anyone else to add features to
>>>>>>>>>>> Thrift. That's the great thing about open source. Each person can
>>>>>>>>>>> scratch a technical itch and do what they love. I see lots of
>>>>>>>>>>> potential for Cassandra, and much of it involves improving Thrift to
>>>>>>>>>>> make it happen. Some of the features in theory "could" be done in
>>>>>>>>>>> CQL, but not with the current design.
>>>>>>>>>>>
>>>>>>>>>>> One of the things I'd like to see happen is for Cassandra to
>>>>>>>>>>> support queries with disjunction, exist, subqueries, joins and like.
>>>>>>>>>>> In theory CQL could support these features in the future. Cassandra
>>>>>>>>>>> would need a new query compiler and query planner. I don't see how
>>>>>>>>>>> the current design could do these things without a significant
>>>>>>>>>>> redesign/enhancement. In a past life, I implemented an inference rule
>>>>>>>>>>> engine, so I've spent over a decade studying and implementing query
>>>>>>>>>>> optimizers. All of these things can be done, it's just a matter of
>>>>>>>>>>> people finding the time to do it.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 11, 2014 at 6:17 PM, Edward Capriolo <
>>>>>>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> My advice: do not bother. I have become very active recently in
>>>>>>>>>>>> attempting to add features to Thrift. I had 4 open tickets I was
>>>>>>>>>>>> actively working on. (I even found two bugs in Cassandra in the
>>>>>>>>>>>> process.)
>>>>>>>>>>>>
>>>>>>>>>>>> People were aware of this and still called this vote. Several
>>>>>>>>>>>> committers have voted +1 and my -1 vote is non-binding. It is a
>>>>>>>>>>>> clear message: the committers are unwilling to accept new Thrift
>>>>>>>>>>>> features even if said features are contributed by others.
>>>>>>>>>>>>
>>>>>>>>>>>> Edward
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 11, 2014 at 5:51 PM, Peter Lin <wool...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> My biased opinion: even though some Cassandra developers want to
>>>>>>>>>>>>> abandon Thrift, I see benefits in continuing to improve it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The great thing about open source is that as long as some
>>>>>>>>>>>>> people want to keep working on it and improve it, it can happen.
>>>>>>>>>>>>> I plan to do my best to keep Thrift going, since it gives me the
>>>>>>>>>>>>> fine-grained control that I want and need. If the ultimate goal of
>>>>>>>>>>>>> Cassandra is to be "as close to SQL" as practical, my biased take
>>>>>>>>>>>>> is to use a NewSQL database that gives you the full power of
>>>>>>>>>>>>> subqueries, like, exists and disjunction.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When customers ask me which database to choose and they really
>>>>>>>>>>>>> want a relational model, I tell them to use NewSql. I love that
>>>>>>>>>>>>> Cassandra sits between NoSql and NewSql. There are things I do in
>>>>>>>>>>>>> Cassandra today that are much harder in NewSql or NoSql document
>>>>>>>>>>>>> databases. NewSql databases can scale to similar sizes, so the "big"
>>>>>>>>>>>>> part of big data won't be a significant advantage forever. Looking
>>>>>>>>>>>>> at some of the recent NewSql performance numbers, it's clear the gap
>>>>>>>>>>>>> is closing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> peter
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 11, 2014 at 3:59 PM, Tyler Hobbs <
>>>>>>>>>>>>> ty...@datastax.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Mar 11, 2014 at 2:41 PM, Shao-Chuan Wang <
>>>>>>>>>>>>>> shaochuan.w...@bloomreach.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So, does anyone know how to do "describing the splits" and
>>>>>>>>>>>>>>> "describing the local rings" using native protocol?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For a ring description, you would do something like "select
>>>>>>>>>>>>>> peer, tokens from system.peers".  I'm not sure about
>>>>>>>>>>>>>> describe_splits().
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, cqlsh uses a Python client, which talks via the Thrift
>>>>>>>>>>>>>>> protocol too. Does it mean that it will be migrated to the native
>>>>>>>>>>>>>>> protocol soon as well?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes: https://issues.apache.org/jira/browse/CASSANDRA-6307
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Tyler Hobbs
>>>>>>>>>>>>>> DataStax <http://datastax.com/>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Steve Robenalt
>>>>>>>>> Software Architect
>>>>>>>>> HighWire | Stanford University
>>>>>>>>> 425 Broadway St, Redwood City, CA 94063
>>>>>>>>>
>>>>>>>>> srobe...@stanford.edu
>>>>>>>>> http://highwire.stanford.edu
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
