Re: Cassandra data model right definition

2017-03-29 Thread Oskar Kjellin
It's not that easy, as I recall from this email thread:
https://groups.google.com/forum/m/#!topic/nosql-databases/ZLdgwCT_PNU

/Oskar 

> On 30 Sep 2016, at 18:40, Carlos Alonso  wrote:
> 
> Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra
> 
> Carlos Alonso | Software Engineer | @calonso
> 
>> On 30 September 2016 at 18:24, Mehdi Bada  
>> wrote:
>> Hi all, 
>> 
>> I have a theoretical question: 
>> - Is Apache Cassandra really a column store?
>> A column store means storing the data as columns rather than as rows. 
>> 
>> In fact, C* stores the data as rows, and the data is partitioned by row key.
>> 
>> Finally, for me, Cassandra is a row-oriented, schema-less DBMS. Is that true 
>> for you also?
>> 
>> Many thanks in advance for your reply
>> 
>> Best Regards 
>> Mehdi Bada
>> 
>> 
>> Mehdi Bada | Consultant
>> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15
>> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
>> mehdi.b...@dbi-services.com
>> www.dbi-services.com
> 


Re: Cassandra data model right definition

2016-10-14 Thread selcuk mart

unsubscribe


On 3.10.2016 at 16:25, Edward Capriolo wrote:
The phrase is defensible, but that is the root of the problem. Take 
for example a skateboard.


"A skateboard is like a bike because it has wheels and you ride on it."

That is true and defensibly true. :) However, with not much more text 
you can accurately describe what it is, as opposed to something it is 
almost like.


"A skateboard is a thin piece of wood on top of four small wheels that 
you stand on and ride"


The old Cassandra statement was something to the effect of 
"with the storage model of Bigtable and the consistency model of 
Dynamo". This accurately described the system and gave reference to 
specific known quantities (Bigtable/Dynamo) for which white papers 
existed for further reading.


On Mon, Oct 3, 2016 at 6:24 AM, Benedict Elliott Smith wrote:


While that sentence leaves a lot to be desired (for me because it
confers a different meaning on row store), it doesn't say
"Cassandra is like a RDBMS" - it says "like an RDBMS, it organises
data by rows and columns" - i.e., in this regard only it is like
an RDBMS, not more generally.

I believe it was meant to help people, especially those afraid of
the NoSQL thrift world, understand that it still uses the basic
concept of rows and columns they are used to.  I agree it could
be improved to minimise the chance of misreading it, and I'm
certain contributions would be welcome here.

I don't personally want to get bogged down in analysing every
piece of text anyone has ever written, so I'll bow out of further
discussion on this.  These phrases may all be suboptimal, but they
are certainly defensible.  Column store is not, that's all I
wanted to contribute here.





On 1 October 2016 at 19:35, Peter Lin wrote:

I'll second Ed's comment.

The documentation should be more careful when using phrases
such as "like relational databases". When we look at the history of
relational databases, people expect certain things like ACID
transactions, primary/foreign key constraints, query planners,
joins and relational algebra. Clearly Cassandra's storage
engine does not follow most of those principles, for a good reason.

The term "row-oriented storage" would be more descriptive and
appropriate. It avoids conflating Cassandra's storage engine
with "traditional" relational storage engines. Those of us
who have spent over a decade using IBM DB2, Oracle, SQL
Server and Sybase tend to think of relational databases in a
certain way. If we go back to 1998, most RDBMS storage engines
had a max row size limit. Databases like Sybase before version
9 preferred RAW disk for optimal performance. I can go on and
on, but there's no point really.

Cassandra's storage engine is "row oriented", but it's not
relational in the RDBMS sense. We do everyone a huge disservice by
using confusing terminology and then making fun of those who
get confused. No one wins when that happens. At the end of the
day, what differentiates Cassandra's storage engine is that it
supports static and dynamic columns, which traditional RDBMSs
don't support today. Calling Cassandra storage "distributed
tables" doesn't really help, in my biased opinion.

For example, if you tell a SQL Server or Oracle RAC admin
"Cassandra uses distributed tables", they might answer "so
what, SQL Server and Oracle can do that too." The difference
is that with an RDBMS the partitioning is optional and requires more
work to configure. With Cassandra you can have
everything on 1 node, which means there is only 1 partition
and it is no different from 1 instance of SQL Server. Where you win is
when you need to add 2 more nodes: Cassandra makes this easier,
whereas with SQL Server and Oracle you have to do a little bit
more work. I've lost count of how many times I've had to explain
NoSQL databases to RDBMS admins and had to explain that the
official docs are stupid.
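
To make the "1 node versus 3 nodes" point concrete, here is a minimal Python sketch of hash-based placement. It is a toy under stated assumptions: the `Ring` class, the `token` helper, the node names and the use of MD5 are all hypothetical choices for illustration, and Cassandra's actual partitioners and token assignment are more involved.

```python
import hashlib
from bisect import bisect_right


def token(key: str) -> int:
    """Map a string to a 64-bit token (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Toy token ring: each node owns the token range that ends at its token."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        # First node token strictly greater than the key's token, wrapping around.
        i = bisect_right(self._tokens, token(partition_key)) % len(self._ring)
        return self._ring[i][1]


keys = [f"user-{i}" for i in range(100)]

single = Ring(["node-a"])
three = Ring(["node-a", "node-b", "node-c"])

# With one node, every partition lands on that node.
owners_single = {single.owner(k) for k in keys}
# With three nodes, the same keys are placed without any per-table setup.
owners_three = {three.owner(k) for k in keys}
```

The application code that asks `owner(key)` is the same in both cases; only the list of nodes changes, which is the convenience being described above.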



On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo wrote:

https://github.com/apache/cassandra


Row store means that like relational databases, Cassandra
organizes data by rows and columns. The Cassandra Query
Language (CQL) is a close relative of SQL.

I generally do not know what to say about these high level
"oversimplifications" like "firewalls block hackers". Are
there "firewalls" or do they mean IP routers 

Re: Cassandra data model right definition

2016-10-04 Thread Mehdi Bada
Hi all, 

Just to refocus the debate (because I'm at the origin of this very 
interesting exchange). 
I think that, for a good understanding of the data model of any DBMS, we 
(technical experts) have to decompose the data objects of the model and understand 
precisely how the data is stored and what kinds of mechanisms are used. 
In this way, I think, Russell has described the situation very well, and we can 
say that the Apache Cassandra data model can be defined as a Partitioned Row Store. 

Many thanks for all your feedback and contributions 

Best Regards 
Mehdi Bada 

--- 
Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 499 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 




From: "Edward Capriolo" <edlinuxg...@gmail.com> 
To: "user" <user@cassandra.apache.org> 
Sent: Monday, October 3, 2016 4:53:16 PM 
Subject: Re: Cassandra data model right definition 

My original point can be summed up as: 

Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include "like" 
and "close relative". 

For the specifics: 

Any relational db could (and I'm sure one does!) allow for sparse fields as 
well. MySQL can be backed by rocksdb now, does that make it not a row store? 

Lets draw some lines, a relational database is clearly defined. 

https://en.wikipedia.org/wiki/Edgar_F._Codd 


Codd's theorem, a result proven in his seminal work on the relational model, 
equates the expressive power of relational algebra and relational calculus 
(both of which, lacking recursion, are strictly less powerful than first-order 
logic). [citation needed] 

As the relational model started to become fashionable in the early 1980s, Codd 
fought a sometimes bitter campaign to prevent the term being misused by 
database vendors who had merely added a relational veneer to older technology. 
As part of this campaign, he published his 12 rules to define what constituted 
a relational database. This made his position in IBM increasingly difficult, so 
he left to form his own consulting company with Chris Date and others. 

Cassandra is not a relational database. 



I have attempted to illustrate that a "row store" is defined as well. I do 
not believe Cassandra is a "row store". 

" Just because it uses log structured storage, sparse fields, and semi-flexible 
collections doesn't disqualify it from calling it a "row store"" 

What is the definition of "row store"? Is it a logical construct or a physical 
one? 

Why isn't MongoDB a "row store"? I can drop a schema on top of Mongo and 
present it as rows and columns. It seems to pass the litmus test being 
presented. 

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage 







On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad < j...@jonhaddad.com > wrote: 


Sorry Ed, but you're really stretching here. A table in Cassandra is structured 
by a schema with the data for each row stored together in each data file. Just 
because it uses log structured storage, sparse fields, and semi-flexible 
collections doesn't disqualify it from calling it a "row store" 

Postgres added flexible storage through hstore, I don't hear anyone arguing 
that it needs to be renamed. 

Any relational db could (and I'm sure one does!) allow for sparse fields as 
well. MySQL can be backed by rocksdb now, does that make it not a row store? 

You're arguing that everything is wrong but you're not proposing an 
alternative, which is not productive. 
On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo < edlinuxg...@gmail.com > wrote: 


Also, every piece of technical information that describes a row store 

http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf 
https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems 

does it like this: 
001:10,Smith,Joe,4;
002:12,Jones,Mary,5;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000; 


They never depict a scenario where the data looks like this on disk: 

001:10,Smith 
001:10,4; 
which is much closer to how Cassandra stores its data. 
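
The layouts being contrasted here can be sketched in a few lines of Python. This is only an illustration of the argument, not a description of any real on-disk format: the function names are hypothetical, the sample values are copied as-is from the message above, and the column-oriented form follows the style of the linked Wikipedia article.

```python
# Each record is (row id, employee id, last name, first name, salary),
# copied from the example above.
records = [
    ("001", 10, "Smith", "Joe", 4),
    ("002", 12, "Jones", "Mary", 5),
]


def row_layout(records):
    """Row store: all values of one row are written next to each other."""
    return [f"{rowid}:" + ",".join(str(v) for v in values) + ";"
            for rowid, *values in records]


def column_layout(records):
    """Column store: all values of one column are written next to each other."""
    n_columns = len(records[0]) - 1
    return [",".join(f"{rec[i + 1]}:{rec[0]}" for rec in records) + ";"
            for i in range(n_columns)]


def cell_layout(records):
    """One entry per (row, column) pair, as in the second example above."""
    out = []
    for rowid, emp_id, *rest in records:
        out.extend(f"{rowid}:{emp_id},{value}" for value in rest)
    return out


rows = row_layout(records)
columns = column_layout(records)
cells = cell_layout(records)
```

Here `rows[0]` is `001:10,Smith,Joe,4;`, `columns[0]` is `10:001,12:002;`, and `cells` starts with `001:10,Smith`. Whether the third form still counts as a "row store" is exactly the point under dispute in this thread.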



On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith < bened...@apache.org > 
wrote: 


Absolutely. A "partitioned row store" is exactly what I would call it. As it 
happens, our README thinks the same, which is fantastic. 

I thought I'd take a look at the rest of our cohort, and didn't get far before 
disappointment. HBase literally calls itself a "column-oriented store" - which 
is so totally wrong it's simultaneously hilarious and tragic. 

I guess we can't blame the wider internet for misunderstanding/misnaming us 
poor "wide column stores" if even one of the major examples doesn't know what 
it, itself, is! 




On 30 September 2016 at 21:47, Jonathan Haddad < j...@jonhaddad.com 

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
I did not ascribe blame.  I only empathised with their predicament;  I
don't want to listen to either of us, either!





On 3 October 2016 at 19:45, Edward Capriolo  wrote:

> You know what, don't "go low" and blame the recent unsubscriber on me.
>
> If you're so eager to deal with my pull request, I would rather you
> review this one: https://issues.apache.org/jira/browse/CASSANDRA-10825

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
You know what, don't "go low" and blame the recent unsubscriber on me.

If you're so eager to deal with my pull request, I would rather you
review this one: https://issues.apache.org/jira/browse/CASSANDRA-10825





On Mon, Oct 3, 2016 at 1:04 PM, Benedict Elliott Smith wrote:

> Nobody is disputing that the docs can and should be improved to avoid this
> misreading.  I've invited Ed to file a JIRA and/or pull request twice now.
>
> You are of course just as welcome to do this.  Perhaps you will actually
> do it, so we can all move on with our lives!

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
Nobody is disputing that the docs can and should be improved to avoid this
misreading.  I've invited Ed to file a JIRA and/or pull request twice now.

You are of course just as welcome to do this.  Perhaps you will actually do
it, so we can all move on with our lives!




On 3 October 2016 at 17:45, Peter Lin  wrote:

> I've met clients that read the cassandra docs and then said in a big
> meeting "it's just like relational database, it has tables just like
> sqlserver/oracle."
>
> I'm not putting words in other people's mouths either, but I've heard that
> said enough times to want to puke. Do the docs claim Cassandra is
> relational? They absolutely don't make that claim, but the docs play
> loosey-goosey with terminology. The end result is that it confuses new users who
> aren't experts, or technology managers who try to make a case for
> Cassandra.
>
> We can make all the excuses we want, but that doesn't change the fact that the
> docs aren't user friendly. Writing great documentation is tough and most
> developers hate it. It's cuz we suck at it. There, I said it: "we SUCK at
> writing user-friendly documentation". As many people have pointed out, it's
> not unique to Cassandra. 80% of the tech docs out there suck, starting with
> IBM at the top.
>
> Saying the docs suck isn't an indictment of anyone, it's just the reality
> of writing good documentation.

Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
"X-store" refers to how data is stored, in almost every case it refers to
what logical constructs are grouped together physically on disk.  It has
nothing to do with whether a database is relational or not.

Cassandra does, in fact, meet the definition of a row store. However, I would
like to reiterate that it goes beyond that and stores all rows for a
single partition together on disk as well.  Therefore "row store" does not do
it justice, which is why I like the term "partitioned row store".
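
A rough way to picture "all rows for a single partition stored together" is the small in-memory Python sketch below. It is only an assumption-laden model: `PartitionedRowStore`, `upsert` and `read_partition` are hypothetical names, the sensor data is made up, and nothing here reflects the real storage engine beyond the grouping idea.

```python
from collections import defaultdict


class PartitionedRowStore:
    """Toy model: rows are grouped by partition key, ordered by clustering key."""

    def __init__(self):
        # partition key -> {clustering key -> {column name -> value}}
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key, clustering_key, **columns):
        """Create the row if needed and set only the columns given (sparse rows)."""
        row = self._partitions[partition_key].setdefault(clustering_key, {})
        row.update(columns)

    def read_partition(self, partition_key):
        """Return every row of one partition, sorted by clustering key."""
        rows = self._partitions.get(partition_key, {})
        return [(ck, rows[ck]) for ck in sorted(rows)]


store = PartitionedRowStore()
store.upsert("sensor-1", 2, temp=21.5)
store.upsert("sensor-1", 1, temp=20.0)
store.upsert("sensor-2", 1, temp=18.0)

sensor_1_rows = store.read_partition("sensor-1")
```

Reading `sensor-1` touches one group of rows and returns them in clustering order, which is the property the word "partitioned" adds on top of "row store".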

On Mon, Oct 3, 2016 at 12:37 PM, Benedict Elliott Smith  wrote:

> ... and my response can be summed up as "you are not parsing English
> correctly."  The word "like" does not mean what you think it means in this
> context.  It does not mean "close relative."  It is constrained to the
> similarities expressed, and no others.  You don't seem to be reading any of
> my responses about this, though, so I'm not sure parsing is your issue.
>
> Postgresql has had arrays for years, and all RDBMS (pretty much) avoid
> persisting nulls in exactly the same way C* does - encoding their absence
> in the row header.
>
> I empathise with the recent unsubscriber.

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
I've met clients that read the cassandra docs and then said in a big
meeting "it's just like relational database, it has tables just like
sqlserver/oracle."

I'm not putting words in other people's mouths either, but I've heard that
said enough times to want to puke. Do the docs claim Cassandra is
relational? They absolutely don't make that claim, but the docs play
loosey-goosey with terminology. The end result is that it confuses new users who
aren't experts, or technology managers who try to make a case for
Cassandra.

We can make all the excuses we want, but that doesn't change the fact that the
docs aren't user friendly. Writing great documentation is tough and most
developers hate it. It's cuz we suck at it. There, I said it: "we SUCK at
writing user-friendly documentation". As many people have pointed out, it's
not unique to Cassandra. 80% of the tech docs out there suck, starting with
IBM at the top.

Saying the docs suck isn't an indictment of anyone, it's just the reality
of writing good documentation.

On Mon, Oct 3, 2016 at 12:33 PM, Jonathan Haddad  wrote:

> Nobody is claiming Cassandra is a relational database; I'm not sure why that
> keeps coming up.
> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo 
> wrote:
>
>> My original point can be summed up as:
>>
>> Do not define cassandra in terms SMILES & METAPHORS. Such words include
>> "like" and "close relative".
>>
>> For the specifics:
>>
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>>
>> Lets draw some lines, a relational database is clearly defined.
>>
>> https://en.wikipedia.org/wiki/Edgar_F._Codd
>>
>> Codd's theorem , a
>> result proven in his seminal work on the relational model, equates the
>> expressive power of relational algebra
>>  and relational
>> calculus  (both of
>> which, lacking recursion, are strictly less powerful thanfirst-order
>> logic ).[*citation
>> needed *]
>>
>> As the relational model started to become fashionable in the early 1980s,
>> Codd fought a sometimes bitter campaign to prevent the term being misused
>> by database vendors who had merely added a relational veneer to older
>> technology. As part of this campaign, he published his 12 rules
>>  to define what
>> constituted a relational database. This made his position in IBM
>> increasingly difficult, so he left to form his own consulting company with
>> Chris Date and others.
>>
>> Cassandra is not a relational database.
>>
>> I am have attempted to illustrate that a "row store" is defined as well.
>> I do not believe Cassandra is a "row store".
>>
>>
>>
>> "Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store""
>>
>> What is the definition of "row store". Is it a logical construct or a
>> physical one?
>>
>> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
>> present it as rows and columns. It seems to pass the litmus test being
>> presented.
>>
>> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>>
>>
>>
>>
>>
>> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
>> wrote:
>>
>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>> structured by a schema with the data for each row stored together in each
>> data file. Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store"
>>
>> Postgres added flexible storage through hstore, I don't hear anyone
>> arguing that it needs to be renamed.
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>> You're arguing that everything is wrong but you're not proposing an
>> alternative, which is not productive.
>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
>> wrote:
>>
>> Also, every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> depicts it like this:
>>
>> 001:10,Smith,Joe,4;
>> 002:12,Jones,Mary,5;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,4;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> 

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
... and my response can be summed up as "you are not parsing English
correctly."  The word "like" does not mean what you think it means in this
context.  It does not mean "close relative."  It is constrained to the
similarities expressed, and no others.  You don't seem to be reading any of
my responses about this, though, so I'm not sure parsing is your issue.

PostgreSQL has had arrays for years, and pretty much all RDBMSs avoid
persisting nulls in exactly the same way C* does - encoding their absence
in the row header.
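
For readers unfamiliar with the technique, here is a minimal Python sketch
of what "encoding their absence in the row header" can look like. It is an
illustration only: the one-byte bitmap, the length-prefixed values, and the
names encode_row and decode_row are assumptions made up for this example,
not the on-disk format of PostgreSQL, Cassandra, or any other real engine.

```python
def encode_row(values):
    """Pack a row as one header byte (a null bitmap) plus the non-null values."""
    if len(values) > 8:
        raise ValueError("this sketch supports at most 8 columns")
    bitmap = 0
    body = bytearray()
    for i, value in enumerate(values):
        if value is None:
            # Record the absence in the header; nothing is written to the body.
            bitmap |= 1 << i
        else:
            data = str(value).encode("utf-8")
            body.append(len(data))  # 1-byte length prefix (values under 256 bytes)
            body.extend(data)
    return bytes([bitmap]) + bytes(body)


def decode_row(blob, column_count):
    """Rebuild the row, using the header bitmap to know which columns are null."""
    bitmap, pos, values = blob[0], 1, []
    for i in range(column_count):
        if bitmap & (1 << i):
            values.append(None)
        else:
            length = blob[pos]
            values.append(blob[pos + 1:pos + 1 + length].decode("utf-8"))
            pos += 1 + length
    return values
```

With four columns of which two are null, the encoded row carries the header
byte plus the two present values; the nulls take no space in the body.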

I empathise with the recent unsubscriber.



On 3 October 2016 at 15:53, Edward Capriolo  wrote:

> My original point can be summed up as:
>
> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> Let's draw some lines: a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem, a result proven in his seminal work on the relational
> model, equates the expressive power of relational algebra and relational
> calculus (both of which, lacking recursion, are strictly less powerful
> than first-order logic).[citation needed]
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules to define
> what constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store". Is it a logical construct or a
> physical one?
>
> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
> wrote:
>
>> Sorry Ed, but you're really stretching here. A table in Cassandra is
>> structured by a schema with the data for each row stored together in each
>> data file. Just because it uses log structured storage, sparse fields, and
>> semi-flexible collections doesn't disqualify it from calling it a "row
>> store"
>>
>> Postgres added flexible storage through hstore, I don't hear anyone
>> arguing that it needs to be renamed.
>>
>> Any relational db could (and I'm sure one does!) allow for sparse fields
>> as well. MySQL can be backed by rocksdb now, does that make it not a row
>> store?
>>
>> You're arguing that everything is wrong but you're not proposing an
>> alternative, which is not productive.
>> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
>> wrote:
>>
>>> Also, every piece of technical information that describes a row store
>>>
>>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>>
>>> depicts it like this:
>>>
>>> 001:10,Smith,Joe,4;
>>> 002:12,Jones,Mary,5;
>>> 003:11,Johnson,Cathy,44000;
>>> 004:22,Jones,Bob,55000;
>>>
>>>
>>>
>>> They never depict a scenario where the data looks like this on disk:
>>>
>>> 001:10,Smith
>>>
>>> 001:10,4;
>>>
>>> Which is much closer to how Cassandra *stores* its data.
>>>
>>>
>>>
>>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>>> bened...@apache.org> wrote:
>>>
>>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>>> As it happens, our README thinks the same, which is fantastic.
>>>
>>> I thought I'd take a look at the rest of our cohort, and didn't get far
>>> before disappointment.  HBase literally calls itself a "
>>> *column-oriented* store" - which is so totally wrong it's
>>> simultaneously hilarious and tragic.
>>>
>>> I guess we can't blame the wider internet for misunderstanding/misnaming
>>> us poor "wide column stores" if even one of the major examples doesn't know
>>> what it, itself, is!
>>>
>>>
>>>
>>>
>>> On 30 September 2016 at 21:47, Jonathan Haddad 

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
It's a row store because it has a schema (vs ad hoc documents), and data
(rows) are stored together. What would you call the things you iterate over
when you query a partition? Rows. That makes it a thing that stores "rows"
of data; "row store" isn't some crazy stretch.
On Mon, Oct 3, 2016 at 12:33 PM Jonathan Haddad  wrote:

> Nobody is claiming Cassandra is relational; I'm not sure why that keeps
> coming up.
> On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo 
> wrote:
>
> My original point can be summed up as:
>
> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
>
> Let's draw some lines: a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem, a result proven in his seminal work on the relational
> model, equates the expressive power of relational algebra and relational
> calculus (both of which, lacking recursion, are strictly less powerful
> than first-order logic).[citation needed]
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules to define
> what constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
>
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store". Is it a logical construct or a
> physical one?
>
> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
> wrote:
>
> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
> Also, every piece of technical information that describes a row store
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> depicts it like this:
>
> 001:10,Smith,Joe,4;
> 002:12,Jones,Mary,5;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,4;
>
> Which is much closer to how Cassandra *stores* its data.
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* 
> store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era 

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
Nobody is claiming Cassandra is relational; I'm not sure why that keeps
coming up.
On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo 
wrote:

> My original point can be summed up as:
>
> Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
> "like" and "close relative".
>
> For the specifics:
>
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
>
> Let's draw some lines: a relational database is clearly defined.
>
> https://en.wikipedia.org/wiki/Edgar_F._Codd
>
> Codd's theorem, a result proven in his seminal work on the relational
> model, equates the expressive power of relational algebra and relational
> calculus (both of which, lacking recursion, are strictly less powerful
> than first-order logic).[citation needed]
>
> As the relational model started to become fashionable in the early 1980s,
> Codd fought a sometimes bitter campaign to prevent the term being misused
> by database vendors who had merely added a relational veneer to older
> technology. As part of this campaign, he published his 12 rules to define
> what constituted a relational database. This made his position in IBM
> increasingly difficult, so he left to form his own consulting company with
> Chris Date and others.
>
> Cassandra is not a relational database.
>
> I have attempted to illustrate that a "row store" is defined as well. I
> do not believe Cassandra is a "row store".
>
>
>
> "Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store""
>
> What is the definition of "row store". Is it a logical construct or a
> physical one?
>
> Why isn't mongo DB a "row store"? I can drop a schema on top of mongo and
> present it as rows and columns. It seems to pass the litmus test being
> presented.
>
> https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
>
>
>
>
>
> On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad 
> wrote:
>
> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
> Also, every piece of technical information that describes a row store
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> depicts it like this:
>
> 001:10,Smith,Joe,4;
> 002:12,Jones,Mary,5;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,4;
>
> Which is much closer to how Cassandra *stores* its data.
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* 
> store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>
> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> 

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
My original point can be summed up as:

Do not define Cassandra in terms of SIMILES & METAPHORS. Such words include
"like" and "close relative".

For the specifics:

Any relational db could (and I'm sure one does!) allow for sparse fields as
well. MySQL can be backed by rocksdb now, does that make it not a row store?

Let's draw some lines: a relational database is clearly defined.

https://en.wikipedia.org/wiki/Edgar_F._Codd

Codd's theorem, a result proven in his seminal work on the relational
model, equates the expressive power of relational algebra and relational
calculus (both of which, lacking recursion, are strictly less powerful
than first-order logic).[citation needed]

As the relational model started to become fashionable in the early 1980s,
Codd fought a sometimes bitter campaign to prevent the term being misused
by database vendors who had merely added a relational veneer to older
technology. As part of this campaign, he published his 12 rules to define
what constituted a relational database. This made his position in IBM
increasingly difficult, so he left to form his own consulting company with
Chris Date and others.

Cassandra is not a relational database.

I have attempted to illustrate that a "row store" is defined as well. I
do not believe Cassandra is a "row store".

"Just because it uses log structured storage, sparse fields, and
semi-flexible collections doesn't disqualify it from calling it a "row
store""

What is the definition of "row store"? Is it a logical construct or a
physical one?

Why isn't MongoDB a "row store"? I can drop a schema on top of MongoDB and
present it as rows and columns. It seems to pass the litmus test being
presented.

https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage





On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad  wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
>> Also, every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> depicts it like this:
>>
>> 001:10,Smith,Joe,4;
>> 002:12,Jones,Mary,5;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,4;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* 
>> store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really 

Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
A couple things I would like to note:

1. Cassandra does not determine how data is stored on disk; the compaction
strategy does.  One could, in theory (and I believe some are trying),
create a column-store compaction strategy.  There is a large effort in the
database community overall to separate query execution from the storage
engine, so it is becoming increasingly incorrect to say a database is an
"X store" database.

2. "X-store" is not used, and never has been, to describe how data is
represented or queried.  When most database storage engines describe their
storage as "X-store" they are referring to contiguous bytes on disk.  In
traditional row-store engines, on a single node, the definition is as
follows: "All data for a row is stored as a single block of contiguous
bytes on disk".  Traditional column-stores are likewise defined as "All
data for a column is stored contiguously on disk".  Old-style Cassandra was
a key-value column-family store in that "all data for a family of columns
belonging to a given key were stored contiguously on disk".

So when talking about Cassandra and all currently merged compaction
strategies, yes, it fits the definition of a row-store in that "All data
for a row is stored as contiguous bytes on disk", however, it goes further
because "All data for all rows in a given partition are stored as
contiguous bytes on disk".  So at the highest level one could say it is a
"Partition-store" but that is pretty vague.   I think it is deserving of a
different naming definition which is why I like the term
"Partitioned-row-store"  which gives insight into the fact that it is rows
being stored on disk, in a partitioned format.
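
To make the three definitions above concrete, here is a small Python sketch
that lays out the same toy rows in each style. The data, the tuple shape
(partition key, clustering key, name, score) and the function names are
hypothetical; real engines add headers, indexes, and compression, and
nothing here reflects Cassandra's actual SSTable format.

```python
from collections import defaultdict

# Hypothetical rows: (partition_key, clustering_key, name, score)
ROWS = [
    ("p1", 1, "Smith", 10),
    ("p2", 1, "Jones", 12),
    ("p1", 2, "Johnson", 11),
]


def row_store(rows):
    """Row store: all values of one row sit next to each other."""
    return [list(row) for row in rows]


def column_store(rows):
    """Column store: all values of one column sit next to each other."""
    return [list(column) for column in zip(*rows)]


def partitioned_row_store(rows):
    """Partitioned row store: whole rows, grouped by partition key and
    ordered by clustering key inside each partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return {
        key: sorted(group, key=lambda r: r[1])
        for key, group in sorted(partitions.items())
    }
```

In the third layout the unit that stays contiguous is still the row; the
partition only decides which rows end up next to each other.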

PS.
To address the pedants: yes, by these definitions you would have to assume
that a partition resides in a single SSTable. While most compaction
strategies try hard to achieve this, only one that I know of currently
guarantees it. You could call it a
"Partitioned-row-dependent-upon-compaction-strategy-store" but that is
just terrible.



On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad  wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
>
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
>> Also, every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> depicts it like this:
>>
>> 001:10,Smith,Joe,4;
>> 002:12,Jones,Mary,5;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,4;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* 
>> store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term 

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
Whether a storage engine requires a schema isn't really critical for
row-oriented storage. How about CSV that doesn't have a header row? CSV is
probably the most commonly used row-oriented storage and tons of businesses
still use it for B2B transactions.
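
As a rough illustration of the headerless CSV point, the Python sketch
below reads the same file with and without an externally supplied list of
column names. The sample data, the column names, and the helper read_rows
are made up for this example.

```python
import csv
import io

# Hypothetical headerless CSV: the file itself carries no schema.
RAW = "10,Smith,Joe\n12,Jones,Mary\n"


def read_rows(text, column_names=None):
    """Read row-oriented CSV text; apply column names only if the caller has them."""
    reader = csv.reader(io.StringIO(text))
    if column_names is None:
        # No schema: each row is just an ordered list of values.
        return [row for row in reader]
    # Schema supplied from outside the file: same rows, now addressable by name.
    return [dict(zip(column_names, row)) for row in reader]
```

Either way the data arrives one row at a time; the schema only changes how
the caller addresses the fields.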

As you pointed out, some traditional RDBMSs have been adding
"non-traditional" storage options, which is good for everyone. What RDBMSs
still don't support is dynamic columns, and I really doubt the SQL working
group will add them in the near future. SQL Server and Oracle both support
an XML datatype, though, which one could argue "kind of" achieves
flexibility similar to dynamic columns.

Then there are RDBMSs adding native support for JSON, which muddies the
water even more. As an English major, I think being precise and concise
with language is important, even if 80% of the people in the IT field abuse
it. I've been on countless sales calls with management. More often than
not, they read the documentation written by developers and feel like
they're reading gibberish. It's best to avoid loaded terms like "row
store". Just because some people like it doesn't mean it achieves the goal
of clear communication.


On Mon, Oct 3, 2016 at 10:02 AM, Jonathan Haddad  wrote:

> Sorry Ed, but you're really stretching here. A table in Cassandra is
> structured by a schema with the data for each row stored together in each
> data file. Just because it uses log structured storage, sparse fields, and
> semi-flexible collections doesn't disqualify it from calling it a "row
> store"
>
> Postgres added flexible storage through hstore, I don't hear anyone
> arguing that it needs to be renamed.
>
> Any relational db could (and I'm sure one does!) allow for sparse fields
> as well. MySQL can be backed by rocksdb now, does that make it not a row
> store?
>
> You're arguing that everything is wrong but you're not proposing an
> alternative, which is not productive.
>
> On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
> wrote:
>
>> Also, every piece of technical information that describes a row store
>>
>> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
>> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>>
>> depicts it like this:
>>
>> 001:10,Smith,Joe,4;
>> 002:12,Jones,Mary,5;
>> 003:11,Johnson,Cathy,44000;
>> 004:22,Jones,Bob,55000;
>>
>>
>>
>> They never depict a scenario where the data looks like this on disk:
>>
>> 001:10,Smith
>>
>> 001:10,4;
>>
>> Which is much closer to how Cassandra *stores* its data.
>>
>>
>>
>> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Absolutely.  A "partitioned row store" is exactly what I would call it.
>> As it happens, our README thinks the same, which is fantastic.
>>
>> I thought I'd take a look at the rest of our cohort, and didn't get far
>> before disappointment.  HBase literally calls itself a "*column-oriented* 
>> store"
>> - which is so totally wrong it's simultaneously hilarious and tragic.
>>
>> I guess we can't blame the wider internet for misunderstanding/misnaming
>> us poor "wide column stores" if even one of the major examples doesn't know
>> what it, itself, is!
>>
>>
>>
>>
>> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares wrote:
>>
>> Hi Mehdi,
>>
>> I can help clarify a few things.
>>
>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
>> have 2 billion columns, but in practice it shouldn't have more than 100
>> million columns.
>>

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
Sorry Ed, but you're really stretching here. A table in Cassandra is
structured by a schema with the data for each row stored together in each
data file. Just because it uses log structured storage, sparse fields, and
semi-flexible collections doesn't disqualify it from being called a "row
store".

Postgres added flexible storage through hstore; I don't hear anyone arguing
that it needs to be renamed.

Any relational db could (and I'm sure one does!) allow for sparse fields as
well. MySQL can be backed by RocksDB now; does that make it not a row store?

You're arguing that everything is wrong but you're not proposing an
alternative, which is not productive.
On Mon, Oct 3, 2016 at 9:40 AM Edward Capriolo 
wrote:

> Also, every piece of technical information that describes a row store
>
> http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf
> https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems
>
> depicts it like this:
>
> 001:10,Smith,Joe,4;
> 002:12,Jones,Mary,5;
> 003:11,Johnson,Cathy,44000;
> 004:22,Jones,Bob,55000;
>
>
>
> They never depict a scenario where the data looks like this on disk:
>
> 001:10,Smith
>
> 001:10,4;
>
> Which is much closer to how Cassandra *stores* its data.
>
>
>
> On Fri, Sep 30, 2016 at 5:12 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Absolutely.  A "partitioned row store" is exactly what I would call it.
> As it happens, our README thinks the same, which is fantastic.
>
> I thought I'd take a look at the rest of our cohort, and didn't get far
> before disappointment.  HBase literally calls itself a "*column-oriented* 
> store"
> - which is so totally wrong it's simultaneously hilarious and tragic.
>
> I guess we can't blame the wider internet for misunderstanding/misnaming
> us poor "wide column stores" if even one of the major examples doesn't know
> what it, itself, is!
>
>
>
>
> On 30 September 2016 at 21:47, Jonathan Haddad  wrote:
>
> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>
> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
> thrift users no longer think they have a schema (though they do), and
> thrift is being deprecated.
>
> I really wish everyone would kill the term "wide column store" with fire.
> It seems to have never meant anything beyond "schema-less, row-oriented",
> and a "column store" means literally the opposite of this.
>
> Not only that, but people don't even seem to realise the term "column
> store" existed long before "wide column store" and the latter is often
> abbreviated to the former, as here:
> http://www.planetcassandra.org/what-is-nosql/
>
> Since it no longer applies, let's all agree as a community to forget this
> awful nomenclature ever existed.
>
>
>
> On 30 September 2016 at 18:09, Joaquin Casares 
> wrote:
>
> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
> have 2 billion columns, but in practice it shouldn't have more than 100
> million columns.
>
> Cassandra partitions data to certain nodes based on the partition key(s),
> but does provide the option of setting zero or more clustering keys.
> Together, the partition key(s) and clustering key(s) form the primary key.
>
> When writing to Cassandra, you will need to provide the full primary key,
> however, when reading from Cassandra, you only need to provide the full
> partition key.
>
> When you only provide the partition key for a read operation, you're able
> to return all columns that exist on that partition with low latency. These
> columns are displayed as "CQL rows" to make it easier to reason about.
>
> Consider the schema:
>
> CREATE TABLE foo (
>   bar uuid,
>   boz uuid,
>   baz timeuuid,
>   data1 text,
>   data2 text,
>   PRIMARY KEY ((bar, boz), baz)
> );
>
>
> When you write to Cassandra you will need to send bar, boz, and baz and
> optionally data*, if it's relevant for that CQL row. If you chose not to
> define a data* field for a particular CQL row, then nothing is stored nor
> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>
> However, all writes to the same bar/boz will end up on the same Cassandra
> replica set (a configurable number of nodes) and be stored on the same
> place(s) on disk within the SSTable(s). And on disk, each field that's not
> a partition key is stored as a column, 

Re: Cassandra data model right definition

2016-10-01 Thread Peter Lin
I'll second Ed's comment.

The documentation should be more careful when using phrases such as "like
relational databases". When we look at the history of relational databases,
people expect certain things: ACID transactions, primary/foreign key
constraints, query planners, joins and relational algebra. Clearly
Cassandra's storage engine does not follow most of those principles, for
good reason.

The term "row-oriented storage" would be more descriptive and appropriate. It
avoids conflating Cassandra's storage engine with "traditional" relational
storage engines. Those of us who have spent over a decade using IBM DB2,
Oracle, SQL Server and Sybase tend to think of relational databases in a
certain way. If we go back to 1998, most RDBMS storage engines had a max row
size limit. Databases like Sybase before version 9 preferred raw disk for
optimal performance. I can go on and on, but there's no point really.

Cassandra's storage engine is "row oriented", but it's not relational in the
RDBMS sense. We do everyone a huge disservice by using confusing
terminology and then making fun of those who get confused. No one wins when
that happens. At the end of the day, what differentiates Cassandra's
storage engine is that it supports static and dynamic columns, which
traditional RDBMSs don't support today. Calling Cassandra storage "distributed
tables" doesn't really help, in my biased opinion.

For example, if you tell a SQL Server or Oracle RAC admin "Cassandra uses
distributed tables", they might answer "so what, SQL Server and Oracle can
do that too." The difference is that with an RDBMS the partitioning is
optional and requires more work to configure, whereas with Cassandra you can
have everything in 1 node, which means there is only 1 partition and it is no
different from 1 instance of SQL Server. Where you win is when you need to
add 2 more nodes: Cassandra makes this easier, whereas with SQL Server and
Oracle you have to do a little bit more work. I've lost count of how many
times I've had to explain NoSQL databases to RDBMS admins and had to explain
that the official docs are stupid.
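
To make the "1 node versus 3 nodes" point concrete, here is a minimal sketch of
hash-based partitioning on a token ring. It is an illustration under stated
assumptions, not Cassandra's implementation: the names token, Ring and owner
are hypothetical, MD5 stands in for the real partitioner, and replication and
virtual nodes are left out.

```python
import bisect
import hashlib


def token(value):
    """Map a string onto the ring. MD5 is an assumption made for the sketch."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring: a key belongs to the first node token at or after its own."""

    def __init__(self, nodes):
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key):
        index = bisect.bisect_left(self._tokens, token(partition_key))
        # Past the last token, wrap around to the first node.
        return self._ring[index % len(self._ring)][1]


keys = ["user-%d" % i for i in range(10000)]
one_node = Ring(["node1"])
three_nodes = Ring(["node1", "node2", "node3"])

# One node: a single owner for every partition, much like one database instance.
single_owners = {one_node.owner(k) for k in keys}

# Three nodes: only keys whose token now falls in a new node's range change owner.
moved = [k for k in keys if three_nodes.owner(k) != one_node.owner(k)]
print("owners with one node:", single_owners)
print("keys that moved after adding two nodes:", len(moved), "of", len(keys))
```

The application code that calls owner() does not change when nodes are added,
which is the property being contrasted with optional, manually configured
partitioning in an RDBMS.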



On Sat, Oct 1, 2016 at 11:31 AM, Edward Capriolo 
wrote:

> https://github.com/apache/cassandra
>
> Row store means that like
> relational databases, Cassandra organizes data by rows and columns. The
> Cassandra Query Language (CQL) is a close relative of SQL.
>
> I generally do not know what to say about these high level
> "oversimplifications" like "firewalls block hackers". Are there "firewalls"
> or do they mean IP routers with layer 4 packet inspections and layer 3
> Access Control Lists?
>
> We say (and I catch myself doing it all the time) "like relational
> databases" often, as if all relational databases work alike. A columnar
> store like HP Vertica is a relational database. MySQL has different storage
> engines; does MyISAM work like InnoDB?
>
> Google Docs organizes data by rows and columns as well. You can wrap any
> storage system in an API that makes it look like rows and columns.
> Microsoft LINQ can enumerate your network cards and query them
> (https://msdn.microsoft.com/en-us/library/bb308959.aspx); that really does
> not make your network cards a "row store".
>
> "Theoretically a row can have 2 billion columns, but in practice it
> shouldn't have more than 100 million columns."
> In practice (in my experience) the number is much lower than 100 million,
> and if the data is actually deleted and re-added frequently, the number of
> live columns (rows, whatever) you can use happily is even lower.
>
>
> I believe on twitter (I am unable to find the tweet) someone was trying to
> convince me Cassandra was a "columnar analytic database".  ROFL
>
> I believe telling someone it is a "row store" "like a database" is not a good
> idea. They might walk away content with that explanation, but you are setting
> them up to walk into an anti-pattern, such as a user attempting to write and
> delete 1 row and 1 column 6 billion times a day. Then you end up explaining
> to them
> http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
> and how the Cassandra storage model is not "like a relational database".
>
> On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo 
> wrote:
>
>> I can iterate over JSON data stored in mongo and present it as a table
>> with rows and columns. It does not make mongo a rowstore.
>>
>> On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo 
>> wrote:
>>
>>> The problem with calling it a row store:
>>>
>>> https://en.wikipedia.org/wiki/Row_(database)
>>>
>>> In the context of a relational database, a *row*, also called a record or
>>> tuple, represents a single, implicitly structured data item in a table

Re: Cassandra data model right definition

2016-10-01 Thread Edward Capriolo
https://github.com/apache/cassandra

Row store means that like
relational databases, Cassandra organizes data by rows and columns. The
Cassandra Query Language (CQL) is a close relative of SQL.

I generally do not know what to say about these high level
"oversimplifications" like "firewalls block hackers". Are there "firewalls"
or do they mean IP routers with layer 4 packet inspections and layer 3
Access Control Lists?

We say (and I catch myself doing it all the time) "like relational
databases" often, as if all relational databases work alike. A columnar
store like HP Vertica is a relational database. MySQL has different storage
engines; does MyISAM work like InnoDB?

Google Docs organizes data by rows and columns as well. You can wrap any
storage system in an API that makes it look like rows and columns.
Microsoft LINQ can enumerate your network cards and query them
(https://msdn.microsoft.com/en-us/library/bb308959.aspx); that really does
not make your network cards a "row store".

"Theoretically a row can have 2 billion columns, but in practice it
shouldn't have more than 100 million columns."
In practice (in my experience) the number is much lower than 100 million,
and if the data is actually deleted and re-added frequently, the number of
live columns (rows, whatever) you can use happily is even lower.


I believe on twitter (I am unable to find the tweet) someone was trying to
convince me Cassandra was a "columnar analytic database".  ROFL

I believe telling someone it is a "row store" "like a database" is not a good
idea. They might walk away content with that explanation, but you are setting
them up to walk into an anti-pattern, such as a user attempting to write and
delete 1 row and 1 column 6 billion times a day. Then you end up explaining
to them
http://stackoverflow.com/questions/21755286/what-exactly-happens-when-tombstone-limit-is-reached
and how the Cassandra storage model is not "like a relational database".
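
The write-then-delete anti-pattern can be sketched with a toy append-only
store. This is a simplified illustration, not Cassandra's storage engine:
ToyPartition and its methods are hypothetical names, and the grace period
that real tombstones must outlive before compaction may drop them is ignored.

```python
from collections import namedtuple

# value=None marks a tombstone, i.e. a delete recorded as a write.
Cell = namedtuple("Cell", ["key", "timestamp", "value"])


class ToyPartition:
    """Append-only log standing in for immutable on-disk data."""

    def __init__(self):
        self.cells = []
        self._clock = 0

    def _append(self, key, value):
        self._clock += 1
        self.cells.append(Cell(key, self._clock, value))

    def insert(self, key, value):
        self._append(key, value)

    def delete(self, key):
        self._append(key, None)

    def _newest(self):
        # Reconcile the log: the cell with the highest timestamp wins per key.
        newest = {}
        for cell in self.cells:
            seen = newest.get(cell.key)
            if seen is None or cell.timestamp > seen.timestamp:
                newest[cell.key] = cell
        return newest

    def read(self):
        """Return (live rows, number of tombstones the read had to step over)."""
        newest = self._newest()
        live = {c.key: c.value for c in newest.values() if c.value is not None}
        tombstones = sum(1 for c in newest.values() if c.value is None)
        return live, tombstones

    def compact(self):
        """Keep only the newest live cell per key (grace period ignored)."""
        self.cells = [c for c in self._newest().values() if c.value is not None]


# Queue-like usage: every job is written once and deleted once.
queue = ToyPartition()
for job_id in range(1000):
    queue.insert(job_id, "job")
    queue.delete(job_id)
queue.insert(1000, "job")

live, tombstones = queue.read()
cells_before = len(queue.cells)
queue.compact()
live_after, tombstones_after = queue.read()
```

Before compaction a read that returns one live row still has to reconcile
every delete marker in the partition, which is the cost a "row store like a
database" explanation hides.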

On Fri, Sep 30, 2016 at 9:22 PM, Edward Capriolo 
wrote:

> I can iterate over JSON data stored in mongo and present it as a table
> with rows and columns. It does not make mongo a rowstore.
>
> On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo 
> wrote:
>
>> The problem with calling it a row store:
>>
>> https://en.wikipedia.org/wiki/Row_(database)
>>
>> In the context of a relational database, a *row*, also called a record or
>> tuple, represents a single, implicitly structured data item in a table. In
>> simple terms, a database table can be thought of as consisting of *rows* and
>> columns or fields. Each row in a table represents a set of related data, and
>> every row in the table has the same structure.
>>
>> When you have static columns and rows with maps, and lists, it is hard to
>> argue that every row has the same structure. Physically at the storage
>> layer they do not have the same structure and logically when accessing the
>> data they barely have the same structure, as the static column is just
>> appearing inside each row it is actually not contained in.
>>
>> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad 
>> wrote:
>>
>>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>>> which usually needs some extra explanation but is more accurate than
>>> "column family" or whatever other thrift era terminology people still use.
>>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan 
>>> wrote:
>>>
>>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>>> table. This definition is closer to CQL and has some academic background
>>>> (distributed hash table).
>>>>
>>>>
>>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>>> bened...@apache.org> wrote:
>>>>
>>>>> Cassandra is not a "wide column store" anymore.  It has a schema.
>>>>> Only thrift users no longer think they have a schema (though they do), and
>>>>> thrift is being deprecated.
>>>>>
>>>>> I really wish everyone would kill the term "wide column store" with
>>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>>
>>>>> Not only that, but people don't even seem to realise the term "column
>>>>> store" existed long before "wide column store" and the latter is often
>>>>> abbreviated to the former, as here:
>>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>>
>>>>> Since it no longer applies, let's all agree as a 

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
I can iterate over JSON data stored in MongoDB and present it as a table with
rows and columns. It does not make MongoDB a row store.

On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo 
wrote:

> The problem with calling it a row store:
>
> https://en.wikipedia.org/wiki/Row_(database)
>
> In the context of a relational database, a *row*, also called a record or
> tuple, represents a single, implicitly structured data item in a table. In
> simple terms, a database table can be thought of as consisting of *rows* and
> columns or fields. Each row in a table represents a set of related data, and
> every row in the table has the same structure.
>
> When you have static columns and rows with maps, and lists, it is hard to
> argue that every row has the same structure. Physically at the storage
> layer they do not have the same structure and logically when accessing the
> data they barely have the same structure, as the static column is just
> appearing inside each row it is actually not contained in.
>
> On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad 
> wrote:
>
>> +1000 to what Benedict says. I usually call it a "partitioned row store"
>> which usually needs some extra explanation but is more accurate than
>> "column family" or whatever other thrift era terminology people still use.
>> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>>
>>> I used to present Cassandra as a NoSQL datastore with "distributed"
>>> table. This definition is closer to CQL and has some academic background
>>> (distributed hash table).
>>>
>>>
>>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>>> bened...@apache.org> wrote:
>>>
>>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>>> thrift users no longer think they have a schema (though they do), and
>>>> thrift is being deprecated.
>>>>
>>>> I really wish everyone would kill the term "wide column store" with
>>>> fire.  It seems to have never meant anything beyond "schema-less,
>>>> row-oriented", and a "column store" means literally the opposite of this.
>>>>
>>>> Not only that, but people don't even seem to realise the term "column
>>>> store" existed long before "wide column store" and the latter is often
>>>> abbreviated to the former, as here:
>>>> http://www.planetcassandra.org/what-is-nosql/
>>>>
>>>> Since it no longer applies, let's all agree as a community to forget
>>>> this awful nomenclature ever existed.
>>>>
>>>>
>>>>
>>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>>> joaq...@thelastpickle.com> wrote:

> Hi Mehdi,
>
> I can help clarify a few things.
>
> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
> can have 2 billion columns, but in practice it shouldn't have more than 
> 100
> million columns.
>
> Cassandra partitions data to certain nodes based on the partition
> key(s), but does provide the option of setting zero or more clustering
> keys. Together, the partition key(s) and clustering key(s) form the 
> primary
> key.
>
> When writing to Cassandra, you will need to provide the full primary
> key, however, when reading from Cassandra, you only need to provide the
> full partition key.
>
> When you only provide the partition key for a read operation, you're
> able to return all columns that exist on that partition with low latency.
> These columns are displayed as "CQL rows" to make it easier to reason 
> about.
>
> Consider the schema:
>
> CREATE TABLE foo (
>   bar uuid,
>   boz uuid,
>   baz timeuuid,
>   data1 text,
>   data2 text,
>   PRIMARY KEY ((bar, boz), baz)
> );
>
>
> When you write to Cassandra you will need to send bar, boz, and baz
> and optionally data*, if it's relevant for that CQL row. If you chose not
> to define a data* field for a particular CQL row, then nothing is stored
> nor allocated on disk. But I wouldn't consider that caveat to be
> "schema-less".
>
> However, all writes to the same bar/boz will end up on the same
> Cassandra replica set (a configurable number of nodes) and be stored on 
> the
> same place(s) on disk within the SSTable(s). And on disk, each field 
> that's
> not a partition key is stored as a column, including clustering keys (this
> is optimized in Cassandra 3+, but now we're getting deep into internals).
>
> In this way you can get fast 

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
The problem with calling it a row store:

https://en.wikipedia.org/wiki/Row_(database)

In the context of a relational database, a *row*, also called a record or
tuple, represents a single, implicitly structured data item in a table. In
simple terms, a database table can be thought of as consisting of *rows* and
columns or fields. Each row in a table represents a set of related data, and
every row in the table has the same structure.

When you have static columns and rows with maps and lists, it is hard to
argue that every row has the same structure. Physically, at the storage
layer, they do not have the same structure, and logically, when accessing the
data, they barely have the same structure, as the static column merely
appears inside each row without actually being contained in it.
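
A minimal sketch of that argument, assuming a toy in-memory model rather than
the real storage format: the class Partition and the column names owner,
tags, attrs and note are hypothetical. The static value is stored once per
partition and only copied into each row when results are built.

```python
class Partition:
    """Toy partition: one static section plus sparse rows."""

    def __init__(self):
        self.static = {}  # stored once for the whole partition
        self.rows = {}    # clustering key -> only the cells that were written

    def write(self, clustering, **cells):
        self.rows.setdefault(clustering, {}).update(cells)

    def select(self, columns):
        """Build the uniform-looking rows a query would return."""
        result = []
        for clustering in sorted(self.rows):
            stored = self.rows[clustering]
            row = {"clustering": clustering}
            for name in columns:
                # Static values are copied into every row at read time.
                row[name] = self.static.get(name, stored.get(name))
            result.append(row)
        return result


partition = Partition()
partition.static["owner"] = "alice"          # static column
partition.write(1, tags=["a", "b"])          # row holding only a list
partition.write(2, attrs={"k": "v"}, note="hello")  # row holding a map and text

logical = partition.select(["owner", "tags", "attrs", "note"])
physical = partition.rows
```

Here logical looks like two rows with identical structure, while physical
shows that the two rows hold different sets of cells and that neither of them
contains owner.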

On Fri, Sep 30, 2016 at 4:47 PM, Jonathan Haddad  wrote:

> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>> thrift users no longer think they have a schema (though they do), and
>>> thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with
>>> fire.  It seems to have never meant anything beyond "schema-less,
>>> row-oriented", and a "column store" means literally the opposite of this.
>>>
>>> Not only that, but people don't even seem to realise the term "column
>>> store" existed long before "wide column store" and the latter is often
>>> abbreviated to the former, as here:
>>> http://www.planetcassandra.org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget
>>> this awful nomenclature ever existed.
>>>
>>>
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>> joaq...@thelastpickle.com> wrote:
>>>
 Hi Mehdi,

 I can help clarify a few things.

 As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
 can have 2 billion columns, but in practice it shouldn't have more than 100
 million columns.

 Cassandra partitions data to certain nodes based on the partition
 key(s), but does provide the option of setting zero or more clustering
 keys. Together, the partition key(s) and clustering key(s) form the primary
 key.

 When writing to Cassandra, you will need to provide the full primary
 key, however, when reading from Cassandra, you only need to provide the
 full partition key.

 When you only provide the partition key for a read operation, you're
 able to return all columns that exist on that partition with low latency.
 These columns are displayed as "CQL rows" to make it easier to reason 
 about.

 Consider the schema:

 CREATE TABLE foo (
   bar uuid,
   boz uuid,
   baz timeuuid,
   data1 text,
   data2 text,
   PRIMARY KEY ((bar, boz), baz)
 );


 When you write to Cassandra you will need to send bar, boz, and baz and
 optionally data*, if it's relevant for that CQL row. If you chose not to
 define a data* field for a particular CQL row, then nothing is stored nor
 allocated on disk. But I wouldn't consider that caveat to be "schema-less".

 However, all writes to the same bar/boz will end up on the same
 Cassandra replica set (a configurable number of nodes) and be stored on the
 same place(s) on disk within the SSTable(s). And on disk, each field that's
 not a partition key is stored as a column, including clustering keys (this
 is optimized in Cassandra 3+, but now we're getting deep into internals).

 In this way you can get fast responses for all activity for bar/boz
 either over time, or for a specific time, with roughly the same number of
 disk seeks, with varying lengths on the disk scans.

 Hope that helps!

 Joaquin Casares
 Consultant
 Austin, TX

 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On Fri, Sep 30, 2016 at 11:40 AM, Carlos 

Re: Cassandra data model right definition

2016-09-30 Thread Russell Bradberry
I agree 100%, this misunderstanding really bothers me as well.  I like the term
“Partitioned Row Store” even though I am guilty of using the legacy
“Column-Family Store” from darker times.  Even databases like Scylla, which is
supposed to be an Apache Cassandra clone, tout themselves as column stores,
which is just utterly backwards, as you mentioned.

 

From: Benedict Elliott Smith <bened...@apache.org>
Reply-To: <user@cassandra.apache.org>
Date: Friday, September 30, 2016 at 5:12 PM
To: <user@cassandra.apache.org>
Subject: Re: Cassandra data model right definition

 

Absolutely.  A "partitioned row store" is exactly what I would call it.  As it 
happens, our README thinks the same, which is fantastic.  

 

I thought I'd take a look at the rest of our cohort, and didn't get far before 
disappointment.  HBase literally calls itself a "column-oriented store" - which 
is so totally wrong it's simultaneously hilarious and tragic.  

 

I guess we can't blame the wider internet for misunderstanding/misnaming us 
poor "wide column stores" if even one of the major examples doesn't know what 
it, itself, is!

On 30 September 2016 at 21:47, Jonathan Haddad <j...@jonhaddad.com> wrote:

+1000 to what Benedict says. I usually call it a "partitioned row store" which 
usually needs some extra explanation but is more accurate than "column family" 
or whatever other thrift era terminology people still use. 

On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan <doanduy...@gmail.com> wrote:

I used to present Cassandra as a NoSQL datastore with "distributed" table. This 
definition is closer to CQL and has some academic background (distributed hash 
table).

 

 

On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <bened...@apache.org> 
wrote:

Cassandra is not a "wide column store" anymore.  It has a schema.  Only thrift 
users no longer think they have a schema (though they do), and thrift is being 
deprecated.

 

I really wish everyone would kill the term "wide column store" with fire.  It 
seems to have never meant anything beyond "schema-less, row-oriented", and a 
"column store" means literally the opposite of this.

 

Not only that, but people don't even seem to realise the term "column store" 
existed long before "wide column store" and the latter is often abbreviated to 
the former, as here: http://www.planetcassandra.org/what-is-nosql/ 

 

Since it no longer applies, let's all agree as a community to forget this awful 
nomenclature ever existed.

On 30 September 2016 at 18:09, Joaquin Casares <joaq...@thelastpickle.com> 
wrote:

Hi Mehdi,

 

I can help clarify a few things.

 

As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can have 
2 billion columns, but in practice it shouldn't have more than 100 million 
columns.

 

Cassandra partitions data to certain nodes based on the partition key(s), but 
does provide the option of setting zero or more clustering keys. Together, the 
partition key(s) and clustering key(s) form the primary key.

 

When writing to Cassandra, you will need to provide the full primary key, 
however, when reading from Cassandra, you only need to provide the full 
partition key.

 

When you only provide the partition key for a read operation, you're able to 
return all columns that exist on that partition with low latency. These columns 
are displayed as "CQL rows" to make it easier to reason about.

 

Consider the schema:

 

CREATE TABLE foo (
  bar uuid,
  boz uuid,
  baz timeuuid,
  data1 text,
  data2 text,
  PRIMARY KEY ((bar, boz), baz)
);

 

When you write to Cassandra you will need to send bar, boz, and baz and
optionally data*, if it's relevant for that CQL row. If you choose not to set
a data* field for a particular CQL row, then nothing is stored or allocated on
disk for it. But I wouldn't consider that caveat to be "schema-less".

 

However, all writes to the same bar/boz will end up on the same Cassandra 
replica set (a configurable number of nodes) and be stored on the same place(s) 
on disk within the SSTable(s). And on disk, each field that's not a partition 
key is stored as a column, including clustering keys (this is optimized in 
Cassandra 3+, but now we're getting deep into internals).

 

In this way you can get fast responses for all activity for bar/boz either over 
time, or for a specific time, with roughly the same number of disk seeks, with 
varying lengths on the disk scans.
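
The access pattern described for the foo table can be sketched as a toy
in-memory model. This is only an illustration under stated assumptions:
FooTable is a hypothetical class, plain integers stand in for timeuuid
values of baz, and nothing here reflects the actual SSTable layout.

```python
import bisect


class FooTable:
    """Toy model of PRIMARY KEY ((bar, boz), baz)."""

    DATA_COLUMNS = {"data1", "data2"}

    def __init__(self):
        # (bar, boz) -> {"keys": sorted baz values, "cells": baz -> written cells}
        self._partitions = {}

    def write(self, bar, boz, baz, **data):
        """A write needs the full primary key; data* columns are optional."""
        unknown = set(data) - self.DATA_COLUMNS
        if unknown:
            raise ValueError("not in schema: %s" % sorted(unknown))
        part = self._partitions.setdefault((bar, boz), {"keys": [], "cells": {}})
        if baz not in part["cells"]:
            bisect.insort(part["keys"], baz)
            part["cells"][baz] = {}
        part["cells"][baz].update(data)  # unset data* columns store nothing

    def read(self, bar, boz, start=None, end=None):
        """Read one partition, optionally sliced on the clustering key."""
        part = self._partitions.get((bar, boz))
        if part is None:
            return []
        keys = part["keys"]
        lo = 0 if start is None else bisect.bisect_left(keys, start)
        hi = len(keys) if end is None else bisect.bisect_right(keys, end)
        return [dict(part["cells"][k], baz=k) for k in keys[lo:hi]]


table = FooTable()
table.write("bar-1", "boz-1", 30, data1="c")
table.write("bar-1", "boz-1", 10)             # no data* columns at all
table.write("bar-1", "boz-1", 20, data2="b")
table.write("bar-2", "boz-1", 10, data1="other partition")

everything = table.read("bar-1", "boz-1")                 # all activity over time
one_instant = table.read("bar-1", "boz-1", start=20, end=20)  # a specific time
```

Both reads locate the partition once and then scan a contiguous, sorted run
of clustering keys; only the length of the scan differs. The schema check in
write() is why sparse rows are still not "schema-less".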

 

Hope that helps!


Joaquin Casares

Consultant

Austin, TX

 

Apache Cassandra Consulting

http://www.thelastpickle.com

 

On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso <i...@mrcalonso.com> wrote:

Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra


Carlos Alonso | Software Engineer | @calonso

 

On 30 September 20

Re: Cassandra data model right definition

2016-09-30 Thread Benedict Elliott Smith
Absolutely.  A "partitioned row store" is exactly what I would call it.  As
it happens, our README thinks the same, which is fantastic.

I thought I'd take a look at the rest of our cohort, and didn't get far
before disappointment.  HBase literally calls itself a "*column-oriented*
store", which is so totally wrong it's simultaneously hilarious and tragic.

I guess we can't blame the wider internet for misunderstanding/misnaming us
poor "wide column stores" if even one of the major examples doesn't know
what it, itself, is!




On 30 September 2016 at 21:47, Jonathan Haddad  wrote:

> +1000 to what Benedict says. I usually call it a "partitioned row store"
> which usually needs some extra explanation but is more accurate than
> "column family" or whatever other thrift era terminology people still use.
> On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:
>
>> I used to present Cassandra as a NoSQL datastore with "distributed"
>> table. This definition is closer to CQL and has some academic background
>> (distributed hash table).
>>
>>
>> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
>> bened...@apache.org> wrote:
>>
>>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>>> thrift users no longer think they have a schema (though they do), and
>>> thrift is being deprecated.
>>>
>>> I really wish everyone would kill the term "wide column store" with
>>> fire.  It seems to have never meant anything beyond "schema-less,
>>> row-oriented", and a "column store" means literally the opposite of this.
>>>
>>> Not only that, but people don't even seem to realise the term "column
>>> store" existed long before "wide column store" and the latter is often
>>> abbreviated to the former, as here:
>>> http://www.planetcassandra.org/what-is-nosql/
>>>
>>> Since it no longer applies, let's all agree as a community to forget
>>> this awful nomenclature ever existed.
>>>
>>>
>>>
>>> On 30 September 2016 at 18:09, Joaquin Casares <
>>> joaq...@thelastpickle.com> wrote:
>>>
 Hi Mehdi,

 I can help clarify a few things.

 As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
 can have 2 billion columns, but in practice it shouldn't have more than 100
 million columns.

 Cassandra partitions data to certain nodes based on the partition
 key(s), but does provide the option of setting zero or more clustering
 keys. Together, the partition key(s) and clustering key(s) form the primary
 key.

 When writing to Cassandra, you will need to provide the full primary
 key, however, when reading from Cassandra, you only need to provide the
 full partition key.

 When you only provide the partition key for a read operation, you're
 able to return all columns that exist on that partition with low latency.
 These columns are displayed as "CQL rows" to make it easier to reason 
 about.

 Consider the schema:

 CREATE TABLE foo (
   bar uuid,
   boz uuid,
   baz timeuuid,
   data1 text,
   data2 text,
   PRIMARY KEY ((bar, boz), baz)
 );


 When you write to Cassandra you will need to send bar, boz, and baz and
 optionally data*, if it's relevant for that CQL row. If you chose not to
 define a data* field for a particular CQL row, then nothing is stored nor
 allocated on disk. But I wouldn't consider that caveat to be "schema-less".

 However, all writes to the same bar/boz will end up on the same
 Cassandra replica set (a configurable number of nodes) and be stored on the
 same place(s) on disk within the SSTable(s). And on disk, each field that's
 not a partition key is stored as a column, including clustering keys (this
 is optimized in Cassandra 3+, but now we're getting deep into internals).

 In this way you can get fast responses for all activity for bar/boz
 either over time, or for a specific time, with roughly the same number of
 disk seeks, with varying lengths on the disk scans.

 Hope that helps!

 Joaquin Casares
 Consultant
 Austin, TX

 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso 
 wrote:

> Cassandra is a Wide Column Store
> http://db-engines.com/en/system/Cassandra
>
> Carlos Alonso | Software Engineer | @calonso
> 
>
> On 30 September 2016 at 18:24, Mehdi Bada wrote:
>
>> Hi all,
>>
>> I have a theoretical question:
>> - Is Apache Cassandra really a column store?
>> A column store means storing the data as columns rather than as rows.
>>
>> In fact, C* stores the data as rows, and data is partitioned by row key.
>>
>> Finally, for me, Cassandra is a row 

Re: Cassandra data model right definition

2016-09-30 Thread Jonathan Haddad
+1000 to what Benedict says. I usually call it a "partitioned row store",
which usually needs some extra explanation but is more accurate than
"column family" or whatever other Thrift-era terminology people still use.
On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan  wrote:

> I used to present Cassandra as a NoSQL datastore with "distributed" table.
> This definition is closer to CQL and has some academic background
> (distributed hash table).
>
>
> On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
>> Cassandra is not a "wide column store" anymore.  It has a schema.  Only
>> thrift users no longer think they have a schema (though they do), and
>> thrift is being deprecated.
>>
>> I really wish everyone would kill the term "wide column store" with
>> fire.  It seems to have never meant anything beyond "schema-less,
>> row-oriented", and a "column store" means literally the opposite of this.
>>
>> Not only that, but people don't even seem to realise the term "column
>> store" existed long before "wide column store" and the latter is often
>> abbreviated to the former, as here:
>> http://www.planetcassandra.org/what-is-nosql/
>>
>> Since it no longer applies, let's all agree as a community to forget this
>> awful nomenclature ever existed.
>>
>>
>>
>> On 30 September 2016 at 18:09, Joaquin Casares wrote:
>>
>>> Hi Mehdi,
>>>
>>> I can help clarify a few things.
>>>
>>> As Carlos said, Cassandra is a Wide Column Store. Theoretically a row
>>> can have 2 billion columns, but in practice it shouldn't have more than 100
>>> million columns.
>>>
>>> Cassandra partitions data to certain nodes based on the partition
>>> key(s), but does provide the option of setting zero or more clustering
>>> keys. Together, the partition key(s) and clustering key(s) form the primary
>>> key.
>>>
>>> When writing to Cassandra, you will need to provide the full primary
>>> key, however, when reading from Cassandra, you only need to provide the
>>> full partition key.
>>>
>>> When you only provide the partition key for a read operation, you're
>>> able to return all columns that exist on that partition with low latency.
>>> These columns are displayed as "CQL rows" to make it easier to reason about.
>>>
>>> Consider the schema:
>>>
>>> CREATE TABLE foo (
>>>   bar uuid,
>>>
>>>   boz uuid,
>>>
>>>   baz timeuuid,
>>>   data1 text,
>>>
>>>   data2 text,
>>>
>>>   PRIMARY KEY ((bar, boz), baz)
>>>
>>> );
>>>
>>>
>>> When you write to Cassandra you will need to send bar, boz, and baz and
>>> optionally data*, if it's relevant for that CQL row. If you chose not to
>>> define a data* field for a particular CQL row, then nothing is stored nor
>>> allocated on disk. But I wouldn't consider that caveat to be "schema-less".
>>>
>>> However, all writes to the same bar/boz will end up on the same
>>> Cassandra replica set (a configurable number of nodes) and be stored on the
>>> same place(s) on disk within the SSTable(s). And on disk, each field that's
>>> not a partition key is stored as a column, including clustering keys (this
>>> is optimized in Cassandra 3+, but now we're getting deep into internals).
>>>
>>> In this way you can get fast responses for all activity for bar/boz
>>> either over time, or for a specific time, with roughly the same number of
>>> disk seeks, with varying lengths on the disk scans.
>>>
>>> Hope that helps!
>>>
>>> Joaquin Casares
>>> Consultant
>>> Austin, TX
>>>
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Fri, Sep 30, 2016 at 11:40 AM, Carlos Alonso 
>>> wrote:
>>>
 Cassandra is a Wide Column Store
 http://db-engines.com/en/system/Cassandra

 Carlos Alonso | Software Engineer | @calonso
 

 On 30 September 2016 at 18:24, Mehdi Bada 
 wrote:

> Hi all,
>
> I have a theoritical question:
> - Is Apache Cassandra really a column store?
> Column store mean storing the data as column rather than as a rows.
>
> In fact C* store the data as row, and data is partionned with row key.
>
> Finally, for me, Cassandra is a row oriented schema less DBMS Is
> it true for you also???
>
> Many thanks in advance for your reply
>
> Best Regards
> Mehdi Bada
> 
>
> *Mehdi Bada* | Consultant
> Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422
> 96 15
> dbi services, Rue de la Jeunesse 2, CH-2800 Delémont
> mehdi.b...@dbi-services.com
> www.dbi-services.com
>
>
>
>
> *⇒ dbi services is recruiting Oracle & SQL Server experts ! – Join the
> team
> *
>


>>>
>>
>


Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
Then:
Physically: a data store built as a log-structured merge of SSTables
(see https://cloud.google.com/bigtable/).
Now:
One of the changes made in Apache Cassandra 3.0 is a relatively
important refactor of the storage engine.
I say refactor because the basics have not changed: data is still inserted
in a memtable, which gets flushed over time to an sstable, with compaction
baby-sitting the set of sstables on disk, and reads use both memtables and
sstables to retrieve results. But the internal structure of the objects
manipulated in those phases has changed, and that entails a significant
amount of refactoring in the code. The principal motivation is that the new
storage engine more directly manipulates the structure that is exposed
through CQL, and knowing that structure at the storage engine level has
many advantages: some features are easier to add and the engine has more
information to optimize.
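
To make the memtable / flush / compaction / read path above concrete, here
is a toy sketch in Python. It is only an illustration: the class name
TinyLSM, the memtable_limit parameter and the "newest table wins" rule are
assumptions made for the example (a real engine reconciles cells by write
timestamp and keeps far richer on-disk structures).

```python
class TinyLSM:
    """Toy log-structured merge store: one memtable plus immutable sorted tables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}            # mutable, in-memory writes
        self.sstables = []            # immutable sorted tuples of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable table.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        # Merge all tables into one; later (newer) tables overwrite older values.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []

    def read(self, key):
        # Reads consult the memtable first, then tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


store = TinyLSM(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)      # second write reaches the limit and triggers a flush
store.write("a", 3)      # newer value for "a" lives in the memtable for now
value_before = store.read("a")
store.flush()
store.compact()          # two tables become one, keeping the newest "a"
value_after = store.read("a")
```

The point of the sketch is only that writes never modify an existing table
in place, and that a read may have to look in several places until
compaction merges them.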

http://www.datastax.com/2015/12/storage-engine-30

Then:
An RPC abstraction over the data, with methods like get_slice that selected
columns from a single 'row key'.
Now:
A query-based abstraction over the data, with queries like SELECT * FROM
table WHERE x=y, in which most language features work over single
'partitions'.

And 3(?) implementations of secondary-index-like things:
Secondary Indexes
Materialized Views
SASI (SASIIndex)

These add to query functionality, typically by storing an index (or
secondary form) in a way optimized for a given query functionality.







Re: Cassandra data model right definition

2016-09-30 Thread DuyHai Doan
I used to present Cassandra as a NoSQL datastore with "distributed" tables.
This definition is closer to CQL and has some academic background
(distributed hash tables).




Re: Cassandra data model right definition

2016-09-30 Thread Benedict Elliott Smith
Cassandra is not a "wide column store" anymore.  It has a schema.  Only
Thrift users still think they don't have a schema (though they do), and
Thrift is being deprecated.

I really wish everyone would kill the term "wide column store" with fire.
It seems to have never meant anything beyond "schema-less, row-oriented",
and a "column store" means literally the opposite of this.

Not only that, but people don't even seem to realise the term "column
store" existed long before "wide column store" and the latter is often
abbreviated to the former, as here:
http://www.planetcassandra.org/what-is-nosql/

Since it no longer applies, let's all agree as a community to forget this
awful nomenclature ever existed.





Re: Cassandra data model right definition

2016-09-30 Thread Joaquin Casares
Hi Mehdi,

I can help clarify a few things.

As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can
have 2 billion columns, but in practice it shouldn't have more than 100
million columns.

Cassandra partitions data to certain nodes based on the partition key(s),
but does provide the option of setting zero or more clustering keys.
Together, the partition key(s) and clustering key(s) form the primary key.
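
As a rough illustration of how a partition key can decide which nodes hold
the data, here is a small Python sketch of a token ring. Everything in it is
an assumption made for the example: the Ring class, the node names, the
evenly spaced tokens, and the use of MD5 as a stand-in hash (it is not
claimed to match Cassandra's actual partitioner or replication strategies).

```python
import bisect
import hashlib


def token(partition_key: tuple) -> int:
    """Hash the full (possibly composite) partition key to a 64-bit ring position."""
    raw = "\x00".join(str(part) for part in partition_key).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big")


class Ring:
    def __init__(self, node_tokens: dict, replication_factor: int):
        # Each node owns one token; keep them sorted by token.
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]
        self.rf = replication_factor

    def replicas(self, partition_key: tuple) -> list:
        # Walk clockwise from the first node whose token is >= the key's token,
        # wrapping around the ring, and take replication_factor nodes.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        n = len(self.entries)
        return [self.entries[(start + i) % n][1] for i in range(min(self.rf, n))]


ring = Ring(
    {"node-a": 0, "node-b": 2**62, "node-c": 2**63, "node-d": 3 * 2**62},
    replication_factor=3,
)
# Only the partition key is hashed; clustering keys play no part in placement.
owners = ring.replicas(("user-42", "2016-09"))
```

Calling ring.replicas() twice with the same partition key always returns the
same nodes, which is why everything written under one partition key ends up
on the same replica set.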

When writing to Cassandra, you need to provide the full primary key; when
reading from Cassandra, however, you only need to provide the full
partition key.

When you only provide the partition key for a read operation, you're able
to return all columns that exist on that partition with low latency. These
columns are displayed as "CQL rows" to make it easier to reason about.

Consider the schema:

CREATE TABLE foo (
  bar uuid,
  boz uuid,
  baz timeuuid,
  data1 text,
  data2 text,
  PRIMARY KEY ((bar, boz), baz)
);


When you write to Cassandra you will need to send bar, boz, and baz and,
optionally, data*, if it's relevant for that CQL row. If you choose not to
set a data* field for a particular CQL row, then nothing is stored or
allocated on disk for it. But I wouldn't consider that caveat to be
"schema-less".
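
The rules above (full primary key on write, partition key on read, nothing
stored for unset data* fields) can be modelled in a few lines of Python.
This is a sketch of the logical model only, not of the driver or the storage
engine; the Table class and its insert/select methods are hypothetical names
for the example, and plain strings and integers stand in for uuid and
timeuuid values.

```python
PARTITION_KEY = ("bar", "boz")
CLUSTERING_KEY = ("baz",)
PRIMARY_KEY = PARTITION_KEY + CLUSTERING_KEY


class Table:
    def __init__(self):
        # partition key values -> {clustering key values -> {column name -> value}}
        self.partitions = {}

    def insert(self, **columns):
        # A write must name every primary key column.
        missing = [c for c in PRIMARY_KEY if c not in columns]
        if missing:
            raise ValueError(f"missing primary key columns: {missing}")
        pk = tuple(columns[c] for c in PARTITION_KEY)
        ck = tuple(columns[c] for c in CLUSTERING_KEY)
        # Only the data columns that were actually sent become cells.
        cells = {n: v for n, v in columns.items() if n not in PRIMARY_KEY}
        self.partitions.setdefault(pk, {}).setdefault(ck, {}).update(cells)

    def select(self, **where):
        # A read needs the full partition key; clustering columns are optional here.
        missing = [c for c in PARTITION_KEY if c not in where]
        if missing:
            raise ValueError(f"missing partition key columns: {missing}")
        pk = tuple(where[c] for c in PARTITION_KEY)
        rows = self.partitions.get(pk, {})
        # Present the partition as "CQL rows", ordered by clustering key.
        return [
            {**dict(zip(PARTITION_KEY, pk)), **dict(zip(CLUSTERING_KEY, ck)), **rows[ck]}
            for ck in sorted(rows)
        ]


foo = Table()
foo.insert(bar="b1", boz="z1", baz=2, data1="hello")
foo.insert(bar="b1", boz="z1", baz=1)          # no data* sent, so no cells stored
cql_rows = foo.select(bar="b1", boz="z1")
```

Here cql_rows holds two rows for the same partition, ordered by baz, and the
row written without data1/data2 simply has no entries for them.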

However, all writes to the same bar/boz will end up on the same Cassandra
replica set (a configurable number of nodes) and be stored on the same
place(s) on disk within the SSTable(s). And on disk, each field that's not
a partition key is stored as a column, including clustering keys (this is
optimized in Cassandra 3+, but now we're getting deep into internals).

In this way you can get fast responses for all activity for bar/boz, either
over time or for a specific time, with roughly the same number of disk
seeks but with disk scans of varying length.

Hope that helps!

Joaquin Casares
Consultant
Austin, TX

Apache Cassandra Consulting
http://www.thelastpickle.com



Re: Cassandra data model right definition

2016-09-30 Thread Carlos Alonso
Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra

Carlos Alonso | Software Engineer | @calonso 



Cassandra data model right definition

2016-09-30 Thread Mehdi Bada
Hi all, 

I have a theoretical question:
- Is Apache Cassandra really a column store?
A column store means storing the data as columns rather than as rows.

In fact, C* stores the data as rows, and the data is partitioned by row key.

So, for me, Cassandra is a row-oriented, schema-less DBMS. Is that true
for you too?
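
For readers unfamiliar with the distinction being asked about, here is a
minimal Python sketch of the two layouts. The sample records and the two
helper functions are made up for the illustration and say nothing about how
any particular database lays out its files.

```python
records = [
    {"id": 1, "name": "ann", "age": 31},
    {"id": 2, "name": "bob", "age": 27},
]


def to_row_layout(rows):
    # Row-oriented: all values of one record sit next to each other.
    return [tuple(r.values()) for r in rows]


def to_column_layout(rows):
    # Column-oriented: all values of one column sit next to each other.
    columns = {}
    for r in rows:
        for name, value in r.items():
            columns.setdefault(name, []).append(value)
    return columns


row_layout = to_row_layout(records)
column_layout = to_column_layout(records)

# An aggregate over one attribute touches a single list in the column layout,
# while the row layout has to walk every record.
total_age = sum(column_layout["age"])
```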

Many thanks in advance for your reply 

Best Regards 
Mehdi Bada 
 

Mehdi Bada | Consultant 
Phone: +41 32 422 96 00 | Mobile: +41 79 928 75 48 | Fax: +41 32 422 96 15 
dbi services, Rue de la Jeunesse 2, CH-2800 Delémont 
mehdi.b...@dbi-services.com 
www.dbi-services.com 


