Re: CQL Composite Key Seen After Table Creation

2016-01-15 Thread Chris Burroughs

On 01/06/2016 04:47 PM, Robert Coli wrote:

On Wed, Jan 6, 2016 at 12:54 PM, Chris Burroughs 
wrote:

The problem with that approach is that manually editing the local schema
tables in live cluster is wildly dangerous. I *think* this would work:


  * Make triple sure no schema changes are happening on the cluster.

  * Update schema tables on each node --> drain --> restart


I think that would work too, and probably be lower risk than modifying on
one and trying to get the others to pull via resetlocalschema. But I agree
it seems "wildly dangerous".


We did this, and a day later it appears successful.

I am still fuzzy on how schema "changes" propagate when you edit the 
schema tables directly and am unsure if the drain/restart rain dance was 
strictly necessary, but it felt safer. (Obviously even if I was sure 
now, that would not be behavior to count on, and I hope not to need to 
do this gain.)




Re: CQL Composite Key Seen After Table Creation

2016-01-06 Thread Robert Coli
On Wed, Jan 6, 2016 at 12:54 PM, Chris Burroughs 
wrote:

> I work with Amir and further experimentation I can shed a little more
> light on what exactly is going on under the hood.  For background our goal
> is to take data that is currently being read and written to via thrift,
> switch reads to CQL, and then switch writes to CQL.  This is in alternative
> to deleting all of our data and starting over, or being forever struck on
> super old thrift clients (both of those options obviously suck.)  The data
> models involved are absurdly simple (and single key with a handful of
> static columns).

 ...

> The problem with that approach is that manually editing the local schema
> tables in live cluster is wildly dangerous. I *think* this would work:

 * Make triple sure no schema changes are happening on the cluster.

 * Update schema tables on each node --> drain --> restart


I think that would work too, and probably be lower risk than modifying on
one and trying to get the others to pull via resetlocalschema. But I agree
it seems "wildly dangerous".

FWIW, I think your case may be the case the project hopes to handle in
https://issues.apache.org/jira/browse/CASSANDRA-10857 ?

=Rob


Re: CQL Composite Key Seen After Table Creation

2016-01-06 Thread Chris Burroughs
I work with Amir and further experimentation I can shed a little more light on 
what exactly is going on under the hood.  For background our goal is to take 
data that is currently being read and written to via thrift, switch reads to 
CQL, and then switch writes to CQL.  This is in alternative to deleting all of 
our data and starting over, or being forever struck on super old thrift clients 
(both of those options obviously suck.)  The data models involved are absurdly 
simple (and single key with a handful of static columns).

TLDR: Metadata is complicated.  What is the least dangerous way to make direct 
changes to system.schema_columnfamilies and system.schema_columns?

Anyway, given some super simple Foo and Bar column families:

create keyspace Test with  placement_strategy = 
'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = 
{replication_factor:1};
use Test;
create column family Foo with comparator = UTF8Type and 
key_validation_class=UTF8Type and column_metadata = [ {column_name: title, 
validation_class: UTF8Type}];
create column family Bar with comparator = UTF8Type and 
key_validation_class=UTF8Type;
update column family Bar with column_metadata = [ {column_name: title, 
validation_class: UTF8Type}];

(The salient difference as described by Amir is when the column_metadata is 
set; at the same time as creation or later.)

Now we can inject a little data and see that from thrift everything looks fine:

[default@Test] set Foo['testkey']['title']='mytitle';
Value inserted.
Elapsed time: 19 msec(s).
[default@Test] set Bar['testkey']['title']='mytitle';
Value inserted.
Elapsed time: 4.47 msec(s).

[default@Test] list Foo;
Using default limit of 100
Using default cell limit of 100
---
RowKey: testkey
=> (name=title, value=mytitle, timestamp=1452108082972000)

1 Row Returned.
Elapsed time: 268 msec(s).
[default@Test] list Bar;
Using default limit of 100
Using default cell limit of 100
---
RowKey: testkey
=> (name=title, value=mytitle, timestamp=1452108093739000)

1 Row Returned.
Elapsed time: 9.3 msec(s).

But from cql the Bar column does not look like the data we wrote:

cqlsh> select * from "Test"."Foo";

 key | title
-+-
 testkey | mytitle

(1 rows)


cqlsh> select * from "Test"."Bar";

 key | column1 | value| title
-+-+--+-
 testkey |   title | 0x6d797469746c65 | mytitle


It's not just that these phantom columns are ugly, cql thinks column1 is part 
of a composite primary key.  Since there **is no column1**, that renderes the 
data un-query-able with WHERE clauses.

Just to make sure it's not thrift that is doing something unexpected, the 
sstables show the expected structure:

$ ./tools/bin/sstable2json 
/data/sstables/data/Test/Foo-d3348860b4af11e5b456639406f48f1b/Test-Foo-ka-1-Data.db
 
[
{"key": "testkey",
 "cells": [["title","mytitle",1452110466924000]]}
]


$ ./tools/bin/sstable2json 
/data/sstables/data/Test/Foo-d3348860b4af11e5b456639406f48f1b/Test-Foo-ka-1-Data.db
 
[
{"key": "testkey",
 "cells": [["title","mytitle",1452110466924000]]}
]


So, what appeared as innocent variation made years ago when the thrift schema 
was written causes very different results to cql.

Digging into the schema tables shows what is going on in more detail:

> select 
> keyspace_name,columnfamily_name,column_aliases,comparator,is_dense,key_aliases,value_alias
>  from system.schema_columnfamilies where keyspace_name='Test';

 keyspace_name | columnfamily_name | column_aliases | comparator
  | is_dense | key_aliases | value_alias
---+---++ 
+--+-+-
  Test |   Bar | ["column1"]   | 
org.apache.cassandra.db.marshal.UTF8Type | True | ["key"] |   value
  Test |   Foo |  []   | 
org.apache.cassandra.db.marshal.UTF8Type |False | ["key"] |null

> select keyspace_name,columnfamily_name,column_name,validator from 
> system.schema_columns where keyspace_name='Test';

 keyspace_name | columnfamily_name | column_name | validator
---+---+-+---
  Test |   Bar | column1 |  
org.apache.cassandra.db.marshal.UTF8Type
  Test |   Bar | key |  
org.apache.cassandra.db.marshal.UTF8Type
  Test |   Bar |   title |  
org.apache.cassandra.db.marshal.UTF8Type
  Test |   Bar |   value | 
org.apache.cassandra.db.marshal.BytesType
  Test |   Foo | key |  
org.apache.cassandra.db.marshal.UTF8Type
  Test |   Foo |   title |  
org.apache.cassandra.db.marshal.UTF8Type


Now the interesting bit is that the metadata can  be manually "fixed":

UPDATE 

Re: CQL Composite Key Seen After Table Creation

2015-12-28 Thread Robert Coli
On Tue, Dec 22, 2015 at 6:55 AM, Amir Hassani 
wrote:

> Defining the “column_metadata” after table creation vs during table
> creation seems to affect the cql schema.  We are running a large
> high-throughput Cassandra 2.1.10 cluster. During table creation, the client
> Cassandra-cli was used with the following commands:
>

In general, tables created via cli should be used via thrift clients, and
vice versa for CQL.

Why do you believe you have a case where you should access a thrift-created
dynamic table via CQL? Why are you using cassandra-cli at all in Cassandra
2.1.10?

=Rob


CQL Composite Key Seen After Table Creation

2015-12-22 Thread Amir Hassani
Hello,

Defining the “column_metadata” after table creation vs during table
creation seems to affect the cql schema.  We are running a large
high-throughput Cassandra 2.1.10 cluster. During table creation, the client
Cassandra-cli was used with the following commands:

“create column family Test with comparator = UTF8Type and
key_validation_class=UTF8Type;”
“update column family Test with column_metadata = [ {column_name: title,
validation_class: UTF8Type}]”

Though these commands had the desirable effect in thrift, in cql it causes
the composite key ((key), column1) to be seen, where column1 is the “title”
value.  Despite the composite key issue in cql, the underlying data on disk
is correct.  This article seemed to best describe what might be happening
on the backend: http://www.datastax.com/dev/blog/thrift-to-cql3.  Instead
of recreating the table with an explicit column definition, which would
look like this:

create column family Test with comparator = UTF8Type
and key_validation_class=UTF8Type
and column_metadata = [ {column_name: title, validation_class: UTF8Type}]

Is there a way to repair the cql schema to recognize that the table is not
dynamic, ultimately removing the composite key?

Thanks,
Amir