RE: Interesting use case

2016-06-08 Thread Peer, Oded
Why do you think the number of partitions is different in these tables? The 
partition key is the same (system_name and event_name); only the number of rows 
per partition is different.



From: kurt Greaves [mailto:k...@instaclustr.com]
Sent: Thursday, June 09, 2016 7:52 AM
To: user@cassandra.apache.org
Subject: Re: Interesting use case

I would say it's probably due to a significantly larger number of partitions 
when using the overwrite method - but really you should be seeing similar 
performance unless one of the schemas ends up generating a lot more disk IO.
If you're planning to read the last N values for an event at the same time, the 
widerow schema would be better; otherwise, reading N events using the overwrite 
schema will result in you hitting N partitions. You really need to take into 
account how you're going to read the data when you design a schema, not only 
how many writes you can push through.
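
As an illustration of the read-pattern point, reading the latest N values of a single event from the widerow schema quoted further down is a single-partition slice (a sketch of mine, with hypothetical system/event names and N = 10):

SELECT event_time, event_value
FROM eventvalue_widerow
WHERE system_name = 'sys1' AND event_name = 'cpu_load'
LIMIT 10;   -- CLUSTERING ORDER BY (event_time DESC) makes these the 10 most recent values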

On 8 June 2016 at 19:02, John Thomas 
> wrote:
We have a use case where we are storing event data for a given system and only 
want to retain the last N values. Storing extra values for some time is fine, as 
long as it isn’t for too long, but we can never keep fewer than N. We can't use 
TTLs to delete the data because we can't be sure how frequently events will 
arrive and could end up losing everything. Is there any built-in mechanism to 
accomplish this, or a known pattern that we can follow? The events will be read 
and written at a pretty high frequency, so the solution would have to be 
performant and not fragile under stress.

We’ve played with a schema that just has N distinct columns with one value in 
each, but have found that overwrites seem to perform much worse than wide rows. 
The use case we tested only required that we store the most recent value:

CREATE TABLE eventyvalue_overwrite (
    system_name text,
    event_name text,
    event_time timestamp,
    event_value blob,
    PRIMARY KEY (system_name, event_name));

CREATE TABLE eventvalue_widerow (
    system_name text,
    event_name text,
    event_time timestamp,
    event_value blob,
    PRIMARY KEY ((system_name, event_name), event_time))
WITH CLUSTERING ORDER BY (event_time DESC);

We tested it against the DataStax AMI on EC2 with 6 nodes, replication 3, write 
consistency 2, and default settings, with a write-only workload, and got 190K/s 
for wide row and 150K/s for overwrite. Thinking through the write path it seems 
the performance should be pretty similar, with probably smaller sstables for the 
overwrite schema. Can anyone explain the big difference?

The wide row solution is more complex in that it requires a separate cleanup 
thread that will handle deleting the extra values. If that’s the path we have 
to follow, we’re thinking we’d add a bucket of some sort so that we can delete 
an entire partition at a time after copying some values forward, on the 
assumption that deleting the whole partition is much better than deleting some 
slice of the partition. Is that true? Also, is there any difference between 
setting a really short TTL and doing a delete?

I know there are a lot of questions in there but we’ve been going back and 
forth on this for a while and I’d really appreciate any help you could give.

Thanks,
John



--
Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


RE: Mapping a continuous range to a discrete value

2016-04-10 Thread Peer, Oded
Thanks for responding. I was able to hack it out eventually.

I removed the ‘upper’ column from the PK so it is no longer a clustering 
column, and I added a secondary index over it. This assumes there is no overlap 
in the continuous ranges.
I had to add a ‘dummy’ column with a constant value to be able to use a non-EQ 
operator on the ‘upper’ column.
Now that it’s not a clustering column and it has an index, I can use the 
following to map a continuous range to a discrete value.

CREATE TABLE range_mapping (k int, lower int, upper int, dummy int, mapped_value int, PRIMARY KEY (k, lower));
CREATE INDEX upper_index on range_mapping(upper);
CREATE INDEX dummy_index on range_mapping(dummy);

INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 0, 99, 0);
INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 100, 199, 100);
INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 200, 299, 200);

Now for the query:

select * from range_mapping where k = 0 and dummy = 0 and lower <= 150 and upper >= 150 allow filtering;

 k | lower | dummy | mapped_value | upper
---+-------+-------+--------------+-------
 0 |   100 |     0 |          100 |   199


Oded

From: Henry M [mailto:henrymanm...@gmail.com]
Sent: Friday, April 08, 2016 6:38 AM
To: user@cassandra.apache.org
Subject: Re: Mapping a continuous range to a discrete value

I had to do something similar (in my case it was an IN query)... I ended up 
writing a hack in Java to create a custom Expression and injecting it into the 
RowFilter of a dummy secondary index (not advisable and very short term, but it 
keeps my application code clean). I am keeping my eyes open for the evolution 
of SASI indexes (starting with Cassandra 3.4, 
https://github.com/apache/cassandra/blob/trunk/doc/SASI.md), which should do 
what you are looking for.
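
For reference, a rough sketch of how the SASI route could look for this use case once on 3.4+ (my own illustration, not something from this thread; ALLOW FILTERING is included since, depending on the version, combining these restrictions may still require it):

CREATE TABLE range_mapping (k int, lower int, upper int, mapped_value int, PRIMARY KEY (k, lower));
CREATE CUSTOM INDEX range_mapping_upper_idx ON range_mapping (upper)
USING 'org.apache.cassandra.index.sasi.SASIIndex';

SELECT mapped_value FROM range_mapping WHERE k = 0 AND lower <= 150 AND upper >= 150 ALLOW FILTERING;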


On Thu, Apr 7, 2016 at 11:06 AM Mitch Gitman 
<mgit...@gmail.com<mailto:mgit...@gmail.com>> wrote:
I just happened to run into a similar situation myself and I can see it's 
due to a bad schema design (and query design) on my part. What I wanted to do 
was narrow down by a range on one clustering column and then by another range 
on the next clustering column. Failing to adequately think through how 
Cassandra stores its sorted rows on disk, I just figured, hey, why not?

The result? The same error message you got. But then, going back over some old 
notes from a DataStax CQL webinar, I came across this (my words):

"You can do selects with combinations of the different primary keys including 
ranges on individual columns. The range will only work if you've narrowed 
things down already by equality on all the prior columns. Cassandra creates a 
composite type to store the column name."

My new solution in response: create two tables, one that's sorted by (in my 
situation) a high timestamp, the other that's sorted by (in my situation) a low 
timestamp. What had been two clustering columns gets broken up into one 
clustering column each in two different tables. Then I do two queries, one with 
the one range, the other with the other, and I programmatically merge the 
results.
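
A minimal sketch of that two-table layout (my own illustration of the approach described above; the table and column names are hypothetical):

CREATE TABLE events_by_high_ts (
    k int,
    high_ts timestamp,
    payload text,
    PRIMARY KEY (k, high_ts)
) WITH CLUSTERING ORDER BY (high_ts DESC);

CREATE TABLE events_by_low_ts (
    k int,
    low_ts timestamp,
    payload text,
    PRIMARY KEY (k, low_ts)
);

-- One range query per table; the application merges/intersects the two result sets.
SELECT * FROM events_by_high_ts WHERE k = 0 AND high_ts >= '2016-01-01';
SELECT * FROM events_by_low_ts  WHERE k = 0 AND low_ts  <= '2016-01-31';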

The funny thing is, that was my original design which my most recent, and 
failed, design is replacing. My new solution goes back to my old solution.

On Thu, Apr 7, 2016 at 1:37 AM, Peer, Oded 
<oded.p...@rsa.com<mailto:oded.p...@rsa.com>> wrote:
I have a table mapping continuous ranges to discrete values.

CREATE TABLE range_mapping (k int, lower int, upper int, mapped_value int, PRIMARY KEY (k, lower, upper));
INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 0, 99, 0);
INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 100, 199, 100);
INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 200, 299, 200);

I then want to query this table to find mapping of a specific value.
In SQL I would use: select mapped_value from range_mapping where k=0 and ? 
between lower and upper

If the variable is bound to the value 150 then the mapped_value returned is 100.

I can’t use the same type of query in CQL.
Using the query “select * from range_mapping where k = 0 and lower <= 150 and 
upper >= 150;” returns an error "Clustering column "upper" cannot be restricted 
(preceding column "lower" is restricted by a non-EQ relation)"

I thought of using multi-column restrictions, but they don’t work as I expected; 
the following query returns two rows instead of the one I expected:

select * from range_mapping where k = 0 and (lower,upper) <= (150,999) and (lower,upper) >= (-999,150);

 k | lower | upper | mapped_value
---+-------+-------+--------------
 0 |     0 |    99 |            0
 0 |   100 |   199 |          100

I’d appreciate any thoughts on the subject.




Mapping a continuous range to a discrete value

2016-04-07 Thread Peer, Oded
I have a table mapping continuous ranges to discrete values.

CREATE TABLE range_mapping (k int, lower int, upper int, mapped_value int, PRIMARY KEY (k, lower, upper));
INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 0, 99, 0);
INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 100, 199, 100);
INSERT INTO range_mapping (k, lower, upper, mapped_value) VALUES (0, 200, 299, 200);

I then want to query this table to find mapping of a specific value.
In SQL I would use: select mapped_value from range_mapping where k=0 and ? 
between lower and upper

If the variable is bound to the value 150 then the mapped_value returned is 100.

I can't use the same type of query in CQL.
Using the query "select * from range_mapping where k = 0 and lower <= 150 and 
upper >= 150;" returns an error "Clustering column "upper" cannot be restricted 
(preceding column "lower" is restricted by a non-EQ relation)"

I thought of using multi-column restrictions, but they don't work as I expected; 
the following query returns two rows instead of the one I expected:

select * from range_mapping where k = 0 and (lower,upper) <= (150,999) and (lower,upper) >= (-999,150);

 k | lower | upper | mapped_value
---+-------+-------+--------------
 0 |     0 |    99 |            0
 0 |   100 |   199 |          100

I'd appreciate any thoughts on the subject.



RE: Large number of tombstones without delete or update

2016-03-24 Thread Peer, Oded
You are right, I missed the JSON part.
According to the docs (http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support), 
“Columns which are omitted from the JSON value map are treated as a null insert 
(which results in an existing value being deleted, if one is present).”
So “unset” doesn’t help you out.
You can open a Jira ticket asking for “unset” support with JSON values and 
omitted columns, so you can control whether omitted columns get a “null” value 
or an “unset” value.
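
For readers finding this later: newer Cassandra releases (in the 3.x line) added exactly this control to INSERT JSON via a DEFAULT UNSET clause; a sketch of how the prepared statement from this thread would look with it (check your version's CQL documentation before relying on it):

INSERT INTO event_by_patient_timestamp JSON ? DEFAULT UNSET;
-- Columns omitted from the bound JSON are then left unset instead of being
-- treated as null inserts, so no tombstones are written for them.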




From: Ralf Steppacher [mailto:ralf.viva...@gmail.com]
Sent: Thursday, March 24, 2016 11:36 AM
To: user@cassandra.apache.org
Subject: Re: Large number of tombstones without delete or update

How does this improvement apply to inserting JSON? The prepared statement has 
exactly one parameter and it is always bound to the JSON message:

INSERT INTO event_by_patient_timestamp JSON ?

How would I “unset” a field inside the JSON message written to the 
event_by_patient_timestamp table?


Ralf


On 24.03.2016, at 10:22, Peer, Oded 
<oded.p...@rsa.com<mailto:oded.p...@rsa.com>> wrote:

http://www.datastax.com/dev/blog/datastax-java-driver-3-0-0-released#unset-values

“For Protocol V3 or below, all variables in a statement must be bound. With 
Protocol V4, variables can be left “unset”, in which case they will be ignored 
server-side (no tombstones will be generated).”


From: Ralf Steppacher [mailto:ralf.viva...@gmail.com]
Sent: Thursday, March 24, 2016 11:19 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Large number of tombstones without delete or update

I did some more tests with my particular schema/message structure:

A null text field inside a UDT instance does NOT yield tombstones.
A null map does NOT yield tombstones.
A null text field does yield tombstones.


Ralf

On 24.03.2016, at 09:42, Ralf Steppacher 
<ralf.viva...@gmail.com<mailto:ralf.viva...@gmail.com>> wrote:

I can confirm that if I send JSON messages that always cover all schema fields 
the tombstone issue is not reported by Cassandra.
So, is there a way to work around this issue other than to always populate 
every column of the schema with every insert? That would be a pain in the 
backside, really.

Why would C* not warn about the excessive number of tombstones if queried from 
cqlsh?


Thanks!
Ralf



On 23.03.2016, at 19:09, Robert Coli 
<rc...@eventbrite.com<mailto:rc...@eventbrite.com>> wrote:

On Wed, Mar 23, 2016 at 9:50 AM, Ralf Steppacher 
<ralf.viva...@gmail.com<mailto:ralf.viva...@gmail.com>> wrote:
How come I end up with that large a number of tombstones?

Are you inserting NULLs?

=Rob



RE: Large number of tombstones without delete or update

2016-03-24 Thread Peer, Oded
http://www.datastax.com/dev/blog/datastax-java-driver-3-0-0-released#unset-values

"For Protocol V3 or below, all variables in a statement must be bound. With 
Protocol V4, variables can be left "unset", in which case they will be ignored 
server-side (no tombstones will be generated)."


From: Ralf Steppacher [mailto:ralf.viva...@gmail.com]
Sent: Thursday, March 24, 2016 11:19 AM
To: user@cassandra.apache.org
Subject: Re: Large number of tombstones without delete or update

I did some more tests with my particular schema/message structure:

A null text field inside a UDT instance does NOT yield tombstones.
A null map does NOT yield tombstones.
A null text field does yield tombstones.


Ralf

On 24.03.2016, at 09:42, Ralf Steppacher 
> wrote:

I can confirm that if I send JSON messages that always cover all schema fields 
the tombstone issue is not reported by Cassandra.
So, is there a way to work around this issue other than to always populate 
every column of the schema with every insert? That would be a pain in the 
backside, really.

Why would C* not warn about the excessive number of tombstones if queried from 
cqlsh?


Thanks!
Ralf



On 23.03.2016, at 19:09, Robert Coli 
> wrote:

On Wed, Mar 23, 2016 at 9:50 AM, Ralf Steppacher 
> wrote:
How come I end up with that large a number of tombstones?

Are you inserting NULLs?

=Rob





RE: DataModelling to query date range

2016-03-24 Thread Peer, Oded
You can change the table to support multi-column slice restrictions:

CREATE TABLE routes (
start text,
end text,
year int,
month int,
day int,
PRIMARY KEY (start, end, year, month, day)
);

Then using Multi-column slice restrictions you can query:

SELECT * from routes where start = 'New York' and end = 'Washington' and (year,month,day) >= (2016,1,1) and (year,month,day) <= (2016,1,31);

For more details about Multi-column slice restrictions read 
http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause

Oded

From: Chris Martin [mailto:ch...@cmartinit.co.uk]
Sent: Thursday, March 24, 2016 9:40 AM
To: user@cassandra.apache.org
Subject: Re: DataModelling to query date range

Ah, that looks interesting! I'm actually still on Cassandra 2.x but I was 
planning on upgrading anyway. Once I do so I'll check this one out.


Chris


On Thu, Mar 24, 2016 at 2:57 AM, Henry M 
> wrote:
I haven't tried the new SASI indexer but it may help: 
https://github.com/apache/cassandra/blob/trunk/doc/SASI.md


On Wed, Mar 23, 2016 at 2:08 PM, Chris Martin 
> wrote:
Hi all,

I have a table that represents a train timetable and looks a bit like this:


CREATE TABLE routes (
    start text,
    end text,
    validFrom timestamp,
    validTo timestamp,
    PRIMARY KEY (start, end, validFrom, validTo)
);

In this case validFrom is the date that the route becomes valid and validTo is 
the date that the route stops being valid.

If this was SQL I could write a query to find all valid routes between New York 
and Washington from Jan 1st 2016 to Jan 31st 2016 using something like:

SELECT * from routes where start = 'New York' and end = 'Washington' and 
validFrom <= 2016-01-31 and validTo >= 2016-01-01.

As far as I can tell such a query is impossible with CQL and my current table 
structure.  I'm considering running a query like:

SELECT * from routes where start = 'New York' and end = 'Washington' and 
validFrom <= 2016-01-31
And then filtering the rest of the data app side.  This doesn't seem ideal 
though as I'm going to end up fetching much more data (probably around an order 
of magnitude more) from Cassandra than I really want.

Is there a better way to model the data?

thanks,

Chris









Restoring a snapshot into a new cluster - thoughts on replica placement

2015-12-02 Thread Peer, Oded
I read the documentation for restoring a snapshot into a new cluster.
It got me thinking about replica placement in that context. 
"NetworkTopologyStrategy places replicas in the same data center by walking the 
ring clockwise until reaching the first node in another rack."
It seems it is not enough to restore the token ranges on an equal-size cluster 
since you also need to restore the rack information.

Assume I have two 6-node clusters with three racks for each cluster represented 
by "lower" "UPPER" and "numeric".
In the first cluster the ring is represented by: a -> B -> 3 -> d -> E -> 6 ->
In the second cluster the ring uses the same token ranges and  is represented 
by: a -> b -> C -> D -> 5 -> 6 ->

In this case data restored from the first cluster matches the token 
distribution on the second cluster but does not match the expected replica 
placement.
Token t will reside on nodes a,B,3 on the first cluster but should reside on 
nodes a,C,5 on the second cluster.

Does this make sense? Did I miss something?



RE: No query results while expecting results

2015-11-24 Thread Peer, Oded
Ramon,

Have you tried another driver to determine if the problem is in the Python 
driver?

You can deserialize your composite key using the following code:

  ByteBuffer t = ByteBufferUtil.hexToBytes("0008000e70451f6404000500");

  short periodlen = t.getShort();
  byte[] periodbuf = new byte[periodlen];
  t.get(periodbuf);
  BigInteger period = new BigInteger(periodbuf);
  System.out.println("period " + period);
  t.get(); // skip the end-of-component byte

  short tnt_idlen = t.getShort();
  byte[] tnt_idbuf = new byte[tnt_idlen];
  t.get(tnt_idbuf);
  BigInteger tnt_id = new BigInteger(tnt_idbuf);
  System.out.println("tnt_id " + tnt_id.toString());
  t.get();

The output is:
period 62013120356
tnt_id 5



From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: Monday, November 23, 2015 9:00 PM
To: user@cassandra.apache.org
Subject: Re: No query results while expecting results

Did you try to observe it using cassandra-cli? (the thrift client)
It shows the 'disk-layout' of the data and may help as well.

Otherwise, if you can reproduce it having a varint as the last part of the 
partition key (or at any other location), this may well be a bug.

Carlos Alonso | Software Engineer | @calonso

On 23 November 2015 at 18:48, Ramon Rockx 
> wrote:
Hello Carlos,

On Mon, Nov 23, 2015 at 3:31 PM, Carlos Alonso 
> wrote:
Well, this makes me wonder how varints are compared in java vs python because 
the problem may be there.

I'd suggest getting the token, to know which server contains the missing data. 
Go there and convert sstables to json, find the record and see what's there as 
the tnt_id. You could also use the thrift client to list it and see how it 
looks on disk and see if there's something wrong.

If the data is there and looks fine, probably there's a problem managing 
varints somewhere in the read path.

Thanks for your input.
I converted the sstables to json and found the record, it starts with:

{"key": "0008000e70451f6404000500","columns": 
[["cba56260-5c1c-11e3-bf53-402d20524153:1","{\"v\":1383400,\"s\":2052461,\"r\"...8<...

It's a composite key and I don't know how to deserialize it properly.
Maybe I can setup a test case to reproduce the problem.

Thanks,
Ramon



RE: No query results while expecting results

2015-11-23 Thread Peer, Oded
It might be a consistency issue.
Assume your data for tnt 5 should be on nodes 1 and 2, but actually never got 
to node 1 for various reasons, and the hint wasn’t replayed for some reason and 
you didn’t run repairs. The data for tnt 5 is only on node 2.
A query without restrictions on the partition key goes out to all nodes. So you 
get tnt 5.
The default consistency level in cqlsh is ONE. If your query hits node 1 it 
returns empty.
Try setting cqlsh consistency to ALL and retry your query with tnt_id=5.
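
A minimal cqlsh sketch of that check, assuming your cqlsh version supports the CONSISTENCY command (the keyspace, table and values are taken from the query quoted later in this thread):

CONSISTENCY ALL;
SELECT * FROM mls.te WHERE period = 62013120356 AND tnt_id = 5;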


From: Ramon Rockx [mailto:ra...@iqnomy.com]
Sent: Monday, November 23, 2015 11:32 AM
To: user@cassandra.apache.org
Subject: No query results while expecting results

Hello,

On our Cassandra 1.2.15 test cluster I'm stuck with querying data on one of our 
Cassandra tables.
This is the table:


cqlsh> describe table mls.te;

CREATE TABLE te (
  period bigint,
  tnt_id varint,
  evt_id timeuuid,
  evt_type varint,
  data text,
  PRIMARY KEY ((period, tnt_id), evt_id, evt_type)
) WITH COMPACT STORAGE AND
  CLUSTERING ORDER BY (evt_id DESC, evt_type ASC) AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='tenantevent' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

Notice that the partition key is a composite one.
Now I will simply do a select all on this table with a limit:

cqlsh> select * from mls.te limit 330;

 period      | tnt_id     | evt_id                               | evt_type | data
-------------+------------+--------------------------------------+----------+---------------------------------------------------------------------------------
 ...
 62013120356 |          5 | 3f33f950-5c1b-11e3-bf53-402d20524153 |        0 | {"v":1383387,"s":2052457,"r":95257,"pvs":3610245,"u":"http://www.example.com"}
 62013120356 |          5 | ec15e5d0-5c1a-11e3-bf53-402d20524153 |        0 | {"v":1383387,"s":2052457,"r":95257,"pvs":3610243,"u":"http://www.example.com"}
 62015032164 | 2063819251 | 63d5c920-cfdb-11e4-85e9-000c2981ebb4 |        0 | {"v":1451223,"s":2130306,"r":104667,"u":"http://www.example.com"}
 62015032164 | 2063819251 | 111ce010-cfdb-11e4-85e9-000c2981ebb4 |        0 | {"v":1451222,"s":2130305,"r":104769,"u":"http://www.example.com"}
 62015032164 | 2063819251 | 105e7210-cfdb-11e4-85e9-000c2981ebb4 |        0 | {"v":1451221,"s":2130304,"r":104769,"u":"http://www.example.com"}
 62015061055 | 2147429759 | 35b97470-0f68-11e5-8cc3-000c2981ebb4 |        1 | {"v":1453821,"s":2134354,"r":105462,"q":"13082ede-0843-47ee-8126-ba3767eae547"}
 62015061055 | 2147429759 | 35a0bc50-0f68-11e5-8cc3-000c2981ebb4 |        0 | {"v":1453821,"s":2134354,"r":105462,"u":"http://www.example.com"}

So far so good... Now I will try to query a few of these by using the composite 
partition key (period, tnt_id):

cqlsh> select * from mls.te where period=62013120356 and tnt_id=5;
cqlsh> select * from mls.te where period=62015032164 and tnt_id=2063819251;

 period      | tnt_id     | evt_id                               | evt_type | data
-------------+------------+--------------------------------------+----------+-------------------------------------------------------------------
 62015032164 | 2063819251 | 63d5c920-cfdb-11e4-85e9-000c2981ebb4 |        0 | {"v":1451223,"s":2130306,"r":104667,"u":"http://www.example.com"}
 62015032164 | 2063819251 | 111ce010-cfdb-11e4-85e9-000c2981ebb4 |        0 | {"v":1451222,"s":2130305,"r":104769,"u":"http://www.example.com"}
 62015032164 | 2063819251 | 105e7210-cfdb-11e4-85e9-000c2981ebb4 |        0 | {"v":1451221,"s":2130304,"r":104769,"u":"http://www.example.com"}

As you can see, the last query returned the results as expected (see also the 
'select all' query). However, the query "select * from mls.te where 
period=62013120356 and tnt_id=5;" does not return anything. I did expect 
results, since there are rows matching this where clause.

Does anybody know what is going on, or what am I doing wrong?

Thanks!

Ramon Rockx


RE: Strategy tools for taking snapshots to load in another cluster instance

2015-11-19 Thread Peer, Oded
Have you read the DataStax documentation?
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html


From: Romain Hardouin [mailto:romainh...@yahoo.fr]
Sent: Wednesday, November 18, 2015 3:59 PM
To: user@cassandra.apache.org
Subject: Re: Strategy tools for taking snapshots to load in another cluster 
instance

You can take a snapshot via nodetool then load sstables on your test cluster 
with sstableloader:
docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html



Sent from Yahoo Mail on 
Android


From:"Anishek Agarwal" >
Date:Wed, Nov 18, 2015 at 11:24
Subject:Strategy tools for taking snapshots to load in another cluster instance
Hello

We have a 5-node prod cluster and a 3-node test cluster. Is there a way I can 
take a snapshot of a table in prod and load it into the test cluster? The 
Cassandra versions are the same.

Even if there is a tool that can help with this it would be great.

If not, how do people handle scenarios where data in prod is required in 
staging/test clusters for testing to make sure things are correct? Does the 
cluster size have to be the same to allow copying of the relevant snapshot data etc.?


thanks
anishek





RE: PrepareStatement BUG

2015-08-26 Thread Peer, Oded
See https://issues.apache.org/jira/browse/CASSANDRA-7910


From: joseph gao [mailto:gaojf.bok...@gmail.com]
Sent: Wednesday, August 26, 2015 6:15 AM
To: user@cassandra.apache.org
Subject: Re: PrepareStatement BUG

Hi, anybody knows how to resolve this problem?

2015-08-23 1:35 GMT+08:00 joseph gao 
gaojf.bok...@gmail.commailto:gaojf.bok...@gmail.com:

I'm using Cassandra 2.1.7 and DataStax Java driver 2.1.6.
Here is the problem:

I use a PrepareStatement for a query like: SELECT * FROM somespace.sometable where id = ?
And I cached the PrepareStatement in my JVM.
When the table metadata has changed, e.g. a column was added,
and I use the cached PrepareStatement, the data and the metadata (column definitions) don't match.
So I re-prepare the SQL using session.prepare(sql) again, but I see the code in the async-prepare callback part:

stmt = cluster.manager.addPrepare(stmt); in the SessionManager.java

this will return the previous PrepareStatement.
So it neither re-prepares automatically nor allows the user to re-prepare!
Is this a bug, or am I using it incorrectly?
--
--
Joseph Gao
PhoneNum:15210513582
QQ: 409343351



--
--
Joseph Gao
PhoneNum:15210513582
QQ: 409343351


RE: Can't connect to Cassandra server

2015-07-27 Thread Peer, Oded
It’s noticeable from the log file that you have many sstable files.
For instance there are over 11,000 sstable files for table “word_usage”, and 
over 10,500 of those are less than one MB in size.
I am guessing this has to be part of the reason bootstrap is taking so long.

What type of compaction are you using and how did you configure compaction for 
“word_usage”?
Which version of Cassandra are you using?


From: Surbhi Gupta [mailto:surbhi.gupt...@gmail.com]
Sent: Thursday, July 23, 2015 10:17 PM
To: user@cassandra.apache.org
Cc: Erick Ramirez
Subject: Re: Can't connect to Cassandra server

What is the output you are getting if you are issuing nodetool status command 
...

On 23 July 2015 at 11:30, Chamila Wijayarathna 
cdwijayarat...@gmail.commailto:cdwijayarat...@gmail.com wrote:
Hi Peer,

I changed cassandra-env.sh and the following are the parameters I used:

MAX_HEAP_SIZE=8G
HEAP_NEWSIZE=1600M

But I am still unable to start the server properly, and this time system.log 
has slightly different entries.
https://gist.github.com/cdwijayarathna/75f65a34d9e71829adaa

Any idea on how to proceed?

Thanks


On Wed, Jul 22, 2015 at 11:54 AM, Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com wrote:
Setting system_memory_in_mb to 16 GB means the Cassandra heap size you are 
using is 4 GB.
If you meant to use a 16GB heap you should uncomment the line
#MAX_HEAP_SIZE=4G
And set
MAX_HEAP_SIZE=16G

You should uncomment the HEAP_NEWSIZE setting as well. I would leave it with 
the default setting 800M until you are certain it needs to be changed.


From: Chamila Wijayarathna 
[mailto:cdwijayarat...@gmail.commailto:cdwijayarat...@gmail.com]
Sent: Tuesday, July 21, 2015 9:21 PM
To: Erick Ramirez
Cc: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Can't connect to Cassandra server

Hi Erick,

In cassandra-env.sh, system_memory_in_mb was set to 2GB; I changed it to 
16GB, but I still get the same issue. The following are my complete system.log 
after changing cassandra-env.sh, and the new cassandra-env.sh.

https://gist.githubusercontent.com/cdwijayarathna/5e7e69c62ac09b45490b/raw/f73f043a6cd68eb5e7f93cf597ec514df7ac61ae/log
https://gist.github.com/cdwijayarathna/2665814a9bd3c47ba650

I can't find any output.log in my Cassandra installation.

Thanks

On Tue, Jul 21, 2015 at 4:31 AM, Erick Ramirez 
er...@ramirez.com.aumailto:er...@ramirez.com.au wrote:
Chamila,

As you can see from the netstat/lsof output, there is nothing listening on port 
9042 because Cassandra has not started yet. This is the reason you are unable 
to connect via cqlsh.

You need to work out first why Cassandra has not started.

With regards to JVM, Oded is referring to the max heap size and new heap size 
you have configured. The suspicion is that you have max heap size set too low 
which is apparent from the heap pressure and GC pattern in the log you provided.

Please provide the gist for the following so we can assist:
- updated system.log
- copy of output.log
- cassandra-env.sh

Cheers,
Erick

Erick Ramirez
About Me: about.me/erickramirezonline




--
Chamila Dilshan Wijayarathna,
Software Engineer
Mobile: (+94)788193620
WSO2 Inc., http://wso2.com/




--
Chamila Dilshan Wijayarathna,
Software Engineer
Mobile:(+94)788193620
WSO2 Inc., http://wso2.com/




RE: Can't connect to Cassandra server

2015-07-22 Thread Peer, Oded
Setting system_memory_in_mb to 16 GB means the Cassandra heap size you are 
using is 4 GB.
If you meant to use a 16GB heap you should uncomment the line
#MAX_HEAP_SIZE=4G
And set
MAX_HEAP_SIZE=16G

You should uncomment the HEAP_NEWSIZE setting as well. I would leave it with 
the default setting 800M until you are certain it needs to be changed.


From: Chamila Wijayarathna [mailto:cdwijayarat...@gmail.com]
Sent: Tuesday, July 21, 2015 9:21 PM
To: Erick Ramirez
Cc: user@cassandra.apache.org
Subject: Re: Can't connect to Cassandra server

Hi Erick,

In cassandra-env.sh, system_memory_in_mb was set to 2GB; I changed it to 
16GB, but I still get the same issue. The following are my complete system.log 
after changing cassandra-env.sh, and the new cassandra-env.sh.

https://gist.githubusercontent.com/cdwijayarathna/5e7e69c62ac09b45490b/raw/f73f043a6cd68eb5e7f93cf597ec514df7ac61ae/log
https://gist.github.com/cdwijayarathna/2665814a9bd3c47ba650

I can't find any output.log in my Cassandra installation.

Thanks

On Tue, Jul 21, 2015 at 4:31 AM, Erick Ramirez 
er...@ramirez.com.aumailto:er...@ramirez.com.au wrote:
Chamila,

As you can see from the netstat/lsof output, there is nothing listening on port 
9042 because Cassandra has not started yet. This is the reason you are unable 
to connect via cqlsh.

You need to work out first why Cassandra has not started.

With regards to JVM, Oded is referring to the max heap size and new heap size 
you have configured. The suspicion is that you have max heap size set too low 
which is apparent from the heap pressure and GC pattern in the log you provided.

Please provide the gist for the following so we can assist:
- updated system.log
- copy of output.log
- cassandra-env.sh

Cheers,
Erick

Erick Ramirez
About Me: about.me/erickramirezonline




--
Chamila Dilshan Wijayarathna,
Software Engineer
Mobile:(+94)788193620
WSO2 Inc., http://wso2.com/



RE: howto do sql query like in a relational database

2015-07-21 Thread Peer, Oded
Cassandra is a highly scalable, eventually consistent, distributed, structured 
key-value store http://wiki.apache.org/cassandra/
It is intended for searching by key. It has more querying options but it really 
shines when querying by key.

Not all databases offer the same functionality. Both a knife and a fork are 
eating utensils, but you wouldn't want to cut a tomato with a fork.
There are text-indexing databases out there that might suit your needs better. 
Try elasticsearch.
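
For completeness: later Cassandra versions (3.4+) added SASI indexes that can serve exactly the LIKE 'Cas%' style prefix search asked about below. A rough sketch of mine, using a hypothetical table:

CREATE TABLE docs (id int PRIMARY KEY, field text);
CREATE CUSTOM INDEX docs_field_prefix_idx ON docs (field)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'PREFIX'};

SELECT * FROM docs WHERE field LIKE 'Cas%';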

-Original Message-
From: anton [mailto:anto...@gmx.de] 
Sent: Tuesday, July 21, 2015 7:54 PM
To: user@cassandra.apache.org
Subject: howto do sql query like in a relational database

Hi,

I have a simple (perhaps stupid) question.

If I want to *search* data in Cassandra, how could I find in a text field all 
records which start with 'Cas'?
(in SQL I do: select * from table where field like 'Cas%')

I know that this is not directly possible.

 - But how is it possible?

 - Does nobody have the need to search text fragments,
   and if not, is there a small example to explain
   *why* this is not needed?

As far as I understand, databases are great for *searching* data. Concerning 
numerical data in Cassandra I can use <, >, = and all those operators.

Is cassandra intended to be used for mostly numerical data?

I did not catch the point up to now, sorry.

 Anton




RE: Can't connect to Cassandra server

2015-07-20 Thread Peer, Oded
Cassandra does some housekeeping before it starts accepting requests from 
clients.
For instance, as you can see from the log file, it replays commit log.
When the node is ready to accept requests from clients it logs a message to the 
log file similar to “Starting listening for CQL clients on 
localhost/127.0.0.1:9042...”
This log message does not appear in your log file. It appears the node is still 
doing housekeeping.
If you look near the end of your log file you see a lot of “Enqueuing flush of XXX” 
messages, and then the StatusLogger prints the task status. It shows 
you have 69 pending flushes and only 1 completed.
Another finding in the log is the GCInspector message showing that GC occurred 
and barely removed data from your Old Gen (from 3GB to 2.7 GB) in 11 seconds.

What are your JVM heap settings?
Have you changed the default configuration of flushing parameters or memTable 
parameters?



From: erickramirezonl...@gmail.com [mailto:erickramirezonl...@gmail.com] On 
Behalf Of Erick Ramirez
Sent: Monday, July 20, 2015 4:08 AM
To: user@cassandra.apache.org
Subject: Re: Can't connect to Cassandra server

Hello, Chamila.

From the information you have supplied so far, it is not clear whether 
Cassandra is in fact running.

Please provide the output of the following:

$ netstat -a -n | grep LISTEN
$ sudo lsof -i -n | grep LISTEN | grep java

It would also be good if you could provide output.log

Cheers,
Erick

Erick Ramirez
About Me: about.me/erickramirezonline

Make a difference today!
* Reduce your carbon footprinthttp://on.mash.to/1vZL7fX
* Give back to the communityhttp://www.govolunteer.com.au
* Write free softwarehttp://www.opensource.org


On Sun, Jul 19, 2015 at 11:22 PM, Chamila Wijayarathna 
cdwijayarat...@gmail.commailto:cdwijayarat...@gmail.com wrote:
Hi Peer,

https://gist.githubusercontent.com/cdwijayarathna/a14586a9e39a943f89a0/raw/system%20log
This is the log of the last time I started the server; I couldn't find any 
error there.

Thanks

On Sun, Jul 19, 2015 at 5:56 PM, Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com wrote:
Try looking in the log file (/var/log/cassandra/system.log) for errors that 
prevent your node from starting.


From: Chamila Wijayarathna 
[mailto:cdwijayarat...@gmail.commailto:cdwijayarat...@gmail.com]
Sent: Sunday, July 19, 2015 2:29 PM

To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Can't connect to Cassandra server

Hi,

I'm getting following error when running nodetool status.

maduranga@ubuntu:/etc/cassandra$ nodetool status
error: No nodes present in the cluster. Has this node finished starting up?
-- StackTrace --
java.lang.RuntimeException: No nodes present in the cluster. Has this node 
finished starting up?
at 
org.apache.cassandra.dht.Murmur3Partitioner.describeOwnership(Murmur3Partitioner.java:129)
at 
org.apache.cassandra.service.StorageService.getOwnership(StorageService.java:3702)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at 
com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at 
com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1464)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
at 
javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:657

RE: Can't connect to Cassandra server

2015-07-19 Thread Peer, Oded
Are you sure your node is up? Do you get a result when running “nodetool –h 
192.248.15.219 status”?

From: Chamila Wijayarathna [mailto:cdwijayarat...@gmail.com]
Sent: Sunday, July 19, 2015 1:53 PM
To: user@cassandra.apache.org
Subject: Re: Can't connect to Cassandra server

Hi Umang,

Tried your suggestion, but still getting the same error.

Thanks.

On Sun, Jul 19, 2015 at 3:12 PM, Umang Shah 
shahuma...@gmail.commailto:shahuma...@gmail.com wrote:
You also have to set the same IP, 192.248.15.219, for the seeds setting inside 
the cassandra.yaml file.

then try to connect, it will work.

Thanks,
Umang Shah

On Sun, Jul 19, 2015 at 1:52 AM, Chamila Wijayarathna 
cdwijayarat...@gmail.commailto:cdwijayarat...@gmail.com wrote:
Hi Ajay,

I tried that also, but still getting the same result.

On Sun, Jul 19, 2015 at 2:08 PM, Ajay 
ajay.ga...@gmail.commailto:ajay.ga...@gmail.com wrote:
Try with the correct IP address as below:

cqlsh 192.248.15.219 -u sinmin -p xx
CQL documentation - 
http://docs.datastax.com/en/cql/3.0/cql/cql_reference/cqlsh.html

On Sun, Jul 19, 2015 at 2:00 PM, Chamila Wijayarathna 
cdwijayarat...@gmail.commailto:cdwijayarat...@gmail.com wrote:
Hello all,

After starting Cassandra, I tried to connect to it from cqlsh and Java, 
but failed to do so.

Following is the error I get while trying to connect to cqlsh.

cqlsh -u sinmin -p xx
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, 
Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused)})

I have set listen_address and rpc_address in cassandra.yaml to the IP address 
of the server, as follows.

listen_address:192.248.15.219
rpc_address:192.248.15.219

Following is what I found from cassandra system.log.
https://gist.githubusercontent.com/cdwijayarathna/a14586a9e39a943f89a0/raw/system%20log

Following is the netstat result I got.

maduranga@ubuntu:/var/log/cassandra$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 ubuntu:ssh              103.21.166.35:54417     ESTABLISHED
tcp        0      0 ubuntu:1522             ubuntu:30820            ESTABLISHED
tcp        0      0 ubuntu:30820            ubuntu:1522             ESTABLISHED
tcp        0    256 ubuntu:ssh              175.157.41.209:42435    ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags   Type   State I-Node   Path
unix  9  [ ] DGRAM7936 /dev/log
unix  3  [ ] STREAM CONNECTED 11737
unix  3  [ ] STREAM CONNECTED 11736
unix  3  [ ] STREAM CONNECTED 10949
/var/run/dbus/system_bus_socket
unix  3  [ ] STREAM CONNECTED 10948
unix  2  [ ] DGRAM10947
unix  2  [ ] STREAM CONNECTED 10801
unix  3  [ ] STREAM CONNECTED 10641
unix  3  [ ] STREAM CONNECTED 10640
unix  3  [ ] STREAM CONNECTED 10444
/var/run/dbus/system_bus_socket
unix  3  [ ] STREAM CONNECTED 10443
unix  3  [ ] STREAM CONNECTED 10437
/var/run/dbus/system_bus_socket
unix  3  [ ] STREAM CONNECTED 10436
unix  3  [ ] STREAM CONNECTED 10430
/var/run/dbus/system_bus_socket
unix  3  [ ] STREAM CONNECTED 10429
unix  2  [ ] DGRAM10424
unix  3  [ ] STREAM CONNECTED 10422
/var/run/dbus/system_bus_socket
unix  3  [ ] STREAM CONNECTED 10421
unix  2  [ ] DGRAM10420
unix  2  [ ] STREAM CONNECTED 10215
unix  2  [ ] STREAM CONNECTED 10296
unix  2  [ ] STREAM CONNECTED 9988
unix  2  [ ] DGRAM9520
unix  3  [ ] STREAM CONNECTED 8769 
/var/run/dbus/system_bus_socket
unix  3  [ ] STREAM CONNECTED 8768
unix  2  [ ] DGRAM8753
unix  2  [ ] DGRAM9422
unix  3  [ ] STREAM CONNECTED 7000 @/com/ubuntu/upstart
unix  3  [ ] STREAM CONNECTED 8485
unix  2  [ ] DGRAM7947
unix  3  [ ] STREAM CONNECTED 6712 
/var/run/dbus/system_bus_socket
unix  3  [ ] STREAM CONNECTED 6711
unix  3  [ ] STREAM CONNECTED 7760 
/var/run/dbus/system_bus_socket
unix  3  [ ] STREAM CONNECTED 7759
unix  3  [ ] STREAM CONNECTED 7754
unix  3  [ ] STREAM CONNECTED 7753
unix  3  [ ] DGRAM7661
unix  3  [ ] DGRAM7660
unix  3  [ ] 

RE: Can't connect to Cassandra server

2015-07-19 Thread Peer, Oded
Try looking in the log file (/var/log/cassandra/system.log) for errors that 
prevent your node from starting.


From: Chamila Wijayarathna [mailto:cdwijayarat...@gmail.com]
Sent: Sunday, July 19, 2015 2:29 PM
To: user@cassandra.apache.org
Subject: Re: Can't connect to Cassandra server

Hi,

I'm getting following error when running nodetool status.

maduranga@ubuntu:/etc/cassandra$ nodetool status
error: No nodes present in the cluster. Has this node finished starting up?
-- StackTrace --
java.lang.RuntimeException: No nodes present in the cluster. Has this node 
finished starting up?
at 
org.apache.cassandra.dht.Murmur3Partitioner.describeOwnership(Murmur3Partitioner.java:129)
at 
org.apache.cassandra.service.StorageService.getOwnership(StorageService.java:3702)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at 
com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at 
com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1464)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
at 
javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:657)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

What is the reason for this? How can I fix this?

Thank You!

On Sun, Jul 19, 2015 at 4:52 PM, Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com wrote:
Are you sure your node is up? Do you get a result when running “nodetool –h 
192.248.15.219 status”?

From: Chamila Wijayarathna 
[mailto:cdwijayarat...@gmail.commailto:cdwijayarat...@gmail.com]
Sent: Sunday, July 19, 2015 1:53 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Can't connect to Cassandra server

Hi Umang,

Tried your suggestion, but still getting the same error.

Thanks.

On Sun, Jul 19, 2015 at 3:12 PM, Umang Shah 
shahuma...@gmail.commailto:shahuma...@gmail.com wrote:
You also have to change the same IP which is 192.248.15.219 for seeds inside 
cassandra.yaml file.

then try to connect, it will work.

Thanks,
Umang Shah

On Sun, Jul 19, 2015 at 1:52 AM, Chamila Wijayarathna 
cdwijayarat...@gmail.commailto:cdwijayarat...@gmail.com wrote:
Hi Ajay,

I tried that also, but still getting the same

RE: Example Data Modelling

2015-07-07 Thread Peer, Oded
The data model suggested isn’t optimal for the “end of month” query you want to 
run since you are not querying by partition key.
The query would look like “select EmpID, FN, LN, basic from salaries where 
month = 1” which requires filtering and has unpredictable performance.

For this type of query to be fast you can use the “month” column as the 
partition key and the “EmpID” column as the clustering column.
This approach also has drawbacks:
1. This data model creates a wide row. Depending on the number of employees 
this partition might be very large. You should limit partition sizes to 25MB.
2. Distributing data according to month means that only a small number of nodes 
will hold all of the salary data for a specific month, which might cause 
hotspots on those nodes.

Choose the approach that works best for you.
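
A sketch of that alternative layout (my illustration of the suggestion above, reusing the column names from this thread):

CREATE TABLE salaries_by_month (
    month int,
    EmpID varchar,
    FN varchar,
    LN varchar,
    basic int,
    flexible_allowance float,
    PRIMARY KEY (month, EmpID)
);

-- End-of-month run: a single-partition read, no filtering needed.
SELECT EmpID, FN, LN, basic FROM salaries_by_month WHERE month = 1;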


From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: Monday, July 06, 2015 7:04 PM
To: user@cassandra.apache.org
Subject: Re: Example Data Modelling

Hi Srinivasa,

I think you're right. In Cassandra you should favor denormalisation when in an 
RDBMS you find a relationship like this.

I'd suggest a cf like this:
CREATE TABLE salaries (
  EmpID varchar,
  FN varchar,
  LN varchar,
  Phone varchar,
  Address varchar,
  month int,
  basic int,
  flexible_allowance float,
  PRIMARY KEY (EmpID, month)
);

That way the salaries will be partitioned by EmpID and clustered by month, 
which I guess is the natural sorting you want.

Hope it helps,
Cheers!

Carlos Alonso | Software Engineer | @calonso (https://twitter.com/calonso)

On 6 July 2015 at 13:01, Srinivasa T N 
seen...@gmail.commailto:seen...@gmail.com wrote:
Hi,
   I have basic doubt: I have an RDBMS with the following two tables:

   Emp - EmpID, FN, LN, Phone, Address
   Sal - Month, Empid, Basic, Flexible Allowance

   My use case is to print the Salary slip at the end of each month and the 
slip contains emp name and his other details.

   Now, if I want to have the same in cassandra, I will have a single cf with 
emp personal details and his salary details.  Is this the right approach?  
Should we have the employee personal details duplicated each month?

Regards,
Seenu.



RE: PrepareStatement problem

2015-06-16 Thread Peer, Oded
When you alter a table the Cassandra server invalidates the prepared statements 
it is holding, so when clients (like your own) execute the prepared statement 
the server informs the client it needs to be re-prepared and the client does it 
automatically.
If this isn’t working for you then you should comment with your use case on the 
jira issue.

From: joseph gao [mailto:gaojf.bok...@gmail.com]
Sent: Tuesday, June 16, 2015 10:31 AM
To: user@cassandra.apache.org
Subject: Re: PrepareStatement problem

But I'm using 2.1.6 and I still get this bug. So, should I discard that 
PrepareStatement when I get the alter-table message? How can I receive and 
handle that message?

2015-06-15 16:45 GMT+08:00 Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com:
This only applies to “select *” queries where you don’t specify the column 
names.
There is a reported bug that was fixed in 2.1.3. See 
https://issues.apache.org/jira/browse/CASSANDRA-7910

From: joseph gao [mailto:gaojf.bok...@gmail.commailto:gaojf.bok...@gmail.com]
Sent: Monday, June 15, 2015 10:52 AM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: PrepareStatement problem

hi, all
  I'm using PrepareStatement. If I prepare a statement every time I use it, Cassandra 
gives me a warning telling me NOT to PREPARE EVERY TIME. So I cache the 
PrepareStatement locally. But when another client changes the table's schema, 
e.g. adds a new column, and I still use the previously cached PrepareStatement, the 
metadata doesn't match the data: the metadata says n columns and the data 
has n+1 columns. So what should I do to avoid this problem?

--
--
Joseph Gao
PhoneNum:15210513582
QQ: 409343351



--
--
Joseph Gao
PhoneNum:15210513582
QQ: 409343351


RE: PrepareStatement problem

2015-06-15 Thread Peer, Oded
This only applies to “select *” queries where you don’t specify the column 
names.
There is a reported bug that was fixed in 2.1.3. See 
https://issues.apache.org/jira/browse/CASSANDRA-7910

From: joseph gao [mailto:gaojf.bok...@gmail.com]
Sent: Monday, June 15, 2015 10:52 AM
To: user@cassandra.apache.org
Subject: PrepareStatement problem

hi, all
  I'm using PrepareStatement. If I prepare a statement every time I use it, Cassandra 
gives me a warning telling me NOT to PREPARE EVERY TIME. So I cache the 
PrepareStatement locally. But when another client changes the table's schema, 
e.g. adds a new column, and I still use the previously cached PrepareStatement, the 
metadata doesn't match the data: the metadata says n columns and the data 
has n+1 columns. So what should I do to avoid this problem?

--
--
Joseph Gao
PhoneNum:15210513582
QQ: 409343351


RE: Insert Vs Updates - Both create tombstones

2015-05-14 Thread Peer, Oded
If this is how you update then you are not creating tombstones.

If you used UPDATE it’s the same behavior. You are simply inserting a new value 
for the cell, which does not create a tombstone.
When you modify data by using either the INSERT or the UPDATE command, the value 
is stored along with a timestamp indicating when the value was written.
Assume timestamp T1 is before T2 (T1 < T2) and you stored value V2 with 
timestamp T2. Then you store V1 with timestamp T1.
Now you have two values of V in the DB: <V2,T2> and <V1,T1>.
When you read the value of V from the DB you read both <V2,T2> and <V1,T1>, which 
may be in different sstables; Cassandra resolves the conflict by comparing the 
timestamps and returns V2.
Compaction will later take care of removing <V1,T1> from the DB.
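
A small sketch of that behaviour (my illustration; the table and values are hypothetical, and the timestamps are arbitrary microsecond values with T2 > T1):

CREATE TABLE kv (k int PRIMARY KEY, v text);

-- Write V2 with the later timestamp T2, then V1 with the earlier timestamp T1.
INSERT INTO kv (k, v) VALUES (1, 'V2') USING TIMESTAMP 2000;
INSERT INTO kv (k, v) VALUES (1, 'V1') USING TIMESTAMP 1000;

-- Returns 'V2': the cell with the highest write timestamp wins.
-- No tombstone is involved; compaction eventually drops the older cell.
SELECT v FROM kv WHERE k = 1;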


From: Walsh, Stephen [mailto:stephen.wa...@aspect.com]
Sent: Thursday, May 14, 2015 11:39 AM
To: user@cassandra.apache.org
Subject: RE: Insert Vs Updates - Both create tombstones

Thank you,

We are updating the entire row (all columns) each second via the “insert” 
command.
So if we did updates, no tombstones would be created?
But because we are doing inserts, we are creating tombstones for each column 
on each insert?


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: 13 May 2015 12:10
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Insert Vs Updates - Both create tombstones

Sorry, wrong thread. Disregard the above

On Wed, May 13, 2015 at 4:08 PM, Ali Akhtar 
ali.rac...@gmail.commailto:ali.rac...@gmail.com wrote:
If specifying 'using' timestamp, the docs say to provide microseconds, but 
where are these microseconds obtained from? I have regular java.util.Date 
objects, I can get the time in milliseconds (i.e the unix timestamp), how would 
I convert that to microseconds?

On Wed, May 13, 2015 at 3:45 PM, Peer, Oded 
oded.p...@rsa.commailto:oded.p...@rsa.com wrote:
Under the assumption that when you update the columns you also update the TTL 
for the columns then a tombstone won’t be created for those columns.
Remember that TTL is set on columns (or “cells”), not on rows, so your 
description of updating a row is slightly misleading. If every query updates 
different columns then different columns might expire at different times.

From: Walsh, Stephen 
[mailto:stephen.wa...@aspect.commailto:stephen.wa...@aspect.com]
Sent: Wednesday, May 13, 2015 1:35 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Insert Vs Updates - Both create tombstones

Quick Question,

Our team is under much debate, we are trying to find out if an Update on a row 
with a TTL will create a tombstone.

E.G

We have one row with a TTL. If we keep “updating” that row before the TTL is 
hit, will a tombstone be created?
I believe it will, but want to confirm.

So if that is true,
and if our TTL is 10 seconds and we “update” the row every second, will 10 
tombstones be created after 10 seconds? Or just 1?
(and does the same apply for “insert”)

Regards
Stephen Walsh


This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.




RE: Updating only modified records (where lastModified current date)

2015-05-13 Thread Peer, Oded
You can use the “last modified” value as the TIMESTAMP for your UPDATE 
operation.
This way the values will only be updated if the lastModified date > the 
lastModified you have in the DB.

Updates to values don’t create tombstones. Only deletes (either by executing 
a delete, inserting a null value or by setting a TTL) create tombstones.
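
A sketch of that pattern (my illustration; the table name and id column come from this thread, while the payload column and the literal values are hypothetical, with the TIMESTAMP being the record's lastModified converted to microseconds):

UPDATE myTable USING TIMESTAMP 1431507600000000
SET payload = '...', lastModified = '2015-05-13 09:00:00+0000'
WHERE id = 42;
-- If a write with a newer timestamp already exists, this update loses the
-- timestamp comparison on read/compaction and is effectively a no-op.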


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 1:27 PM
To: user@cassandra.apache.org
Subject: Updating only modified records (where lastModified  current date)

I'm running some ETL jobs, where the pattern is the following:

1- Get some records from an external API,

2- For each record, see if its lastModified date > the lastModified I have in 
the db (or if I don't have that record in the db)

3- If lastModified <= dbLastModified, the item wasn't changed; ignore it. 
Otherwise, run an update query and update that record.

(It is rare for existing records to get updated, so I'm not that concerned 
about tombstones).

The problem however is, since I have to query each record's lastModified, one 
at a time, that's adding a major bottleneck to my job.

E.g if I have 6k records, I have to run a total of 6k 'select lastModified from 
myTable where id = ?' queries.

Is there a better way, am I doing anything wrong, etc? Any suggestions would be 
appreciated.

Thanks.


RE: Insert Vs Updates - Both create tombstones

2015-05-13 Thread Peer, Oded
Under the assumption that when you update the columns you also update the TTL 
for the columns then a tombstone won't be created for those columns.
Remember that TTL is set on columns (or cells), not on rows, so your 
description of updating a row is slightly misleading. If every query updates 
different columns then different columns might expire at different times.
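
A minimal sketch of the refresh described above (my illustration; table, columns and values are hypothetical):

-- Re-writing the cell with a new TTL resets its expiration instead of creating a tombstone.
UPDATE t USING TTL 10 SET v = 'x' WHERE k = 1;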

From: Walsh, Stephen [mailto:stephen.wa...@aspect.com]
Sent: Wednesday, May 13, 2015 1:35 PM
To: user@cassandra.apache.org
Subject: Insert Vs Updates - Both create tombstones

Quick Question,

Our team is under much debate, we are trying to find out if an Update on a row 
with a TTL will create a tombstone.

E.G

We have one row with a TTL, if we keep updating that row before the TTL is 
hit, will a tombstone be created.
I believe it will, but want to confirm.

So if that's true,
And if our TTL is 10 seconds and we update the row every second, will 10 
tombstones be created after 10 seconds? Or just 1?
(and does the same apply for insert)

Regards
Stephen Walsh


This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.


RE: Updating only modified records (where lastModified > current date)

2015-05-13 Thread Peer, Oded
The cost of issuing an UPDATE that won’t update anything is compaction 
overhead. Since you stated it’s rare for rows to be updated then the overhead 
should be negligible.

The easiest way to convert a milliseconds timestamp long value to microseconds 
is to multiply by 1000.
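
In code the two forms give the same result (plain Java, shown only for illustration):

   import java.util.Date;
   import java.util.concurrent.TimeUnit;

   public class MicrosFromMillis {
       public static void main(String[] args) {
           long millis = new Date().getTime();
           long byMultiplying = millis * 1000L;
           long byTimeUnit = TimeUnit.MILLISECONDS.toMicros(millis);
           // Both yield the same microsecond value suitable for USING TIMESTAMP.
           System.out.println(byMultiplying == byTimeUnit); // true
       }
   }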

From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 2:15 PM
To: user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified > current date)

Would TimeUnit.MILLISECONDS.toMicros(  myDate.getTime() ) work for producing 
the microsecond timestamp ?

On Wed, May 13, 2015 at 4:09 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
If specifying 'using' timestamp, the docs say to provide microseconds, but 
where are these microseconds obtained from? I have regular java.util.Date 
objects, I can get the time in milliseconds (i.e the unix timestamp), how would 
I convert that to microseconds?

On Wed, May 13, 2015 at 3:56 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
Thanks Peter, that's interesting. I didn't know of that option.

If updates don't create tombstones (and i'm already taking pains to ensure no 
nulls are present in queries), then is there no cost to just submitting an 
update for everything regardless of whether lastModified has changed?

Thanks.

On Wed, May 13, 2015 at 3:38 PM, Peer, Oded 
oded.p...@rsa.com wrote:
You can use the “last modified” value as the TIMESTAMP for your UPDATE 
operation.
This way the values will only be updated if lastModified date > the 
lastModified you have in the DB.

Updates to values don’t create tombstones. Only deletes (either by executing 
delete, inserting a null value or by setting a TTL) create tombstones.


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 1:27 PM
To: user@cassandra.apache.org
Subject: Updating only modified records (where lastModified > current date)

I'm running some ETL jobs, where the pattern is the following:

1- Get some records from an external API,

2- For each record, see if its lastModified date > the lastModified i have in 
db (or if I don't have that record in db)

3- If lastModified < dbLastModified, the item wasn't changed, ignore it. 
Otherwise, run an update query and update that record.

(It is rare for existing records to get updated, so I'm not that concerned 
about tombstones).

The problem however is, since I have to query each record's lastModified, one 
at a time, that's adding a major bottleneck to my job.

E.g if I have 6k records, I have to run a total of 6k 'select lastModified from 
myTable where id = ?' queries.

Is there a better way, am I doing anything wrong, etc? Any suggestions would be 
appreciated.

Thanks.





RE: Updating only modified records (where lastModified > current date)

2015-05-13 Thread Peer, Oded
USING TIMESTAMP doesn’t avoid compaction overhead.
When you modify data the value is stored along with a timestamp indicating the 
timestamp of the value.
Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then you 
store V’ with timestamp T1.
Now you have two values of V in the DB: <V,T2>, <V’,T1>
When you read the value of V from the DB you read both <V,T2>, <V’,T1>, 
Cassandra resolves the conflict by comparing the timestamp and returns V.
Compaction will later take care and remove <V’,T1> from the DB.

I don’t understand the ETL use case and its relevance here. Can you provide 
more details?

UPDATE in Cassandra updates specific rows. All of them are updated, nothing is 
ignored.
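
A small sketch of that resolution rule, using a hypothetical table kv (id int PRIMARY KEY, v text) and the DataStax Java driver; names and values are invented for the example:

   import com.datastax.driver.core.Cluster;
   import com.datastax.driver.core.Session;

   public class WriteTimestampResolution {
       public static void main(String[] args) {
           Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
           Session session = cluster.connect("my_keyspace");
           long t1 = 1000000L; // older timestamp
           long t2 = 2000000L; // newer timestamp
           // Store V with the newer timestamp T2...
           session.execute("UPDATE kv USING TIMESTAMP " + t2 + " SET v = 'V' WHERE id = 1");
           // ...then V' with the older timestamp T1. Both cells now exist until compaction.
           session.execute("UPDATE kv USING TIMESTAMP " + t1 + " SET v = 'Vprime' WHERE id = 1");
           // The read compares timestamps and returns V; compaction later drops <V',T1>.
           String v = session.execute("SELECT v FROM kv WHERE id = 1").one().getString("v");
           System.out.println(v); // prints V
           cluster.close();
       }
   }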


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 2:43 PM
To: user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified > current date)

Its rare for an existing record to have changes, but the etl job runs every 
hour, therefore it will send updates each time, regardless of whether there 
were changes or not.

(I'm assuming that USING TIMESTAMP here will avoid the compaction overhead, 
since that will cause it to not run any updates unless the timestamp is 
actually > last update timestamp?)

Also, is there a way to get the number of rows which were updated / ignored?

On Wed, May 13, 2015 at 4:37 PM, Peer, Oded 
oded.p...@rsa.com wrote:
The cost of issuing an UPDATE that won’t update anything is compaction 
overhead. Since you stated it’s rare for rows to be updated then the overhead 
should be negligible.

The easiest way to convert a milliseconds timestamp long value to microseconds 
is to multiply by 1000.

From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 2:15 PM
To: user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified > current date)

Would TimeUnit.MILLISECONDS.toMicros(  myDate.getTime() ) work for producing 
the microsecond timestamp ?

On Wed, May 13, 2015 at 4:09 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
If specifying 'using' timestamp, the docs say to provide microseconds, but 
where are these microseconds obtained from? I have regular java.util.Date 
objects, I can get the time in milliseconds (i.e the unix timestamp), how would 
I convert that to microseconds?

On Wed, May 13, 2015 at 3:56 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
Thanks Peter, that's interesting. I didn't know of that option.

If updates don't create tombstones (and i'm already taking pains to ensure no 
nulls are present in queries), then is there no cost to just submitting an 
update for everything regardless of whether lastModified has changed?

Thanks.

On Wed, May 13, 2015 at 3:38 PM, Peer, Oded 
oded.p...@rsa.com wrote:
You can use the “last modified” value as the TIMESTAMP for your UPDATE 
operation.
This way the values will only be updated if lastModified date > the 
lastModified you have in the DB.

Updates to values don’t create tombstones. Only deletes (either by executing 
delete, inserting a null value or by setting a TTL) create tombstones.


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 1:27 PM
To: user@cassandra.apache.org
Subject: Updating only modified records (where lastModified > current date)

I'm running some ETL jobs, where the pattern is the following:

1- Get some records from an external API,

2- For each record, see if its lastModified date > the lastModified i have in 
db (or if I don't have that record in db)

3- If lastModified < dbLastModified, the item wasn't changed, ignore it. 
Otherwise, run an update query and update that record.

(It is rare for existing records to get updated, so I'm not that concerned 
about tombstones).

The problem however is, since I have to query each record's lastModified, one 
at a time, that's adding a major bottleneck to my job.

E.g if I have 6k records, I have to run a total of 6k 'select lastModified from 
myTable where id = ?' queries.

Is there a better way, am I doing anything wrong, etc? Any suggestions would be 
appreciated.

Thanks.






RE: Updating only modified records (where lastModified > current date)

2015-05-13 Thread Peer, Oded
It will cause an overhead (compaction and read) as I described in the previous 
email.

From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 3:13 PM
To: user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified > current date)

> I don’t understand the ETL use case and its relevance here. Can you provide
> more details?

Basically, every 1 hour a job runs which queries an external API and gets some 
records. Then, I want to take only new or updated records, and insert / update 
them in cassandra. For records that are already in cassandra and aren't 
modified, I want to ignore them.

Each record returns a lastModified datetime, I want to use that to determine 
whether a record was changed or not (if it was, it'd be updated, if not, it'd 
be ignored).

The issue was, I'm having to do a 'select lastModified from table where id = ?' 
query for every record, in order to determine if db lastModified < api 
lastModified or not. I was wondering if there was a way to avoid that.

If I use 'USING TIMESTAMP', would subsequent updates where lastModified is a 
value that was previously used, still create that overhead, or will they be 
ignored?

E.g if I issued an update where TIMESTAMP is X, then 1 hour later I issued 
another update where TIMESTAMP is still X, will that 2nd update essentially get 
ignored, or will it cause any overhead?

On Wed, May 13, 2015 at 5:02 PM, Peer, Oded 
oded.p...@rsa.com wrote:
USING TIMESTAMP doesn’t avoid compaction overhead.
When you modify data the value is stored along with a timestamp indicating the 
timestamp of the value.
Assume timestamp T1 < T2 and you stored value V with timestamp T2. Then you 
store V’ with timestamp T1.
Now you have two values of V in the DB: <V,T2>, <V’,T1>
When you read the value of V from the DB you read both <V,T2>, <V’,T1>, 
Cassandra resolves the conflict by comparing the timestamp and returns V.
Compaction will later take care and remove <V’,T1> from the DB.

I don’t understand the ETL use case and its relevance here. Can you provide 
more details?

UPDATE in Cassandra updates specific rows. All of them are updated, nothing is 
ignored.


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 2:43 PM

To: user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified > current date)

Its rare for an existing record to have changes, but the etl job runs every 
hour, therefore it will send updates each time, regardless of whether there 
were changes or not.

(I'm assuming that USING TIMESTAMP here will avoid the compaction overhead, 
since that will cause it to not run any updates unless the timestamp is 
actually > last update timestamp?)

Also, is there a way to get the number of rows which were updated / ignored?

On Wed, May 13, 2015 at 4:37 PM, Peer, Oded 
oded.p...@rsa.com wrote:
The cost of issuing an UPDATE that won’t update anything is compaction 
overhead. Since you stated it’s rare for rows to be updated then the overhead 
should be negligible.

The easiest way to convert a milliseconds timestamp long value to microseconds 
is to multiply by 1000.

From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 2:15 PM
To: user@cassandra.apache.org
Subject: Re: Updating only modified records (where lastModified > current date)

Would TimeUnit.MILLISECONDS.toMicros(  myDate.getTime() ) work for producing 
the microsecond timestamp ?

On Wed, May 13, 2015 at 4:09 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
If specifying 'using' timestamp, the docs say to provide microseconds, but 
where are these microseconds obtained from? I have regular java.util.Date 
objects, I can get the time in milliseconds (i.e the unix timestamp), how would 
I convert that to microseconds?

On Wed, May 13, 2015 at 3:56 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
Thanks Peter, that's interesting. I didn't know of that option.

If updates don't create tombstones (and i'm already taking pains to ensure no 
nulls are present in queries), then is there no cost to just submitting an 
update for everything regardless of whether lastModified has changed?

Thanks.

On Wed, May 13, 2015 at 3:38 PM, Peer, Oded 
oded.p...@rsa.com wrote:
You can use the “last modified” value as the TIMESTAMP for your UPDATE 
operation.
This way the values will only be updated if lastModified date > the 
lastModified you have in the DB.

Updates to values don’t create tombstones. Only deletes (either by executing 
delete, inserting a null value or by setting a TTL) create tombstones.


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Wednesday, May 13, 2015 1:27 PM
To: user@cassandra.apache.org

RE: Inserting null values

2015-05-07 Thread Peer, Oded
I’ve added an option to prevent tombstone creation when using 
PreparedStatements to trunk, see CASSANDRA-7304.

The problem is having tombstones in regular columns.
When you perform a read request (range query or by PK):
- Cassandra iterates over all the cells (all, not only the cells specified in 
the query) in the relevant rows while counting tombstone cells 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java#L199)
- creates a ColumnFamily object instance with the rows
- filters the selected columns from the internal CF 
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java#L653)
- returns the result

If you have many unnecessary tombstones you read many unnecessary cells.



From: Eric Stevens [mailto:migh...@gmail.com]
Sent: Wednesday, May 06, 2015 4:37 PM
To: user@cassandra.apache.org
Subject: Re: Inserting null values

I agree that inserting null is not as good as not inserting that column at all 
when you have confidence that you are not shadowing any underlying data. But 
pragmatically speaking it really doesn't sound like a small number of 
incidental nulls/tombstones (< 20% of columns, otherwise CASSANDRA-3442 takes 
over) is going to have any performance impact either in your query patterns or 
in compaction in any practical sense.

If INSERT of null values is problematic for small portions of your data, then 
it stands to reason that an INSERT option containing an instruction to prevent 
tombstone creation would be an important performance optimization (and would 
also address the fact that non-null collections also generate tombstones on 
INSERT as well).  INSERT INTO ... USING no_tombstones;


> There's thresholds (log messages, etc.) which operate on tombstone counts
> over a certain number, but not on column counts over the same number.

tombstone_warn_threshold and tombstone_failure_threshold only apply to 
clustering scans right?  I.E. tombstones don't count against those thresholds 
if they are not part of the clustering key column being considered for the 
non-EQ relation?  The documentation certainly implies so:

tombstone_warn_threshold (http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_warn_threshold)
(Default: 1000) The maximum number of tombstones a query can scan before 
warning.
tombstone_failure_threshold (http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_failure_threshold)
(Default: 100000) The maximum number of tombstones a query can scan before 
aborting.

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli 
rc...@eventbrite.com wrote:
On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens 
migh...@gmail.com wrote:
In the end, inserting a tombstone into a non-clustered column shouldn't be 
appreciably worse (if it is at all) than inserting a value instead.  Or am I 
missing something here?

There's thresholds (log messages, etc.) which operate on tombstone counts over 
a certain number, but not on column counts over the same number.

Given that tombstones are often smaller than data columns, sorta hard to 
understand conceptually?

=Rob




RE: Data Modeling for 2.1 Cassandra

2015-04-30 Thread Peer, Oded
In general your data model should match your queries in Cassandra.
In the examples you provided the queries are by name, not by ID, so I don’t see 
much use in using ID as the primary key.
Without much context, like why you are using SET or if queries must specify 
both first_name and last_name, which is not supported in option 2, I think it 
would make sense to use the following model for your data:

CREATE TABLE users (
   first_name text,
   last_name text,
   dob text,
   id int,
   PRIMARY KEY ((first_name, last_name)) // Note this defines a composite 
partition key by using an extra set of parentheses
);
INSERT INTO users(first_name, last_name, id) values (‘neha’, ‘dave’, 1);
SELECT * FROM users where first_name = 'rob' and last_name = 'abb';



From: Neha Trivedi [mailto:nehajtriv...@gmail.com]
Sent: Thursday, April 30, 2015 10:16 AM
To: user@cassandra.apache.org
Subject: Data Modeling for 2.1 Cassandra

Hello all,
I was wondering which of the three data models described below is better in terms of 
performance. Seems 3 is good.

#1. log with 3 Index

CREATE TABLE log (
id int PRIMARY KEY,
first_name set<text>,
last_name set<text>,
dob set<text>
   );
CREATE INDEX log_firstname_index ON test.log (first_name);
CREATE INDEX log_lastname_index ON test.log (last_name);
CREATE INDEX log_dob_index ON test.log (dob);
INSERT INTO log(id, first_name,last_name) VALUES ( 3, {'rob'},{'abbate'});
INSERT INTO log(id, first_name,last_name) VALUES ( 4, {'neha'},{'dave'});
select id from log where first_name contains 'rob';
select id from log where last_name contains 'abbate';

#2. log with UDT

CREATE TYPE test.user_profile (
first_name text,
last_name text,
dob text
);

CREATE TABLE test.log_udt1 (
id int PRIMARY KEY,
userinfo set<frozen<user_profile>>
);
CREATE INDEX log_udt1__index ON test.log_udt1 (userinfo);
INSERT INTO log_udt1 (id, userinfo ) values ( 
1,{first_name:'rob',last_name:'abb',dob: 'dob'});
INSERT INTO log_udt1 (id, userinfo ) values ( 
2,{first_name:'neha',last_name:'dave',dob: 'dob1'});

select * FROM log_udt1 where userinfo = {first_name: 'rob', last_name: 'abb', 
dob: 'dob'};
This will not support a query like: select id from log_fname where first_name 
contains 'rob';

#3. log with different Tables for each


CREATE TABLE log_fname (
id int PRIMARY KEY,
first_name set<text>
   );
CREATE INDEX log_firstname_index ON test.log_fname (first_name);
CREATE TABLE log_lname (
id int PRIMARY KEY,
last_name set<text>
   );
CREATE INDEX log_lastname_index ON test.log_lname (last_name);
CREATE TABLE log_dob (
id int PRIMARY KEY,
dob set<text>
   );
CREATE INDEX log_dob_index ON test.log_dob (dob);

INSERT INTO log_fname(id, first_name) VALUES ( 3, {'rob'});
INSERT INTO log_lname(id, last_name) VALUES ( 4, {'dave'});
select id from log_fname where first_name contains 'rob';
select id from log_lname where last_name contains 'abbate';

Regards
Neha


RE: Inserting null values

2015-04-29 Thread Peer, Oded
Inserting a null value creates a tombstone. Tombstones can have major 
performance implications.
You can see the tombstones using sstable2json.
If you have a small number of records with null values this seems OK, otherwise 
I recommend using the QueryBuilder (for Java clients) and waiting for 
https://issues.apache.org/jira/browse/CASSANDRA-7304
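
A minimal sketch of the QueryBuilder approach, with a hypothetical table my_table (id int PRIMARY KEY, name text, description text); columns whose value is null are simply never added to the statement, so no tombstone is written for them:

   import com.datastax.driver.core.Cluster;
   import com.datastax.driver.core.Session;
   import com.datastax.driver.core.querybuilder.Insert;
   import com.datastax.driver.core.querybuilder.QueryBuilder;

   public class InsertSkippingNulls {
       public static void main(String[] args) {
           Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
           Session session = cluster.connect("my_keyspace");
           String description = null; // may or may not be present
           Insert insert = QueryBuilder.insertInto("my_keyspace", "my_table")
                   .value("id", 42)
                   .value("name", "some name");
           if (description != null) {
               insert.value("description", description); // only set columns that have a value
           }
           session.execute(insert);
           cluster.close();
       }
   }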


From: Matthew Johnson [mailto:matt.john...@algomi.com]
Sent: Wednesday, April 29, 2015 11:37 AM
To: user@cassandra.apache.org
Subject: Inserting null values

Hi all,

I have some fields that I am storing into Cassandra, but some of them could be 
null at any given point. As there are quite a lot of them, it makes the code 
much more readable if I don’t check each one for null before adding it to the 
INSERT.

I can see a few Jiras around CQL 3 supporting inserting nulls:

https://issues.apache.org/jira/browse/CASSANDRA-3783
https://issues.apache.org/jira/browse/CASSANDRA-5648

But I have tested inserting null and it seems to work fine (when querying the 
table with cqlsh, it shows up as a red lowercase null).

Are there any obvious pitfalls to look out for that I have missed? Could it be 
a performance concern to insert a row with some nulls, as opposed to checking 
the values first and inserting the row and just omitting those columns?

Thanks!
Matt



RE: Tables showing up as our_table-147a2090ed4211e480153bc81e542ebd/ in data dir

2015-04-29 Thread Peer, Oded
See https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L173



SSTable data directory name is slightly changed. Each directory will have hex 
string appended after CF name, e.g. ks/cf-5be396077b811e3a3ab9dc4b9ac088d/

This hex string part represents unique ColumnFamily ID.

Note that existing directories are used as is, so only newly created 
directories after upgrade have new directory name format.


From: Donald Smith [mailto:donald.sm...@audiencescience.com]
Sent: Wednesday, April 29, 2015 8:04 AM
To: user@cassandra.apache.org
Subject: Tables showing up as our_table-147a2090ed4211e480153bc81e542ebd/ in 
data dir


Using 2.1.4, tables in our data/ directory are showing up as



our_table-147a2090ed4211e480153bc81e542ebd/



instead of as



 our_table/



Why would that happen? We're also seeing lagging compactions and high cpu usage.



 Thanks, Don


RE: Data model suggestions

2015-04-27 Thread Peer, Oded
I recommend truncating the table instead of dropping it since you don’t need to 
re-issue DDL commands and put load on the system keyspace.
Both DROP and TRUNCATE automatically create snapshots, so there is no “snapshotting” 
advantage to using DROP. See 
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__auto_snapshot


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: Sunday, April 26, 2015 10:31 PM
To: user@cassandra.apache.org
Subject: Re: Data model suggestions

Thanks Peer. I like the approach you're suggesting.

Why do you recommend truncating the last active table rather than just dropping 
it? Since all the data would be inserted into a new table, seems like it would 
make sense to drop the last table, and that way truncate snapshotting also 
won't have to be dealt with (unless I'm missing anything).

Thanks.


On Sun, Apr 26, 2015 at 1:29 PM, Peer, Oded 
oded.p...@rsa.com wrote:
I would maintain two tables.
An “archive” table that holds all the active and inactive records, and is 
updated hourly (re-inserting the same record has some compaction overhead but 
on the other side deleting records has tombstones overhead).
An “active” table which holds all the records in the last external API 
invocation.
To avoid tombstones and read-before-delete issues “active” should actually be a 
synonym, an alias, to the most recent active table.
I suggest you create two identical tables, “active1” and “active2”, and an 
“active_alias” table that informs which of the two is the most recent.
Thus when you query the external API you insert the data to “archive” and to 
the unaliased “activeN” table, switch the alias value in “active_alias” and 
truncate the newly unaliased “activeM” table.
No need to query the data before inserting it. Make sure truncating doesn’t 
create automatic snapshots.


From: Narendra Sharma 
[mailto:narendra.sha...@gmail.com]
Sent: Friday, April 24, 2015 6:53 AM
To: user@cassandra.apache.org
Subject: Re: Data model suggestions


I think one table say record should be good. The primary key is record id. This 
will ensure good distribution.
Just update the active attribute to true or false.
For range query on active vs archive records maintain 2 indexes or try 
secondary index.
On Apr 23, 2015 1:32 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
Good point about the range selects. I think they can be made to work with 
limits, though. Or, since the active records will never usually be > 500k, the 
ids may just be cached in memory.

Most of the time, during reads, the queries will just consist of select * where 
primaryKey = someValue . One row at a time.

The question is just, whether to keep all records in one table (including 
archived records which wont be queried 99% of the time), or to keep active 
records in their own table, and delete them when they're no longer active. Will 
that produce tombstone issues?

On Fri, Apr 24, 2015 at 12:56 AM, Manoj Khangaonkar 
khangaon...@gmail.com wrote:
Hi,
If your external API returns active records, that means I am guessing you need 
to do a select * on the active table to figure out which records in the table 
are no longer active.
You might be aware that range selects based on partition key will timeout in 
cassandra. They can however be made to work using the column cluster key.
To comment more, We would need to see your proposed cassandra tables and 
queries that you might need to run.
regards



On Thu, Apr 23, 2015 at 9:45 AM, Ali Akhtar 
ali.rac...@gmail.com wrote:
That's returned by the external API we're querying. We query them for active 
records, if a previous active record isn't included in the results, that means 
its time to archive that record.

On Thu, Apr 23, 2015 at 9:20 PM, Manoj Khangaonkar 
khangaon...@gmail.com wrote:
Hi,
How do you determine if the record is no longer active? Is it a periodic 
process that goes through every record and checks when the last update happened?
regards

On Thu, Apr 23, 2015 at 8:09 AM, Ali Akhtar 
ali.rac...@gmail.com wrote:
Hey all,

We are working on moving a mysql based application to Cassandra.

The workflow in mysql is this: We have two tables: active and archive . Every 
hour, we pull in data from an external API. The records which are active, are 
kept in the 'active' table. Once a record is no longer active, it's deleted from 
'active' and re-inserted into 'archive'.

The purpose for that, is because most of the time, queries are only done 
against the active records rather than archived. Therefore keeping the active 
table small may help with faster queries, if it only has to search 200k records 
vs 3 million or more.

Is it advisable to keep the same data model in Cassandra? I'm concerned about 
tombstone issues when records are deleted from active.

RE: Data model suggestions

2015-04-26 Thread Peer, Oded
I would maintain two tables.
An “archive” table that holds all the active and inactive records, and is 
updated hourly (re-inserting the same record has some compaction overhead but 
on the other side deleting records has tombstones overhead).
An “active” table which holds all the records in the last external API 
invocation.
To avoid tombstones and read-before-delete issues “active” should actually be a 
synonym, an alias, to the most recent active table.
I suggest you create two identical tables, “active1” and “active2”, and an 
“active_alias” table that informs which of the two is the most recent.
Thus when you query the external API you insert the data to “archive” and to 
the unaliased “activeN” table, switch the alias value in “active_alias” and 
truncate the newly unaliased “activeM” table.
No need to query the data before inserting it. Make sure truncating doesn’t 
create automatic snapshots.
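
A rough sketch of one refresh cycle under this scheme; the schema below (archive, active1, active2 and a single-row active_alias table) and the DataStax Java driver calls are illustrations of the idea, not code from the thread, and a real job would use prepared statements rather than string concatenation:

   import com.datastax.driver.core.Row;
   import com.datastax.driver.core.Session;

   public class ActiveAliasRefresh {
       // Assumed tables, each along the lines of (id int PRIMARY KEY, payload text),
       // plus: CREATE TABLE active_alias (key int PRIMARY KEY, current text)
       static class Record { int id; String payload; }

       public static void refresh(Session session, Iterable<Record> apiRecords) {
           Row alias = session.execute("SELECT current FROM active_alias WHERE key = 0").one();
           String current = (alias == null) ? "active1" : alias.getString("current");
           String next = current.equals("active1") ? "active2" : "active1";

           // Write every record to the archive and to the table that is not yet aliased.
           for (Record r : apiRecords) {
               session.execute("INSERT INTO archive (id, payload) VALUES (" + r.id + ", '" + r.payload + "')");
               session.execute("INSERT INTO " + next + " (id, payload) VALUES (" + r.id + ", '" + r.payload + "')");
           }
           // Flip the alias, then clear the table that just became inactive.
           session.execute("INSERT INTO active_alias (key, current) VALUES (0, '" + next + "')");
           session.execute("TRUNCATE " + current);
       }
   }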


From: Narendra Sharma [mailto:narendra.sha...@gmail.com]
Sent: Friday, April 24, 2015 6:53 AM
To: user@cassandra.apache.org
Subject: Re: Data model suggestions


I think one table say record should be good. The primary key is record id. This 
will ensure good distribution.
Just update the active attribute to true or false.
For range query on active vs archive records maintain 2 indexes or try 
secondary index.
On Apr 23, 2015 1:32 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
Good point about the range selects. I think they can be made to work with 
limits, though. Or, since the active records will never usually be > 500k, the 
ids may just be cached in memory.

Most of the time, during reads, the queries will just consist of select * where 
primaryKey = someValue . One row at a time.

The question is just, whether to keep all records in one table (including 
archived records which wont be queried 99% of the time), or to keep active 
records in their own table, and delete them when they're no longer active. Will 
that produce tombstone issues?

On Fri, Apr 24, 2015 at 12:56 AM, Manoj Khangaonkar 
khangaon...@gmail.com wrote:
Hi,
If your external API returns active records, that means I am guessing you need 
to do a select * on the active table to figure out which records in the table 
are no longer active.
You might be aware that range selects based on partition key will timeout in 
cassandra. They can however be made to work using the column cluster key.
To comment more, We would need to see your proposed cassandra tables and 
queries that you might need to run.
regards



On Thu, Apr 23, 2015 at 9:45 AM, Ali Akhtar 
ali.rac...@gmail.com wrote:
That's returned by the external API we're querying. We query them for active 
records, if a previous active record isn't included in the results, that means 
its time to archive that record.

On Thu, Apr 23, 2015 at 9:20 PM, Manoj Khangaonkar 
khangaon...@gmail.com wrote:
Hi,
How do you determine if the record is no longer active? Is it a periodic 
process that goes through every record and checks when the last update happened?
regards

On Thu, Apr 23, 2015 at 8:09 AM, Ali Akhtar 
ali.rac...@gmail.com wrote:
Hey all,

We are working on moving a mysql based application to Cassandra.

The workflow in mysql is this: We have two tables: active and archive . Every 
hour, we pull in data from an external API. The records which are active, are 
kept in the 'active' table. Once a record is no longer active, it's deleted from 
'active' and re-inserted into 'archive'.

The purpose for that, is because most of the time, queries are only done 
against the active records rather than archived. Therefore keeping the active 
table small may help with faster queries, if it only has to search 200k records 
vs 3 million or more.

Is it advisable to keep the same data model in Cassandra? I'm concerned about 
tombstone issues when records are deleted from active.

Thanks.


--
http://khangaonkar.blogspot.com/



--
http://khangaonkar.blogspot.com/



RE: sstable writer and creating bytebuffers

2015-03-31 Thread Peer, Oded
I may have overcomplicated things.
In my opinion creating a CompositeType with a single type should throw an 
exception.


From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Tuesday, March 31, 2015 10:18 AM
To: user@cassandra.apache.org
Subject: Re: sstable writer and creating bytebuffers

On Tue, Mar 31, 2015 at 7:42 AM, Peer, Oded 
oded.p...@rsa.com wrote:
Thanks Sylvain.
Is there any way to create a composite key with only one column in Cassandra 
when creating a table, or should creating a CompositeType instance with a 
single column be prohibited?

It's hard to answer without knowing what you are trying to achieve. Provided I 
don't misunderstand what you are asking, then yes, it's technically possible, 
but it's hard to say how wise that is unless I know more about your 
constraints/the reasons you're considering that. Let's say that in general, if 
you have only a single column, then there isn't too much reasons to use a 
CompositeType.

--
Sylvain



From: Sylvain Lebresne 
[mailto:sylv...@datastax.com]
Sent: Monday, March 30, 2015 1:57 PM
To: user@cassandra.apache.org
Subject: Re: sstable writer and creating bytebuffers

No, it's not a bug. In a composite, every element starts with a 2-byte short indicating 
the size of the element, plus an extra byte that is used for sorting purposes. 
A little bit more details can be found in the CompositeType class javadoc if 
you're interested. It's not the most compact format there is but changing it 
would break backward compatibility anyway.
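
A short sketch that makes the size difference visible, reusing the same Cassandra classes as the code quoted below; the import of bytes() from ByteBufferUtil is an assumption about how the original snippet was written:

   import static org.apache.cassandra.utils.ByteBufferUtil.bytes;

   import java.nio.ByteBuffer;

   import org.apache.cassandra.db.marshal.CompositeType;
   import org.apache.cassandra.db.marshal.LongType;

   public class CompositeEncodingSize {
       public static void main(String[] args) {
           long val = 123L;
           ByteBuffer direct = bytes(val);
           ByteBuffer composite = CompositeType.getInstance(LongType.instance)
                   .builder().add(bytes(val)).build();
           // The plain long is 8 bytes; the composite wraps the same 8 bytes in a
           // 2-byte length prefix plus a trailing end-of-component byte, 11 in total,
           // which is why the two buffers can never be equal.
           System.out.println(direct.remaining());    // 8
           System.out.println(composite.remaining()); // 11
       }
   }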

On Mon, Mar 30, 2015 at 12:38 PM, Peer, Oded 
oded.p...@rsa.com wrote:
I am writing code to bulk load data into Cassandra using 
SSTableSimpleUnsortedWriter
I changed my partition key from a composite key (long, int) to a single column 
key (long).
For creating the composite key I used a CompositeType, and I kept using it 
after changing the key to a single column.
My code didn’t work until I changed the way I create the ByteBuffer not to use 
CompositeType.

The following code prints ‘false’.
Do you consider this a bug?

  long val = 123L;
  ByteBuffer direct = bytes( val );
  ByteBuffer composite = CompositeType.getInstance( 
LongType.instance ).builder().add( bytes( val ) ).build();
  System.out.println( direct.equals( composite ) );





sstable writer and creating bytebuffers

2015-03-30 Thread Peer, Oded
I am writing code to bulk load data into Cassandra using 
SSTableSimpleUnsortedWriter
I changed my partition key from a composite key (long, int) to a single column 
key (long).
For creating the composite key I used a CompositeType, and I kept using it 
after changing the key to a single column.
My code didn't work until I changed the way I create the ByteBuffer not to use 
CompositeType.

The following code prints 'false'.
Do you consider this a bug?

  long val = 123L;
  ByteBuffer direct = bytes( val );
  ByteBuffer composite = CompositeType.getInstance( 
LongType.instance ).builder().add( bytes( val ) ).build();
  System.out.println( direct.equals( composite ) );



RE: sstable writer and creating bytebuffers

2015-03-30 Thread Peer, Oded
Thanks Sylvain.
Is there any way to create a composite key with only one column in Cassandra 
when creating a table, or should creating a CompositeType instance with a 
single column be prohibited?


From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Monday, March 30, 2015 1:57 PM
To: user@cassandra.apache.org
Subject: Re: sstable writer and creating bytebuffers

No, it's not a bug. In a composite, every element starts with a 2-byte short indicating 
the size of the element, plus an extra byte that is used for sorting purposes. 
A little bit more details can be found in the CompositeType class javadoc if 
you're interested. It's not the most compact format there is but changing it 
would break backward compatibility anyway.

On Mon, Mar 30, 2015 at 12:38 PM, Peer, Oded 
oded.p...@rsa.com wrote:
I am writing code to bulk load data into Cassandra using 
SSTableSimpleUnsortedWriter
I changed my partition key from a composite key (long, int) to a single column 
key (long).
For creating the composite key I used a CompositeType, and I kept using it 
after changing the key to a single column.
My code didn’t work until I changed the way I create the ByteBuffer not to use 
CompositeType.

The following code prints ‘false’.
Do you consider this a bug?

  long val = 123L;
  ByteBuffer direct = bytes( val );
  ByteBuffer composite = CompositeType.getInstance( 
LongType.instance ).builder().add( bytes( val ) ).build();
  System.out.println( direct.equals( composite ) );




RE: Saving a file using cassandra

2015-03-30 Thread Peer, Oded
Try this
http://stackoverflow.com/a/17208343/248656
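
If it helps, here is a very rough sketch of the usual approach: keep the file as blob chunks in a table, inside a keyspace created with replication factor 2 so every chunk is stored on two nodes. The schema, chunk size and DataStax Java driver calls are assumptions for illustration only, not taken from the linked answer:

   import java.nio.ByteBuffer;
   import java.nio.file.Files;
   import java.nio.file.Paths;
   import java.util.UUID;

   import com.datastax.driver.core.Cluster;
   import com.datastax.driver.core.PreparedStatement;
   import com.datastax.driver.core.Session;

   public class StoreFileAsChunks {
       private static final int CHUNK_SIZE = 512 * 1024; // 512 KB per row, an arbitrary choice

       public static void main(String[] args) throws Exception {
           Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
           Session session = cluster.connect();
           // Replication factor 2 means every row (chunk) is kept on two nodes.
           session.execute("CREATE KEYSPACE IF NOT EXISTS files WITH replication = "
                   + "{'class': 'SimpleStrategy', 'replication_factor': 2}");
           session.execute("CREATE TABLE IF NOT EXISTS files.chunks ("
                   + "file_id uuid, chunk_no int, data blob, PRIMARY KEY (file_id, chunk_no))");
           PreparedStatement insert = session.prepare(
                   "INSERT INTO files.chunks (file_id, chunk_no, data) VALUES (?, ?, ?)");

           byte[] content = Files.readAllBytes(Paths.get(args[0]));
           UUID fileId = UUID.randomUUID();
           for (int chunk = 0, offset = 0; offset < content.length; chunk++, offset += CHUNK_SIZE) {
               int length = Math.min(CHUNK_SIZE, content.length - offset);
               session.execute(insert.bind(fileId, chunk, ByteBuffer.wrap(content, offset, length)));
           }
           System.out.println("Stored file " + args[0] + " as " + fileId);
           cluster.close();
       }
   }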


From: jean paul [mailto:researche...@gmail.com]
Sent: Wednesday, March 18, 2015 7:06 PM
To: user@cassandra.apache.org
Subject: Saving a file using cassandra

Hello,
Finally, I have created my ring using Cassandra.
Please, I'd like to store a file replicated 2 times in my cluster.
Is that possible? Can you please send me a link to a tutorial?

Thanks a lot.
Best Regards.


Configuring all nodes as seeds

2014-06-18 Thread Peer, Oded
My intended Cassandra cluster will have 15 nodes per DC, with 2 DCs.
I am considering using all the nodes as seed nodes.
It looks like having all the nodes as seeds should actually reduce the Gossip 
overhead (See Gossiper implementation in 
http://wiki.apache.org/cassandra/ArchitectureGossip)
Is there any reason not to do this?



RE: Problem using sstableloader with SSTableSimpleUnsortedWriter and a composite key

2013-06-30 Thread Peer, Oded
Thank you Aaron!

Your blog post helped me understand how a row with a compound key is stored, and 
that helped me work out how to create the sstable files.
For anyone who needs it this is how it works:

In Cassandra-cli the row looks like this:
RowKey: 5
=> (column=10:created, value=013f84be6288, timestamp=137232163700)

From this we see that the row key is a single Long value 5, and it has one 
composite column 10:created with a timestamp value.
Thus the code should look like this:

   File directory = new File( System.getProperty( "output" ) );
   IPartitioner partitioner = new Murmur3Partitioner();
   String keyspace = "test_keyspace";
   String columnFamily = "test_table";
   List<AbstractType<?>> compositeList = new ArrayList<AbstractType<?>>();
   compositeList.add( LongType.instance );
   compositeList.add( LongType.instance );
   CompositeType compositeType = CompositeType.getInstance( compositeList );
   SSTableSimpleUnsortedWriter sstableWriter = new SSTableSimpleUnsortedWriter(
  directory,
  partitioner,
  keyspace,
  columnFamily,
  compositeType,
  null,
  64 );
   long timestamp = 1372321637000L;
   long nanotimestamp = timestamp * 1000;
   long k1 = 5L;
   long k2 = 10L;
   sstableWriter.newRow( bytes( k1 ) );
   sstableWriter.addColumn( compositeType.builder().add( bytes( k2 ) ).add( 
bytes( "created" ) ).build(), bytes( timestamp ), nanotimestamp );
   sstableWriter.close();





Problem using sstableloader with SSTableSimpleUnsortedWriter and a composite key

2013-06-27 Thread Peer, Oded
Hi,

I am using Cassandra 1.2.5. I built a cluster of 2 data centers with 3 nodes in 
each data center.
I created a key space and table with a composite key:
   create keyspace test_keyspace WITH replication = {'class': 
'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 1};
   create table test_table ( k1 bigint, k2 bigint, created timestamp, PRIMARY 
KEY (k1, k2) ) with compaction = { 'class' : 'LeveledCompactionStrategy' };
I then tried to load data to the table using sstableloader, which uses input 
created via SSTableSimpleUnsortedWriter using the following code:

   File directory = new File( System.getProperty( "output" ) );
   IPartitioner partitioner = new Murmur3Partitioner();
   String keyspace = "test_keyspace";
   String columnFamily = "test_table";
   List<AbstractType<?>> compositeList = new ArrayList<AbstractType<?>>();
   compositeList.add( LongType.instance );
   compositeList.add( LongType.instance );
   CompositeType compositeType = CompositeType.getInstance( compositeList );
   SSTableSimpleUnsortedWriter sstableWriter = new SSTableSimpleUnsortedWriter(
  directory,
  partitioner,
  keyspace,
  columnFamily,
  compositeType,
  null,
  64 );
   long timestamp = 1372321637000L;
   long nanotimestamp = timestamp * 1000;
   sstableWriter.newRow( compositeType.builder().add( bytes( 1L ) ).add( bytes( 
1L ) ).build() );
   sstableWriter.addColumn( bytes( "created" ), bytes( timestamp ), 
nanotimestamp );
   sstableWriter.close();
   System.exit( 0 );

I then load the sstable files using the command sstableloader -d node1 -v 
-debug test_keyspace/test_table/
The command ends without any indication of a problem, but the table remains 
empty.
I see an exception in one of the nodes system.log:
java.lang.RuntimeException: java.lang.IllegalArgumentException
at 
org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:64)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:247)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
at 
org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at 
org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:106)
at 
org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:79)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:114)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
org.apache.cassandra.db.ColumnFamilyStore$3.computeNext(ColumnFamilyStore.java:1399)
at 
org.apache.cassandra.db.ColumnFamilyStore$3.computeNext(ColumnFamilyStore.java:1395)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1466)
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1443)
at 
org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(RangeSliceVerbHandler.java:46)
at 
org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:58)
... 4 more

Am I using the CompositeType and SSTableSimpleUnsortedWriter correctly?