No deletes - is periodic repair needed? I think not...

2014-01-25 Thread Laing, Michael
I have a simple set of tables that can be grouped as follows: 1. Regular values, no deletes, no overwrites, write heavy, ttl's to manage size 2. Regular values, no deletes, some overwrites, read heavy (10 to 1), ttl's to manage size 3. Counter values, no deletes, update heavy,

Re: Latest Stable version of cassandra in production

2014-01-09 Thread Laing, Michael
I would like to +1 Jan. We are using C* 2.0 and have just gone into production directly supporting the latest revision of www.nytimes.com. I avoid new features unless I really need them; we are prepared to read code and make fixes ourselves if necessary, but it has not been. Best regards,

Re: Latest Stable version of cassandra in production

2014-01-09 Thread Laing, Michael
are you getting from 2.0 if you aren't using the new features? Why not stick with 1.2.x? cheers, Bruce On Thu, Jan 9, 2014 at 12:37 PM, Laing, Michael michael.la...@nytimes.com wrote: I would like to +1 Jan. We are using C* 2.0 and have just gone into production directly supporting

Re: Crash with TombstoneOverwhelmingException

2013-12-25 Thread Laing, Michael
It's a feature: In the stock cassandra.yaml file for 2.03 see: # When executing a scan, within or across a partition, we need to keep the # tombstones seen in memory so we can return them to the coordinator, which # will use them to make sure other replicas also know about the deleted rows.

Re: Recurring actions with 4 hour interval

2013-12-10 Thread Laing, Michael
2.0.3: system tables have a 1 hour memtable_flush_period which I have observed to trigger compaction on the 4 hour mark. Going by memory tho... -ml On Tue, Dec 10, 2013 at 10:31 AM, Andre Sprenger andre.spren...@getanet.de wrote: As far as I know there is nothing hard coded in Cassandra that

Re: Exactly one wide row per node for a given CF?

2013-12-10 Thread Laing, Michael
You could shard your rows like the following. You would need over 100 shards, possibly... so testing is in order :) Michael -- put this in file and run using 'cqlsh -f file DROP KEYSPACE robert_test; CREATE KEYSPACE robert_test WITH replication = { 'class': 'SimpleStrategy',

Re: Nodetool repair exceptions in Cassandra 2.0.2

2013-12-09 Thread Laing, Michael
My experience is that you must upgrade to 2.0.3 ASAP to fix this. Michael On Mon, Dec 9, 2013 at 6:39 PM, David Laube d...@stormpath.com wrote: Hi All, We are running Cassandra 2.0.2 and have recently stumbled upon an issue with nodetool repair. Upon running nodetool repair on each of the

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Laing, Michael
We use the python-driver and have contributed some to its development. I have been careful to not push too fast on features until we need them. For example, we have just started using prepared statements - working well BTW. Next we will employ futures and start to exploit the async nature of new

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Laing, Michael
and thread pooling in python-driver? For now, i would avoid object mapper cqlengine, just because of my deadlines. — Sent from Mailbox https://www.dropbox.com/mailbox for iPhone On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael michael.la...@nytimes.com wrote: We use the python-driver

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Laing, Michael
That's not a problem we have faced yet. On Tue, Nov 26, 2013 at 2:46 PM, Kumar Ranjan winnerd...@gmail.com wrote: How do you insert huge amount of data? — Sent from Mailbox https://www.dropbox.com/mailbox for iPhone On Tue, Nov 26, 2013 at 2:31 PM, Laing, Michael michael.la...@nytimes.com

Re: Efficient IP address location lookup

2013-11-16 Thread Laing, Michael
This approach is similar to Janne's. But I used a shard as an example to make more even rows, and just converted each IP to an int. -- put this in file and run using 'cqlsh -f file DROP KEYSPACE jacob_test; CREATE KEYSPACE jacob_test WITH replication = { 'class': 'SimpleStrategy',
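The "converted each IP to an int" step in the snippet above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, it handles IPv4 only, and the shard scheme is not shown because the schema is cut off in this listing.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Return the 32-bit integer value of a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Inverse of ip_to_int: turn a 32-bit integer back into dotted-quad form."""
    return str(ipaddress.IPv4Address(value))
```

Storing addresses as integers lets range bounds be compared numerically, which is what makes a "which range contains this IP" lookup possible with ordinary comparison operators.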

Re: How would you model that?

2013-11-08 Thread Laing, Michael
You could try this: CREATE TABLE user_activity (shard text, user text, ts timeuuid, primary key (shard, ts)); select user, ts from user_activity where shard in ('00', '01', ...) order by ts desc; Grab each user and ts the first time you see that user. Use as many shards as you think you need
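The client-side step "grab each user and ts the first time you see that user" can be sketched as below. This is a minimal sketch, assuming the rows arrive as (user, ts) pairs already ordered newest first, as the `order by ts desc` in the query intends; `latest_per_user` is a hypothetical helper name, not part of any driver.

```python
from typing import Any, Dict, Iterable, Tuple


def latest_per_user(rows: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Keep the first (user, ts) pair seen for each user.

    Assumes rows are ordered newest first, so the first ts seen
    for a user is that user's most recent activity.
    """
    latest: Dict[str, Any] = {}
    for user, ts in rows:
        if user not in latest:
            latest[user] = ts
    return latest
```

Because the result is built in arrival order, iterating over it also yields users from most recently to least recently active.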

Re: IN predicates on non-primary-key columns (%s) is not yet supported - then will it be ?

2013-11-08 Thread Laing, Michael
try this: CREATE COLUMNFAMILY post ( KEY uuid, author uuid, blog timeuuid, -- sortable name text, data text, PRIMARY KEY ( KEY, blog ) ); create index on post (author); SELECT * FROM post WHERE blog = 4d6b5fc5-487b-11e3-a6f4-406c8f1838fa AND blog =

Re: Best data structure for tracking most recent updates.

2013-11-08 Thread Laing, Michael
Here are a couple ideas: 1. You can rotate tables and truncate to avoid deleting. 2. You can shard your tables (partition key) to mitigate hotspots. 3. You can use a column key to store rows in timeuuid sequence. create table recent_updates_00 (shard text, uuid timeuuid, message text, primary

Re: How to select timestamp with CQL

2013-10-23 Thread Laing, Michael
http://www.datastax.com/documentation/cql/3.1/webhelp/index.html#cql/cql_reference/select_r.html On Wed, Oct 23, 2013 at 6:50 AM, Alex N lot...@gmail.com wrote: Thanks! I can't find it in the documentation... 2013/10/23 Cyril Scetbon cyril.scet...@free.fr Hi, Now you can ask for the

Re: Nodes not added to existing cluster

2013-09-25 Thread Laing, Michael
Check your security groups to be sure you have appropriate access. If in a VPC check both IN and OUT; if using ACLs check those. On Wed, Sep 25, 2013 at 3:41 PM, Skye Book skye.b...@gmail.com wrote: Hi all, I have a three node cluster using the EC2 Multi-Region Snitch currently operating

Re: Composite Column Grouping

2013-09-11 Thread Laing, Michael
script above: select_stmt = select * from time_series where userid = 'XYZ' This would return me many hundreds of thousands of columns. I need to go in time-series order using ranges [Pagination queries]. On Wed, Sep 11, 2013 at 7:06 AM, Laing, Michael michael.la...@nytimes.com wrote

Re: Composite Column Grouping

2013-09-11 Thread Laing, Michael
Col-Name-2 1001 On Wed, Sep 11, 2013 at 6:13 AM, Laing, Michael michael.la...@nytimes.com wrote: Then you can do this. I handle millions of entries this way and it works well if you are mostly interested in recent activity. If you need to span all activity then you can use a separate table

Re: Complex JSON objects

2013-09-11 Thread Laing, Michael
A way to do this would be to express the JSON structure as (path, value) tuples and then use a map<json, json> to store them. For example, your JSON above can be expressed as shown below where the path is a list of keys/indices and the value is a scalar. You could also concatenate the path
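The (path, value) decomposition described in the snippet above can be sketched in Python as follows. It is a sketch under assumptions: `flatten` and `to_json_map` are hypothetical names, paths and values are serialized with `json.dumps` so both sides of the map are JSON text, and empty objects or arrays produce no entries.

```python
import json


def flatten(node, path=()):
    """Yield (path, value) pairs; path is a list of keys/indices, value is a scalar."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, path + (index,))
    else:
        # Scalar leaf: string, number, boolean, or None.
        yield list(path), node


def to_json_map(doc):
    """Build a dict of JSON-encoded path -> JSON-encoded scalar value."""
    return {json.dumps(path): json.dumps(value) for path, value in flatten(doc)}
```

For example, `to_json_map({"a": {"b": [1, 2]}})` gives one entry per leaf, keyed by its encoded path, which could then be written to a map column.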

Re: Composite Column Grouping

2013-09-10 Thread Laing, Michael
You could try this. C* doesn't do it all for you, but it will efficiently get you the right data. -ml -- put this in file and run using 'cqlsh -f file DROP KEYSPACE latest; CREATE KEYSPACE latest WITH replication = { 'class': 'SimpleStrategy', 'replication_factor' : 1 }; USE latest;

Re: Composite Column Grouping

2013-09-10 Thread Laing, Michael
', u'1000', u'203', u'Col-Name-4') # (u'XYZ', u'1001', u'201', u'Col-Name-2') On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael michael.la...@nytimes.com wrote: You could try this. C* doesn't do it all for you, but it will efficiently get you the right data. -ml -- put this in file and run using

Re: One node out of three not flushing memtables

2013-09-09 Thread Laing, Michael
I have seen something similar. Of course correlation is not causation... Like you, doing testing with heavy writes. I was using a python client to drive the writes using the cql module which is thrift based. The correlation I eventually tracked down was that whichever node my python client(s)

Re: Versioning in cassandra

2013-09-04 Thread Laing, Michael
. Please let me know if the second approach is fine. Regards, Dawood On Wed, Sep 4, 2013 at 2:47 AM, Laing, Michael michael.la...@nytimes.com wrote: I use the technique described in my previous message to handle millions of messages and their versions. Actually, I use timeuuid's instead

Re: Selecting multiple rows with composite partition keys using CQL3

2013-09-04 Thread Laing, Michael
you could try this. -ml -- put this in file and run using 'cqlsh -f file DROP KEYSPACE carl_test; CREATE KEYSPACE carl_test WITH replication = { 'class': 'SimpleStrategy', 'replication_factor' : 1 }; USE carl_test; CREATE TABLE carl_table ( app text, name text, ts int,

Re: Versioning in cassandra

2013-09-03 Thread Laing, Michael
try the following. -ml -- put this in file and run using 'cqlsh -f file DROP KEYSPACE latest; CREATE KEYSPACE latest WITH replication = { 'class': 'SimpleStrategy', 'replication_factor' : 1 }; USE latest; CREATE TABLE file ( parentid text, -- row_key, same for each version id

Re: Versioning in cassandra

2013-09-03 Thread Laing, Michael
to 9 replicas of each version. We journal them all and use them for reporting latencies in our processing pipelines as well as for replay when we need to recover application state. Regards, Michael On Tue, Sep 3, 2013 at 3:15 PM, Laing, Michael michael.la...@nytimes.com wrote: try the following
