Re: Adding disk capacity to a running node

2016-10-17 Thread Laing, Michael
You could just expand the size of your EBS volume and extend the file system. No data is lost - assuming you are running Linux. On Monday, October 17, 2016, Seth Edwards wrote: > We're running 2.0.16. We're migrating to a new data model but we've had an > unexpected increase in write traffic tha

Re: Cassandra event notification on INSERT/DELETE of records

2016-05-25 Thread Laing, Michael
You could also follow this related issue: https://issues.apache.org/jira/browse/CASSANDRA-8844 On Wed, May 25, 2016 at 12:04 PM, Aaditya Vadnere wrote: > Thanks Eric and Mark, we were thinking along similar lines. But we already > need Cassandra for regular database purpose, so instead of having

Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
; SELECT id, workflow FROM sam WHERE dept='blah'; > > And in Spark with Python: > SELECT distinct id, dept, workflow FROM samd WHERE dept='blah'; > > > Best, > Rajesh R > > > -- > *From:* Laing, Michael [michael.la...@n

Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
Try converting that int from decimal to hex and inserting dashes in the appropriate spots - or go the other way. Also, you are looking at different rows, based upon your selection criteria... ml On Tue, May 24, 2016 at 6:23 AM, Rajesh Radhakrishnan < rajesh.radhakrish...@phe.gov.uk> wrote: > Hi
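
A minimal Python sketch of the decimal-to-hex conversion suggested above, using only the standard library. It assumes the integer Spark shows is the UUID's unsigned 128-bit value; the helper names are made up for illustration and the sample UUID is just a placeholder.

```python
import uuid


def int_to_uuid_text(value: int) -> str:
    """Render a 128-bit integer as hex with dashes in the usual UUID spots."""
    # uuid.UUID raises ValueError if value is outside 0 <= value < 2**128.
    return str(uuid.UUID(int=value))


def uuid_text_to_int(text: str) -> int:
    """Go the other way: dashed hex text to its integer value."""
    return uuid.UUID(text).int


if __name__ == "__main__":
    sample = "98f50f00-b6d5-11e5-afec-6003089bf572"  # placeholder value
    as_int = uuid_text_to_int(sample)
    print(as_int)
    print(int_to_uuid_text(as_int))
```

If the value comes back negative or truncated, the assumption above does not hold and the conversion needs to be adapted.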

Re: Publishing from cassandra

2016-04-24 Thread Laing, Michael
You could take a look at, or follow: https://issues.apache.org/jira/browse/CASSANDRA-8844 On Sun, Apr 24, 2016 at 10:51 AM, Alexander Orr wrote: > Hi, > > I'm wondering if someone could help me, I'd like to use cassandra to store > data and publish this on downstream to another database (kdb if a

Re: Migration from 2.0.10 to 2.1.12

2016-03-30 Thread Laing, Michael
fyi the list of reserved keywords is at: https://cassandra.apache.org/doc/cql3/CQL.html#appendixA ml On Wed, Mar 30, 2016 at 9:41 AM, Jean Carlo wrote: > Yes we did some reads and writes, the problem is that adding double quotes > force us to modify our code to change and insert like that > >

Re: Modeling contact list, plain table or List

2016-01-09 Thread Laing, Michael
Note that in C* 3.02 the second query is invalid: cqlsh> Select * from communication.user_contact_list where user_id = 98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite = true order by contact_name asc; *InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column "is_favorite" cannot

Re: Slow write speeds

2015-12-31 Thread Laing, Michael
To add to what Jonathan and Jack have said... To get high levels of performance with the python driver you should: - prepare your statements once (recent drivers default to Token Aware - and will correctly apply it if the statement is prepared). - execute asynchronously (up to ~150 futur

Re: Materialized View: can the view's partition key change due to changes to the underlying table?

2015-12-15 Thread Laing, Michael
why don't you just try it? On Tue, Dec 15, 2015 at 6:30 PM, Will Zhang wrote: > Hi all, > > I originally raised this on SO, but not really getting any answer there, > thought I give it a try here. > > > Just thinking about this so please correct my understanding if any of this > isn't right. > >

Re: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Laing, Michael
You don't have any syntax in your application anywhere such as: UPDATE data SET field5 = field5 + [ 1,2,3 ] WHERE field1=...; Just a quick idempotency check :) On Wed, Nov 25, 2015 at 9:16 AM, Jack Krupansky wrote: > Is the data corrupted exactly the same way on all three nodes and in both > d

Re: Getting code=2200 [Invalid query] message=Invalid column name ... while executing ALTER statement

2015-11-21 Thread Laing, Michael
t, Nov 21, 2015 at 8:52 AM, Laing, Michael wrote: > All these pain we need to take because the column names have special >>> character like " ' _- ( ) '' ¬ " etc. >>> >> > Hmm. I tried: > > cqlsh:test> create table quoted_

Re: Getting code=2200 [Invalid query] message=Invalid column name ... while executing ALTER statement

2015-11-21 Thread Laing, Michael
> > All these pain we need to take because the column names have special >> character like " ' _- ( ) '' ¬ " etc. >> > Hmm. I tried: cqlsh:test> create table quoted_col_name ( pk int primary key, "'_-()""¬" int); cqlsh:test> select * from quoted_col_name; *pk* | *'_-()"¬* +- (0 row

Re: Overriding timestamp with light weight transactions

2015-11-16 Thread Laing, Michael
So you are reading the row before writing as you say you have the timestamp. If you really need CAS for the write *and* the timestamp you read is in the future (by local reckoning), why not delay that write until the future arrives and forget about explicitly setting the timestamp? Backtracking o

Re: Convert timeuuid in timestamp programmatically

2015-11-16 Thread Laing, Michael
http://www.tutorialspoint.com/java/util/uuid_timestamp.htm On Mon, Nov 16, 2015 at 7:38 AM, Marlon Patrick wrote: > Hi Donfeng, > > I'm interested in converting an already generated timeuuid into a timestamp, > similar to the dateOf function of Cassandra, but in Java code. Your > suggestion is for
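
The thread asks for Java, but the arithmetic is the same in any language, so here is a hedged Python sketch of it. A version 1 UUID stores its time as a count of 100 ns intervals since 1582-10-15; subtracting the well-known offset to the Unix epoch gives an ordinary timestamp. The function name is an assumption for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100 ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Extract the embedded timestamp of a version 1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # u.time is the 60-bit timestamp in 100 ns units; convert to microseconds.
    micros = (u.time - UUID_EPOCH_OFFSET) // 10
    return UNIX_EPOCH + timedelta(microseconds=micros)


if __name__ == "__main__":
    print(timeuuid_to_datetime(uuid.uuid1()))
```

The result is truncated to microseconds, since that is the resolution of `datetime`.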

Re: Getting code=2200 [Invalid query] message=Invalid column name ... while executing ALTER statement

2015-11-13 Thread Laing, Michael
Dynamic schema changes are generally a bad idea, especially if they are rapid. You should rethink your approach. On Fri, Nov 13, 2015 at 7:20 AM, Rajesh Radhakrishnan < rajesh.radhakrish...@phe.gov.uk> wrote: > > Thank you Carlos for looking. > But when I ran the nodetool describecluster. > It

Re: Read query taking a long time

2015-10-21 Thread Laing, Michael
Are the clocks synchronized across the cluster - probably, but I thought I would ask :) On Wed, Oct 21, 2015 at 3:35 AM, Brice Figureau < brice+cassan...@daysofwonder.com> wrote: > Hi, > > On 20/10/2015 19:48, Carlos Alonso wrote: > > I think also having the output of cfhistograms could help. I'd

Re: Removed node is not completely removed

2015-10-14 Thread Laing, Michael
Remember that the system keyspace uses LocalStrategy: each node has its own set of system tables. -ml On Wed, Oct 14, 2015 at 9:17 AM, Tom van den Berge < tom.vandenbe...@gmail.com> wrote: > Hi Carlos, > > I'm using 2.1.6. The mysterious node is not in the peers table. Any other > ideas? > One of

Re: Consistency Issues

2015-09-30 Thread Laing, Michael
What client are you using? Official java and python clients should not have a LB between them and the C* nodes AFAIK. Why aren't you using 2.1.9? Have you checked for schema agreement amongst all nodes? ml On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen wrote: > More information, > > > > I’

Re: High read latency

2015-09-26 Thread Laing, Michael
Maybe compaction not keeping up - since you are hitting so many sstables? Read heavy... are you using LCS? Plenty of resources... tune to increase memtable size? On Sat, Sep 26, 2015 at 9:19 AM, Eric Stevens wrote: > Since you have most of your reads hitting 5-8 SSTables, it's probably > relat

Re: Question about consistency

2015-09-09 Thread Laing, Michael
Wiser heads may have to chime in then :) On Wed, Sep 9, 2015 at 3:07 PM, Eric Plowe wrote: > So I set speculative_retry to NONE and I encountered the situation about > 30 minutes ago. > > > > On Wednesday, September 9, 2015, Laing, Michael > wrote: > >>

Re: Question about consistency

2015-09-09 Thread Laing, Michael
. > > On Wednesday, September 9, 2015, Laing, Michael > wrote: > >> "alter table test.test_root WITH speculative_retry = '0.0PERCENTILE';" >> >> seemed to work for me with C* version 2.1.7 >> >> On Wed, Sep 9, 2015 at 10:11 AM, Eric Plowe w

Re: Question about consistency

2015-09-09 Thread Laing, Michael
report back my findings. >> >> Thank you, Michael. >> >> >> On Wednesday, September 9, 2015, Laing, Michael < >> michael.la...@nytimes.com> wrote: >> >>> Perhaps a variation on >>> https://issues.apache.org/jira/

Re: Question about consistency

2015-09-09 Thread Laing, Michael
ptember 9, 2015, Laing, Michael > wrote: > >> What are your read repair settings? >> >> On Tue, Sep 8, 2015 at 9:28 PM, Eric Plowe wrote: >> >>> To further expand. We have two data centers, Miami and Dallas. Dallas is >>> our disaster recovery data c

Re: Question about consistency

2015-09-09 Thread Laing, Michael
What are your read repair settings? On Tue, Sep 8, 2015 at 9:28 PM, Eric Plowe wrote: > To further expand. We have two data centers, Miami and Dallas. Dallas is > our disaster recovery data center. The cluster has 12 nodes, 6 in Miami and > 6 in Dallas. The servers in Miami only read/write to Mi

Re: Convert joins in RDBMS to Cassandra

2015-09-06 Thread Laing, Michael
Denormalize your data to support the query, e.g.: CREATE TABLE name_by_cust_id (cust_id int, name text, PRIMARY KEY > (cust_id)); > SELECT name FROM name_by_cust_id WHERE cust_id = 3; For additional queries, similarly denormalize. Refer to https://academy.datastax.com/courses for free online courses covering this t

Re: Is Cassandra really Strong consistency?

2015-09-06 Thread Laing, Michael
: Sun, 6 Sep 2015 13:10:14 +0100 >> Subject: Re: Is Cassandra really Strong consistency? >> From: ibrahimsaba...@gmail.com >> To: user@cassandra.apache.org >> >> >> Do you mean Cassandra does synchronize the clock across all the cluster, >> if yes how it doe

Re: Is Cassandra really Strong consistency?

2015-09-06 Thread Laing, Michael
I think I saw this before. Clocks must be synchronized. On Sun, Sep 6, 2015 at 7:28 AM, ibrahim El-sanosi wrote: > Hi folks, > > Assume we have 4-nodes cluster N1, N2, N3, and N4 and replication factor > is 3. When write CL =ALL and read CL=ONE: > > Client c1 sends W1 = [k1,V1] to N1 (a coordi

Re: Write request in Cassandra?

2015-08-21 Thread Laing, Michael
2 is more correct. On Fri, Aug 21, 2015 at 11:48 AM, ibrahim El-sanosi < ibrahimsaba...@gmail.com> wrote: > Dear folks, > > > I have doubt on how Cassandra performs a write request; I have two > scenarios, please read them and ensure which one is correct? > > > Assume we have cluster consists of

Re: Write request in Cassandra?

2015-08-21 Thread Laing, Michael
https://academy.datastax.com/courses/ds201-cassandra-core-concepts/internal-architecture-replication On Fri, Aug 21, 2015 at 11:53 AM, Laing, Michael wrote: > 2 is more correct. > > On Fri, Aug 21, 2015 at 11:48 AM, ibrahim El-sanosi < > ibrahimsaba...@gmail.com> wrot

Re: Question about how to remove data

2015-08-19 Thread Laing, Michael
Possibly you have snapshots? If so, use nodetool to clear them. On Wed, Aug 19, 2015 at 4:54 PM, Analia Lorenzatto < analialorenza...@gmail.com> wrote: > Hello guys, > > I have a cassandra cluster 2.1 comprised of 4 nodes. > > I removed a lot of data in a Column Family, then I ran manually a > co

Re: Data model suggestions

2015-04-27 Thread Laing, Michael
No - it immediately removes the sstables on all nodes. On Mon, Apr 27, 2015 at 7:53 AM, Ali Akhtar wrote: > Wouldn't truncating the table create tombstones? > > On Mon, Apr 27, 2015 at 11:55 AM, Peer, Oded wrote: > >> I recommend truncating the table instead of dropping it since you don’t >> n

Re: Cassandra tombstones being created by updating rows with TTL's

2015-04-21 Thread Laing, Michael
Executor.java:1142) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > > at java.lang.Thread.run(Thread.java:745) > > Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: > /var/lib/cassandra/d

Re: Cassandra tombstones being created by updating rows with TTL's

2015-04-21 Thread Laing, Michael
n the situation, for what I > read we need to start doing this also. > > > > https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair > > > > > > *From:* Laing, Michael [mailto:michael.la...@nytimes.com] > *Sent:* 21 April 2015 16:26 > *To:

Re: Cassandra tombstones being created by updating rows with TTL's

2015-04-21 Thread Laing, Michael
If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0. That's what we do. There have been discussions on the list over the last few years re this topic. ml On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen wrote: >

Re: [Cassandra 2.0] truncate table

2015-04-09 Thread Laing, Michael
rtfm - truncate creates snapshots by default, they must be cleared on all nodes to recover *disk space* as requested by the OP. On Thu, Apr 9, 2015 at 10:17 AM, Anuj Wadehra wrote: > You can try doing it from cassandra cli. Set consistency level to All and > then truncate. > > Anuj Wadehra > > Se

Re: [Cassandra 2.0] truncate table

2015-04-09 Thread Laing, Michael
Nodetool clearsnapshot On Thursday, April 9, 2015, Eduardo Cusa wrote: > Hi Guys, I truncated a column family that has a size of 31 gb, and the > disk space was not released > > what else do i have to do? > > Regards > Eduardo > >

Re: How to store unique visitors in cassandra

2015-03-31 Thread Laing, Michael
We use Alain's solution as well to make major operational revisions. We have a "red team" and a "blue team" in each AWS region, so we just add and drop datacenters to get where we want to be. Pretty simple. ml On Tue, Mar 31, 2015 at 8:16 AM, Alain RODRIGUEZ wrote: > People keep asking me if w

Re: High latencies for simple queries

2015-03-27 Thread Laing, Michael
Actually I am in the middle of setting up the same sort of thing for PostgreSQL using psycopg2 and pyev. I'll be using Cassandra and PostgreSQL in an IoT experiment as the backend for swarms of MQTT brokers at something in the 10-100M client range. ml On Fri, Mar 27, 2015 at 4:59 PM,

Re: High latencies for simple queries

2015-03-27 Thread Laing, Michael
I use callback chaining with the python driver and can confirm that it is very fast. You can "chain the chains" together to perform sequential processing. I do this when retrieving "metadata" and then the referenced "payload" for example, when the metadata has been inverted and the payload is larg

Re: Storing bi-temporal data in Cassandra

2015-02-15 Thread Laing, Michael
Perhaps you should learn more about Cassandra before you ask such questions. It's easy if you just look at the readily accessible docs. ml On Sat, Feb 14, 2015 at 6:05 PM, Raj N wrote: > I don't think thats solves my problem. The question really is why can't we > use ranges for both time colum

Re: Adding more nodes causes performance problem

2015-02-09 Thread Laing, Michael
Use token-awareness so you don't have as much coordinator overhead. ml On Mon, Feb 9, 2015 at 5:32 AM, Marcelo Valle (BLOOMBERG/ LONDON) < mvallemil...@bloomberg.net> wrote: > AFAIK, if you were using RF 3 in a 3 node cluster, so all your nodes had > all your data. > When the number of nodes sta

Re: number of replicas per data center?

2015-01-19 Thread Laing, Michael
Since our workload is spread globally, we spread our nodes across AWS regions as well: 2 nodes per zone, 6 nodes per region (datacenter) (RF 3), 12 nodes total (except during upgrade migrations). We autodeploy into VPCs. If a region goes "bad" we can route all traffic to another and bring up a thir

Re: [Consitency on cqlsh command prompt]

2014-12-17 Thread Laing, Michael
http://datastax.github.io/python-driver/api/cassandra.html On Wed, Dec 17, 2014 at 9:27 AM, nitin padalia wrote: > > Thanks! Philip/Ryan, > Ryan I am using single Datacenter. > Philip could you point some link where we could see those enums. > -Nitin > On Dec 17, 2014 7:14 PM, "Philip Thompson"

Re: Recommissioned node is much smaller

2014-12-07 Thread Laing, Michael
On a mac this works (different sed, use an actual newline): " nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e 's/,$/\ >/' " Otherwise the last token will have an 'n' appended which you may not notice. On Fri, Dec 5, 2014 at 4:34 PM, Robert Coli wrote: > On Wed, Dec 3, 2
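
For anyone who would rather sidestep the sed differences altogether, the same comma-joined token list can be built in Python. This is only a sketch: it assumes, as the awk expression above implies, that each relevant line starts with "Token" and carries the token as its third whitespace-separated field. The function name and the sample text are made up.

```python
def join_tokens(nodetool_output: str) -> str:
    """Collect the third field of every line starting with 'Token', comma-joined."""
    tokens = []
    for line in nodetool_output.splitlines():
        fields = line.split()
        if line.startswith("Token") and len(fields) >= 3:
            tokens.append(fields[2])
    # str.join never leaves a trailing separator, so nothing needs trimming.
    return ",".join(tokens)


if __name__ == "__main__":
    # Hypothetical sample in the shape the awk command expects.
    sample = "ID      : abc\nToken   : -100\nToken   : 200\n"
    print(join_tokens(sample))
```

In practice the input would come from piping `nodetool info -T` into the script via standard input.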

Re: Using Cassandra for session tokens

2014-12-01 Thread Laing, Michael
able as the OP suggested. > > On Mon Dec 01 2014 at 7:18:51 AM Laing, Michael > wrote: > >> Since the session tokens are random, perhaps computing a shard from each >> one and using it as the partition key would be a good idea. >> >> I would also use uuid v1 to ge

Re: Using Cassandra for session tokens

2014-12-01 Thread Laing, Michael
Since the session tokens are random, perhaps computing a shard from each one and using it as the partition key would be a good idea. I would also use uuid v1 to get ordering. With such a small amount of data, only a few shards would be needed. On Mon, Dec 1, 2014 at 10:08 AM, Phil Wise wrote:
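
A rough Python sketch of computing a shard from a session token, as suggested above. The shard count of 16 and the choice of MD5 are assumptions picked for illustration; any stable hash and any small shard count would do.

```python
import hashlib


def shard_for(token: str, num_shards: int = 16) -> int:
    """Map a session token to a stable shard number in [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Use a stable hash: Python's built-in hash() is randomized per process.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    print(shard_for("example-session-token"))
```

The shard would then serve as the partition key, with the token (or a v1 uuid for ordering) as the clustering column.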

Re: OOM at Bootstrap Time

2014-10-27 Thread Laing, Michael
>> - So we see ~ 3000 flush being enqueued. >>> >> - This happens so suddenly that even boosting the number of flush >>> writers >>> >> to 20 does not suffice. I don't even see "all time blocked" numbers >>> for it >>> >> before C*

Re: OOM at Bootstrap Time

2014-10-25 Thread Laing, Michael
Since no one else has stepped in... We have run clusters with ridiculously small nodes - I have a production cluster in AWS with 4GB nodes each with 1 CPU and disk-based instance storage. It works fine but you can see those little puppies struggle... And I ran into problems such as you observe...

Re: EC2 - Performace Question

2014-09-01 Thread Laing, Michael
Is there a reason why updating a counter for this information will not work for you? On Monday, September 1, 2014, eduardo.cusa < eduardo.c...@usmediaconsulting.com> wrote: > yes, is the same table, my mistake. > > > On Mon, Sep 1, 2014 at 6:35 PM, Laing, Michael [via [hid

Re: EC2 - Performace Question

2014-09-01 Thread Laing, Michael
Is table track_user equivalent to table userpixel? On Monday, September 1, 2014, Eduardo Cusa < eduardo.c...@usmediaconsulting.com> wrote: > Hi All. I Have a Cluster in Amazon with the following settings: > > * 2 Nodes M3.Large > * Cassandra 2.0.7 > * Default installation on Ubuntu > > And I have o

Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
ging” rather than the exercise in > futility of doing a massive number of deletes and updates in place? > > -- Jack Krupansky > > *From:* Laing, Michael > *Sent:* Monday, September 1, 2014 9:33 AM > *To:* user@cassandra.apache.org > *Subject:* Re: Help with select IN quer

Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
y criteria how should I construct my schema? One > thought has occurred to me is make three tables with each item > asset_id , event_time, timeuuid as primary keys and depending on type > of query choose the table to do query upon. That seems like a waste of > resources (disk, cp

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Oh it must be late - I missed the fact that you didn't want to specify asset_id. The above queries will still work but you have to use 'allow filtering' - generally not a good idea. I'll look again in the morning. On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael wrote:

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
> this data and reference interesting data points via the timestamp field. > The timestamp field is my bridge between Sal and nosql world. > > Subodh > On Aug 31, 2014 5:33 PM, "Laing, Michael" > wrote: > >> Are event_time and timestamp essentially repre

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Are event_time and timestamp essentially representing the same datetime? On Sunday, August 31, 2014, Subodh Nijsure wrote: > I have following database schema > > CREATE TABLE sensor_info_table ( > asset_id text, > event_time timestamp, > "timestamp" timeuuid, > sensor_reading map, > se

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Laing, Michael
'm struggling to find documentation on the CQL to physical > layout that isn't a trivial example, especially are around multiget use > cases. Do you have any pointers to blogs or tutorials you've found > helpful? > > Thanks, > Todd > > > On Sunday, August 31, 2014,

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Laing, Michael
Actually I think you do want to use scopeId, scopeType as the partition key (and drop row caching until you upgrade to 2.1 where "rows" are in fact rows and not partitions): CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes ( scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar, timestam

Re: select many rows one time or select many times?

2014-08-01 Thread Laing, Michael
I don't think there is an easy "answer" to this... A possible approach, based upon the implied dimensions of the problem, would be to maintain a bloom filter over "words" for each user as a partition key with the user as clustering key. Then a single query would efficiently yield the list of users

Re: Measuring WAN replication latency

2014-07-29 Thread Laing, Michael
I saw this awhile back: With requests possibly coming in from either US region, we need to make > sure that the replication of data happens within an acceptable time > threshold. This led us to perform an experiment where we wrote 1 million > records in one region of a multi-region cluster. We th

Re: IN clause with composite primary key?

2014-07-25 Thread Laing, Michael
You may also want to use tuples for the clustering columns: The tuple notation may also be used for IN clauses on CLUSTERING COLUMNS: > > SELECT * FROM posts WHERE userid='john doe' AND (blog_title, posted_at) IN > (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01')) > > > from https:

Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Laing, Michael
Except then you have to merge results if you want them ordered. On Fri, Jul 25, 2014 at 2:15 PM, Kevin Burton wrote: > Ah.. ok. Nice. That should work. Parallel dispatch on the client would > work too.. using async. > > > On Fri, Jul 25, 2014 at 1:37 PM, Laing, Michael >
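
To make the client-side merge concrete, here is a small Python sketch. It assumes each per-partition query has already returned its rows sorted on the same column; the row shape (plain dicts with a `posted_at` field) and the function name are invented for illustration and no driver is involved.

```python
import heapq
from operator import itemgetter


def merge_ordered(result_sets, key, reverse=False):
    """Merge several individually sorted result sets into one sorted list.

    Each input must already be sorted by `key` (descending if reverse=True).
    """
    return list(heapq.merge(*result_sets, key=key, reverse=reverse))


if __name__ == "__main__":
    # Hypothetical rows from two parallel single-partition queries.
    rows_a = [{"id": "a", "posted_at": 1}, {"id": "a", "posted_at": 4}]
    rows_b = [{"id": "b", "posted_at": 2}, {"id": "b", "posted_at": 3}]
    for row in merge_ordered([rows_a, rows_b], key=itemgetter("posted_at")):
        print(row)
```

`heapq.merge` does not sort its inputs, so it only works when every query was issued with the same ordering.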

Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Laing, Michael
We use IN (keeping the number down). The coordinator does parallel dispatch AND applies ORDER BY to the aggregate results, which we would otherwise have to do ourselves. Anyway, worth it for us. ml On Fri, Jul 25, 2014 at 1:24 PM, Kevin Burton wrote: > Perhaps the best strategy is to have th

Re: How to maintain the N-most-recent versions of a value?

2014-07-18 Thread Laing, Michael
The cql you provided is invalid. You probably meant something like: CREATE TABLE foo ( > > rowkey text, > > family text, > > qualifier text, > > version int, > > value blob, > > PRIMARY KEY ((rowkey, family, qualifier), version)) > > WITH CLUSTERING ORDER BY (version DESC);

Re: Does the default LIMIT applies to automatic paging?

2014-06-24 Thread Laing, Michael
And with python use future.has_more_pages and future.start_fetching_next_page(). On Tue, Jun 24, 2014 at 1:20 AM, DuyHai Doan wrote: > With the Java Driver, set the fetchSize and use ResultSet.iterator > Le 24 juin 2014 01:04, "ziju feng" a écrit : > > Hi All, >> >> I have a wide row table th

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Laing, Michael
However my extensive benchmarking this week of the python driver from master shows a performance *decrease* when using 'token_aware'. This is on 12-node, 2-datacenter, RF-3 cluster in AWS. Also why do the work the coordinator will do for you: send all the queries, wait for everything to come back

Re: Summarizing Timestamp datatype

2014-06-18 Thread Laing, Michael
as opposed to how i'm doing it >> now, in python. >> >> On Tue, Jun 17, 2014 at 9:46 PM, Laing, Michael >> wrote: >> > If you can arrange to index your rows by: >> > >> > (, ) >> > >> > Then you can select ranges as you

Re: Summarizing Timestamp datatype

2014-06-17 Thread Laing, Michael
If you can arrange to index your rows by: (, ) Then you can select ranges as you wish. This works because is the "partition key", arrived at by hash (really it's a hash key), whereas is the "clustering key" (really it is a range key) which is kept in sorted order both in memory and on disk. I
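
A toy Python model of why that layout supports range selection. This is an illustration of the idea only, not how Cassandra is implemented: a dict stands in for the hashed partition key, and a sorted list per partition stands in for the clustering key order. All names are made up.

```python
import bisect


class ToyTable:
    """Rows grouped by partition key (hash lookup), sorted by clustering key."""

    def __init__(self):
        # partition key -> (sorted clustering keys, values in the same order)
        self._partitions = {}

    def insert(self, partition_key, clustering_key, value):
        keys, values = self._partitions.setdefault(partition_key, ([], []))
        i = bisect.bisect_right(keys, clustering_key)
        keys.insert(i, clustering_key)
        values.insert(i, value)

    def select_range(self, partition_key, start, end):
        """Return (clustering_key, value) pairs with start <= key <= end."""
        keys, values = self._partitions.get(partition_key, ([], []))
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return list(zip(keys[lo:hi], values[lo:hi]))


if __name__ == "__main__":
    table = ToyTable()
    for ts in (30, 10, 20, 40):
        table.insert("sensor-1", ts, f"reading@{ts}")
    print(table.select_range("sensor-1", 15, 35))
```

The partition lookup is an exact match only, which mirrors why a range over the partition key is not available while a range over the clustering key is cheap.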

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Laing, Michael
Just to add 2 more cents... :) The CQL3 protocol is asynchronous. This can provide a substantial throughput increase, according to my benchmarking, when one uses non-blocking techniques. It is also peer-to-peer. Hence the server can generate events to send to the client, e.g. schema changes - in

Re: Large number of row keys in query kills cluster

2014-06-12 Thread Laing, Michael
Just an FYI, my benchmarking of the new python driver, which uses the asynchronous CQL native transport, indicates that one can largely overcome client-to-node latency effects if you employ a suitable level of concurrency and non-blocking techniques. Of course response size and other factors come

Re: Large number of row keys in query kills cluster

2014-06-10 Thread Laing, Michael
Perhaps if you described both the schema and the query in more detail, we could help... e.g. did the query have an IN clause with 2 keys? Or is the key compound? More detail will help. On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma wrote: > I didn't explain clearly - I'm not requesting 200

python fast table copy/transform (subject updated)

2014-06-06 Thread Laing, Michael
; > Thanks a lot! > > Best regards, > Marcelo. > > > 2014-06-04 22:28 GMT-03:00 Laing, Michael : > > BTW you might want to put a LIMIT clause on your SELECT for testing. -ml >> >> >> On Wed, Jun 4, 2014 at 6:04 PM, Laing, Michael > > wrote: >>

Re: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int)

2014-06-06 Thread Laing, Michael
select * from test_paging where *token(*id*)* > token(0); ml On Fri, Jun 6, 2014 at 1:47 AM, Jonathan Haddad wrote: > Sorry, the datastax docs are actually a bit better: > http://www.datastax.com/documentation/cql/3.0/cql/cql_using/paging_c.html > > Jon > > > On Thu, Jun 5, 2014 at 10:46 PM, J

Re: migration to a new model

2014-06-04 Thread Laing, Michael
BTW you might want to put a LIMIT clause on your SELECT for testing. -ml On Wed, Jun 4, 2014 at 6:04 PM, Laing, Michael wrote: > Marcelo, > > Here is a link to the preview of the python fast copy program: > > https://gist.github.com/michaelplaing/37d89c8f5f09ae779e47 > >

Re: migration to a new model

2014-06-04 Thread Laing, Michael
lbacks going at once so it is fun to watch. On my regional cluster of small nodes in AWS I got about 3000 rows per second transferred after things warmed up a bit - each row about 6kb. ml On Wed, Jun 4, 2014 at 11:49 AM, Laing, Michael wrote: > OK Marcelo, I'll work on it today. -ml >

Re: migration to a new model

2014-06-04 Thread Laing, Michael
osts. Both servers have SSD and 64 GB RAM, I > could use the script as a benchmark for you if you want. Besides, we have > some bigger clusters, I could run on them just to test the speed if this is > going to help. > > Regards > Marcelo. > > > 2014-06-03 11:40 GMT-03

Re: High latency on 5 node Cassandra Cluster

2014-06-04 Thread Laing, Michael
I would first check to see if there was a time synchronization issue among nodes that triggered and/or perpetuated the event. ml On Wed, Jun 4, 2014 at 3:12 AM, Arup Chakrabarti wrote: > Hello. We had some major latency problems yesterday with our 5 node > cassandra cluster. Wanted to get some

Re: migration to a new model

2014-06-03 Thread Laing, Michael
Hi Marcelo, I could create a fast copy program by repurposing some python apps that I am using for benchmarking the python driver - do you still need this? With high levels of concurrency and multiple subprocess workers, based on my current actual benchmarks, I think I can get well over 1,000 row

Re: Schema disagreement errors

2014-05-12 Thread Laing, Michael
Upgrade to 2.0.7 fixed this for me. You can also try 'nodetool resetlocalschema' on disagreeing nodes. This worked temporarily for me in 2.0.6. ml On Mon, May 12, 2014 at 3:31 PM, Gaurav Sehgal wrote: > We have recently started seeing a lot of Schema Disagreement errors. We > are using Cassan

Re: Deleting column names

2014-04-22 Thread Laing, Michael
Your understanding is incorrect - the easiest way to see that is to try it. On Tue, Apr 22, 2014 at 12:00 PM, Sebastian Schmidt wrote: > From my understanding, this would delete all entries with the given s. > Meaning, if I have inserted (sa, p1, o1, c1) and (sa, p2, o2, c2), > executing this: >

Re: Deleting column names

2014-04-22 Thread Laing, Michael
Referring to the original post, I think the confusion is what is a "row" in this context: So as far as I understand, the s column is now the *row *key ... Since I have multiple different p, o, c combinations per s, deleting the whole > *row* identified by s is no option The s column is in fact

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
I've never noticed that setting tombstone_threshold has any effect... at least in 2.0.6. What gets written to the log? On Fri, Apr 11, 2014 at 3:31 PM, DuyHai Doan wrote: > I was wondering, to remove the tombstones from Sstables created by LCS, > why don't we just set the tombstone_thresh

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
At the cost of really quite a lot of compaction, you can temporarily switch to SizeTiered, and when that is completely done (check each node), switch back to Leveled. It's like doing the laundry twice :) I've done this on CFs that were about 5GB but I don't see why it wouldn't work on larger ones

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
I have played with this quite a bit and recommend you set gc_grace_seconds to 0 and use 'nodetool compact [keyspace] [cfname]' on your table. A caveat I have is that we use C* 2.0.6 - but the space we expect to recover is in fact recovered. Actually, since we never delete explicitly (just ttl) we

Re: Setting gc_grace_seconds to zero and skipping "nodetool repair (was RE: Timeseries with TTL)

2014-04-07 Thread Laing, Michael
e run routinely as part of regular cluster maintenance > operations. > > > > If RF=2, ReadConsistency is ONE and data failed to get replicated to the > second node, then during a read might the app incorrectly return “missing > data”? > > > > It seems to me that the

Re: Timeseries with TTL

2014-04-06 Thread Laing, Michael
Since you are using LeveledCompactionStrategy there is no major/minor compaction - just compaction. Leveled compaction does more work - your logs don't look unreasonable to me - the real question is whether your nodes can keep up w the IO. SSDs work best. BTW if you never delete and only ttl your

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
3, it's easy to pull just the new tables > out via aws-cli tools (s3 sync), to your remote, non-aws server, and not > incur the overhead of routinely backing up the entire dataset. For a non > trivial database, this matters quite a bit. > > > On Fri, Mar 28, 2014 at 1:21 PM, La

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
As I tried to say, EBS snapshots require much care or you get corruption such as you have encountered. Does Cassandra quiesce the file system after a snapshot using fsfreeze or xfs_freeze? Somehow I doubt it... On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad wrote: > I have a nagging memory o

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
In your step 4, be sure you create a consistent EBS snapshot. You may have pieces of your sstables that have not actually been flushed all the way to EBS. See https://github.com/alestic/ec2-consistent-snapshot ml On Fri, Mar 28, 2014 at 3:21 PM, Russ Lavoie wrote: > Thank you for your quick r

Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread Laing, Michael
guys? >> I have already tried reducing the number of rpc threads. Also tried >> reducing the linux kernel overcommit. >> >> >> On Sat, Mar 22, 2014 at 5:44 PM, Laing, Michael < >> michael.la...@nytimes.com> wrote: >> >>> I ran into the same p

Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread Laing, Michael
I ran into the same problem some time ago. Upgrading to Cassandra 2, jdk 1.7, and default parameters fixed it. I think the jdk change was the key for my similarly small memory cluster. ml On Sat, Mar 22, 2014 at 1:36 PM, prem yadav wrote: > Michael, no memory constraints. System memory is 4

Re: Data model for boolean attributes

2014-03-21 Thread Laing, Michael
Of course what you really want is this:

create table x(
  id text,
  timestamp timeuuid,
  flag boolean,
  // other fields
  primary key (flag, id, timestamp)
)

Whoops now there are only 2 partition keys! Not good if you have any reasonable number of rows... Faced with a situation like this (alt
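
A small sketch of why a boolean partition key yields only two partitions. It is a toy count in plain Python, not Cassandra code; the sample rows and the helper `partition_sizes` are made up for illustration, and partitioning by `id` is shown only as a contrast, not as the alternative the message goes on to describe (that part is cut off above).

```python
import uuid
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable


def partition_sizes(
    rows: Iterable[Dict[str, object]],
    partition_key: Callable[[Dict[str, object]], Hashable],
) -> Counter:
    """Count how many rows land in each partition for a given partition key."""
    return Counter(partition_key(row) for row in rows)


# Hypothetical sample data: nine rows, a third of them flagged.
rows = [
    {"id": f"item-{i}", "timestamp": uuid.uuid1(), "flag": i % 3 == 0}
    for i in range(9)
]

# primary key (flag, id, timestamp): flag alone is the partition key.
by_flag = partition_sizes(rows, lambda r: r["flag"])

# For contrast: partitioning on id spreads the same rows out.
by_id = partition_sizes(rows, lambda r: r["id"])

print(len(by_flag), "partitions when partitioning by flag")
print(len(by_id), "partitions when partitioning by id")
```

However many rows are added, `by_flag` can never have more than two entries, so all data piles onto at most two partitions.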

Re: ALLOW FILTERING usage

2014-03-17 Thread Laing, Michael
Your second query is invalid: *Bad Request: Partition KEY part key cannot be restricted by IN relation (only the last part of the partition key can)* ml On Mon, Mar 17, 2014 at 6:56 AM, Tupshin Harper wrote: > It's the difference between reading from only the partitions that you are > interes

Re: Exception in thread event_loop

2014-03-16 Thread Laing, Michael
A possible workaround - not a fix - might be to install libev so the libev event loop is used. See http://datastax.github.io/python-driver/installation.html Also be sure you are running the latest version: 1.0.2 I believe. Your ';' is outside of your 'str' - actually shouldn't be a problem tho.

Re: Cassandra slow on some reads

2014-03-14 Thread Laing, Michael
*If* you do not need to do range queries on your 'timestam' (ts) column - *and* if you can change your schema (big if...), then you could move 'timestam' into the partition key like this (using your notation): PK((key String , timestam int), column1 string, col2 string) , list1 , list 2, list 3 .
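
The trade-off described here can be illustrated with a toy in-memory model. This is a sketch under assumptions, not Cassandra's storage engine or any driver API: the `Table` class and its methods are hypothetical, and the rule it enforces (every partition-key column must be given by equality) is a simplification of how partition lookups behave.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class Table:
    """Toy table that groups rows by a (possibly composite) partition key."""

    def __init__(self, partition_cols: Sequence[str]) -> None:
        self.partition_cols = list(partition_cols)
        self.partitions: Dict[Tuple, List[dict]] = defaultdict(list)

    def insert(self, row: dict) -> None:
        pk = tuple(row[c] for c in self.partition_cols)
        self.partitions[pk].append(row)

    def read_partition(self, **pk_values: object) -> List[dict]:
        # A partition can only be located when every partition column is known.
        if set(pk_values) != set(self.partition_cols):
            raise ValueError(
                f"need equality on all of {self.partition_cols}, got {sorted(pk_values)}"
            )
        pk = tuple(pk_values[c] for c in self.partition_cols)
        return list(self.partitions.get(pk, []))


wide = Table(["key"])                # PK((key), timestam, column1, ...)
narrow = Table(["key", "timestam"])  # PK((key, timestam), column1, ...)

for ts in range(1, 5):
    row = {"key": "a", "timestam": ts, "column1": f"c{ts}"}
    wide.insert(row)
    narrow.insert(row)

print(len(wide.partitions), "partition(s) with key alone")
print(len(narrow.partitions), "partition(s) with (key, timestam)")
print(narrow.read_partition(key="a", timestam=2))
```

With `timestam` in the partition key, each read touches one small partition, but `narrow.read_partition(key="a")` raises: a range over `timestam` is no longer a single-partition read, which is the "if" the message leads with.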

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
These are my personal opinions, reflecting both my long experience w database systems, and my newness to Cassandra... [tl;dr] The Cassandra contributors, having made its history, tend to describe it in terms of implementation rather than action. And its implementation has a history, all relativel

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
ied to identify individual rows in a > partition. Without clustering columns, one partition is one row. So, it’s a > matter of whether you want your rows to be in the same partition or > distributed. > > -- Jack Krupansky > > *From:* Laing, Michael > *Sent:* Thursday, March

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
RIMARY KEY ((key1, key2)), any examples would be welcome if you have the > time. > > Kind regards, > > Dave > > > On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael > wrote: > >> Create your table like this and it will work: >> >> CREATE TABLE test.do

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
rently testing with 2.0.2 which got >> dragged in by the cassandra unit library I'm using for testing [1] I will >> try to fix my build dependencies and retry, thx. >> >> /Dave >> >> [1] https://github.com/jsevellec/cassandra-unit >> >> >&

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
I have no problem doing this w 2.0.5 - what version of C* are you using? Or maybe I don't understand your data model... attach 'creates' if you don't mind. ml On Thu, Mar 13, 2014 at 9:24 AM, David Savage wrote: > Hi Peter, > > Thanks for the help, unfortunately I'm not sure that's the problem,

Re: CQL decimal encoding

2014-02-26 Thread Laing, Michael
go uses 'zig-zag' encoding, perhaps that is the difference? On Wed, Feb 26, 2014 at 6:52 AM, Peter Lin wrote: > > You may need to bit shift if that is the case > > Sent from my iPhone > > > On Feb 26, 2014, at 2:53 AM, Ben Hood <0x6e6...@gmail.com> wrote: > > > > Hey Colin, > > > >> On Tue, Feb
