Question about 'duplicate' columns

2013-08-06 Thread Franc Carter
I've been thinking through some cases that I can see happening at some point and thought I'd ask on the list to see if my understanding is correct. Say a bunch of columns have been loaded 'a long time ago', i.e. long enough in the past that they have been compacted. My understanding is that if

Re: Any good GUI based tool to manage data in Cassandra?

2013-08-06 Thread Aaron Morton
There is a list here. http://wiki.apache.org/cassandra/Administration%20Tools Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 3/08/2013, at 6:19 AM, Tony Anecito adanec...@yahoo.com wrote: Hi All, Is there a GUI tool

Re: cassandra 1.2.5- virtual nodes (num_token) pros/cons?

2013-08-06 Thread Aaron Morton
The reason for me looking at virtual nodes is because of terrible experiences we had with 0.8 repairs and as per documentation (and logically) the virtual nodes seem like they will help repairs run smoother. Is this true? I've not thought too much about how they help repair run smoother, what

Re: Better to have lower or greater cardinality for partition key in CQL3?

2013-08-06 Thread Aaron Morton
So from anyone's experience, is it better to use a low cardinality partition key or a high cardinality. IMHO go with whatever best supports the read paths. They all get If you have lots (e.g. north of 1 billion) rows per node there are extra considerations that come into play. Cassandra 1.2

Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread Aaron Morton
how many nodes to start with (2 ok?)? I'd recommend 3, that will give you some redundancy; see http://thelastpickle.com/2011/06/13/Down-For-Me/ Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 5/08/2013, at 1:41 AM, Rajkumar

Re: Reducing the number of vnodes

2013-08-06 Thread Aaron Morton
Repair runs in two phases: first it works out the differences, then it streams the data. The length of the first depends on the size of the data and the second on the level of inconsistency. To track the first use nodetool compactionstats or look in the logs for the messages about requesting

Re: Unable to bootstrap node

2013-08-06 Thread Aaron Morton
Caused by: java.io.FileNotFoundException: /data/1/cassandra/data/rts/40301_feedProducts/rts-40301_feedProducts-ib-1-Data.db (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233) at

Re: Question about 'duplicate' columns

2013-08-06 Thread Aaron Morton
Yes. If you overwrite much older data with new data, both versions of the column will remain on disk until compaction gets to work on both fragments of the row. Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 6/08/2013, at
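
The behaviour described in this reply can be illustrated with a small toy model. This is only a sketch, not Cassandra's storage engine: the dictionaries standing in for SSTables, the `read_column` and `compact` helpers, and the sample `price` column are all hypothetical names chosen for illustration. It shows an old and a new version of a column coexisting in separate fragments, with reads returning the newest one, until a merge brings both fragments together.

```python
# Toy model (NOT Cassandra's storage engine): each "SSTable" is a dict
# mapping column name -> (write timestamp, value).

def read_column(sstables, name):
    """Return the newest value for a column across all fragments."""
    versions = [table[name] for table in sstables if name in table]
    # Tuples compare by timestamp first, so max() picks the latest write.
    return max(versions)[1] if versions else None

def compact(sstables):
    """Merge fragments, keeping only the newest version of each column."""
    merged = {}
    for table in sstables:
        for name, version in table.items():
            if name not in merged or version > merged[name]:
                merged[name] = version
    return [merged]

old_sstable = {"price": (100, "1.50")}  # loaded long ago, already compacted
new_sstable = {"price": (200, "1.75")}  # recent overwrite in a new fragment
sstables = [old_sstable, new_sstable]

versions_before = sum("price" in table for table in sstables)
value_before = read_column(sstables, "price")

sstables = compact(sstables)            # both fragments take part in the merge
versions_after = sum("price" in table for table in sstables)
value_after = read_column(sstables, "price")
```

In this model the read result is the same before and after the merge; only the number of stored versions changes, which is the disk-space point made in the reply.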

Re: Question about 'duplicate' columns

2013-08-06 Thread Franc Carter
On Tue, Aug 6, 2013 at 6:10 PM, Aaron Morton aa...@thelastpickle.com wrote: Yes. If you overwrite much older data with new data both versions of the column will remain on disk until compaction gets to work on both fragments of the row. thanks Cheers - Aaron Morton

Is there update-in-place on maps?

2013-08-06 Thread Jan Algermissen
Hi, I think it does not fit the model of how C* does writes, but just to verify: Is there an update-in-place possibility on maps? That is, could I do an atomic increment on a value in a map? Jan

Effect of TTL on collection updates

2013-08-06 Thread Jan Algermissen
Hi, after seeing Patrick's truly excellent 3-part series on modeling, this question pops up: When I do an update on a collection, using a TTL in the update statement (like Patrick does in the example with the login-location time series example), does the TTL apply to the update only, or to

Re: cassandra 1.2.5- virtual nodes (num_token) pros/cons?

2013-08-06 Thread Richard Low
On 6 August 2013 08:40, Aaron Morton aa...@thelastpickle.com wrote: The reason for me looking at virtual nodes is because of terrible experiences we had with 0.8 repairs and as per documentation (and logically) the virtual nodes seem like they will help repairs run smoother. Is this true?

Re: Effect of TTL on collection updates

2013-08-06 Thread Alain RODRIGUEZ
Hi Jan, TTLs, if used, only apply to the newly inserted/updated values, from: http://cassandra.apache.org/doc/cql3/CQL.html#collections This manual is updated often enough to be up to date, and so is useful; you should keep it bookmarked. Alain 2013/8/6 Jan Algermissen

Re: Is there update-in-place on maps?

2013-08-06 Thread Alain RODRIGUEZ
Once again, this should answer your question: http://cassandra.apache.org/doc/cql3/CQL.html#collections Alain 2013/8/6 Jan Algermissen jan.algermis...@nordsc.com Hi, I think it does not fit the model of how C* does writes, but just to verify: Is there an update-in-place possibility on

Re: Is there update-in-place on maps?

2013-08-06 Thread Jan Algermissen
Alain, On 06.08.2013, at 11:17, Alain RODRIGUEZ arodr...@gmail.com wrote: Once again, this should answer your question: http://cassandra.apache.org/doc/cql3/CQL.html#collections yup, I understand the hint :-) However, since I am about to base application architecture on these capabilities,

Re: Is there update-in-place on maps?

2013-08-06 Thread Andy Twigg
Store pointers to counters as map values?

Re: Is there update-in-place on maps?

2013-08-06 Thread Jan Algermissen
On 06.08.2013, at 11:36, Andy Twigg andy.tw...@gmail.com wrote: Store pointers to counters as map values? Sorry, but this fits into nothing I know about C* so far - can you explain? Jan

Re: Is there update-in-place on maps?

2013-08-06 Thread Andy Twigg
Counters can be atomically incremented (http://wiki.apache.org/cassandra/Counters). Pick a UUID for the counter, and use that: c=map.get(k); c.incr() On 6 August 2013 11:01, Jan Algermissen jan.algermis...@nordsc.com wrote: On 06.08.2013, at 11:36, Andy Twigg andy.tw...@gmail.com wrote:
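
The indirection suggested here (`c=map.get(k); c.incr()`) might look roughly like the following in-memory sketch. Everything in it is an assumption made for illustration: `pointers` stands in for the map column, `counters` for a separate counter store keyed by UUID, and `increment` is a hypothetical helper. Plain Python dictionaries give no atomicity across clients; in Cassandra the increment itself would be a counter update.

```python
import uuid
from collections import defaultdict

# Hypothetical stand-ins: `pointers` plays the role of the map column
# (map key -> counter id) and `counters` the role of a counter store.
pointers = {}
counters = defaultdict(int)

def increment(key, delta=1):
    """Look up (or create) the counter id stored under `key`, then bump it."""
    counter_id = pointers.setdefault(key, uuid.uuid4())
    counters[counter_id] += delta
    return counters[counter_id]

increment("logins")
increment("logins", 2)
total = counters[pointers["logins"]]
```

The map value is written once, when the UUID is chosen, and never updated afterwards; every later change goes to the counter that the UUID points at.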

Re: Any good GUI based tool to manage data in Cassandra?

2013-08-06 Thread Tony Anecito
Thanks Aaron. I found that before I asked the question, and Helenos seems the closest, but it does not allow you to easily use CRUD like, say, SQL Server Management tools, where you can get a list of say 1,000 records in a grid control and select rows for deletion or insert or update. I will look

clarification of token() in CQL3

2013-08-06 Thread Keith Freeman
I've seen in several places the advice to use queries like this to page through lots of rows: select id from mytable where token(id) > token(last_id) But it's hard to find detailed information about how this works (at least that I can understand -- the description in the Cassandra manual is

Re: clarification of token() in CQL3

2013-08-06 Thread Richard Low
On 6 August 2013 15:12, Keith Freeman 8fo...@gmail.com wrote: I've seen in several places the advice to use queries like this to page through lots of rows: select id from mytable where token(id) > token(last_id) But it's hard to find detailed information about how this works (at least

RE: Counters and replication

2013-08-06 Thread Christopher Wirt
Hi Richard, Thanks for your reply. The uid value is a generated guid and should distribute nicely. I've just checked the data yesterday there are only 3 uids out of millions for which there would have been more than 1000 increments. We started with 256 num_tokens. Client and server side I

Re: clarification of token() in CQL3

2013-08-06 Thread Keith Freeman
Ok, I get that, I'll have to find another way to sort out new rows. Your description makes me think that if new rows are added during the paging (i.e. between one select with token()'s and another), they might show up in the query results, right? (because the hash of the new row keys might

Re: clarification of token() in CQL3

2013-08-06 Thread Richard Low
On 6 August 2013 16:56, Keith Freeman 8fo...@gmail.com wrote: Your description makes me think that if new rows are added during the paging (i.e. between one select with token()'s and another), they might show up in the query results, right? (because the hash of the new row keys might fall
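
As a rough illustration of the paging pattern discussed in this thread (`token(id) > token(last_id)`), the sketch below emulates it over an in-memory set of keys. The `token` function uses MD5 purely as a stand-in for whatever hash the cluster's partitioner applies, and `page`, `rows`, and the page size are hypothetical names and values chosen for the example.

```python
import hashlib

def token(key):
    # Stand-in for the partitioner's hash; MD5 is used for illustration only.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def page(rows, last_id=None, limit=2):
    """Emulate: select id from mytable where token(id) > token(last_id) limit n"""
    ordered = sorted(rows, key=token)  # rows come back in token order
    if last_id is not None:
        ordered = [r for r in ordered if token(r) > token(last_id)]
    return ordered[:limit]

rows = {"a", "b", "c", "d", "e"}
seen = []
last = None
while True:
    batch = page(rows, last)
    if not batch:
        break
    seen.extend(batch)
    last = batch[-1]  # resume after the last key of the previous page
```

Each key is visited exactly once, in token order rather than key order. A key added while the loop is running would be returned only if its token happens to be greater than the token of the last key already seen.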

CQL3 select between is broken?

2013-08-06 Thread Keith Freeman
I've been looking at examples about modeling series data in Cassandra, and in one experiment created a table like this: create table vvv (k text, t bigint, value text, primary key (k, t)); After inserting some data with identical k values and differing t values, I tried this query (which is

Re: CQL3 select between is broken?

2013-08-06 Thread David Ward
http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt try `and t > 111 and t < 222' or >= and <= if you want inclusive. On Tue, Aug 6, 2013 at 10:35 AM, Keith Freeman 8fo...@gmail.com wrote: I've been looking at examples about modeling series data in Cassandra, and in one experiment created a
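
The difference between strict bounds (`>` and `<`) and inclusive bounds (`>=` and `<=`) on the `t` column can be sketched over a sorted list, which is roughly how values of a clustering column are ordered within one partition. The helper names and the sample `t_values` below are hypothetical; this is an in-memory analogy, not a query against Cassandra.

```python
from bisect import bisect_left, bisect_right

def slice_exclusive(ts, low, high):
    """Emulate: t > low and t < high (both bounds excluded)."""
    return ts[bisect_right(ts, low):bisect_left(ts, high)]

def slice_inclusive(ts, low, high):
    """Emulate: t >= low and t <= high (both bounds included)."""
    return ts[bisect_left(ts, low):bisect_right(ts, high)]

# Sorted t values for a single k, as a clustering column would order them.
t_values = [100, 111, 150, 222, 300]

exclusive = slice_exclusive(t_values, 111, 222)
inclusive = slice_inclusive(t_values, 111, 222)
```

With these sample values the exclusive slice contains only 150, while the inclusive slice also contains the two boundary values 111 and 222.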

Re: Unable to bootstrap node

2013-08-06 Thread Keith Wright
The file does not appear on disk and the permissions are definitely correct. We have seen the file in snapshots. This is completely blocking us from adding the new node. How can we recover? Just run repairs? Thanks From: Aaron Morton aa...@thelastpickle.com

Re: Unable to bootstrap node

2013-08-06 Thread sankalp kohli
@Aaron This problem happens when you drop and recreate a keyspace with the same name and you do it very quickly. I have also filed a JIRA for it: https://issues.apache.org/jira/browse/CASSANDRA-5843 On Tue, Aug 6, 2013 at 10:31 AM, Keith Wright kwri...@nanigans.com wrote: The file does not

Large number of pending gossip stage tasks in nodetool tpstats

2013-08-06 Thread Faraaz Sareshwala
I'm running cassandra-1.2.8 in a cluster with 45 nodes across three racks. All nodes are well behaved except one. Whenever I start this node, it starts churning CPU. Running nodetool tpstats, I notice that the number of pending gossip stage tasks is constantly increasing [1]. When looking at

Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread S Ahmed
From what I understood, tons of people are running things on EC2, but it could be that the instance size is so large that it compares to a dedicated server (especially if you go with SSD, it is like 1K/month!) On Tue, Aug 6, 2013 at 3:54 AM, Aaron Morton aa...@thelastpickle.com wrote: how many

Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread David Ward
3 node EC2 m1.xlarge is ~$1000/month + any incidental costs (S3 backups, transfer out of the AZ, etc.) or ~$300/month after a ~$1400 upfront 1 year reservation fee. There are some uncomfortable spots when compaction kicks on concurrently for several large CFs but otherwise it's been

Re: Which of these VPS configurations would perform better for Cassandra ?

2013-08-06 Thread Ertio Lew
Amazon seems to overprice its services considerably. If you look for a similar size deployment elsewhere, like Linode or DigitalOcean (very competitive pricing), you'll notice huge differences. OK, some service features are extra, but maybe we don't all need them necessarily when you can host on