Re: About Composite range queries

2012-05-31 Thread Cyril Auburtin
Thanks for the answer. One more thing: a composite key is not hashed only once, I guess? Is it hashed once per component? So does this mean there are two, three or more times as many keys as for normal column keys? On 31 May 2012 02:59, aaron morton aa...@thelastpickle.com wrote:

cassandra-hadoop mapper

2012-05-31 Thread murat migdisoglu
Hi, I'm working on some use cases to understand how the cassandra-hadoop integration works. I have a very basic scenario: I have a column family that keeps the session id and some BSON data containing the username in two separate columns. I want to go through all rows and dump each row to a file

Re: cassandra-hadoop mapper

2012-05-31 Thread Filippo Diotalevi
Hi, yes, the work can be split between different mappers, but each one will process one row at a time. In fact, the method public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context) processes one row, with the specified ByteBuffer key and the list of columns

Re: Retrieving old data version for a given row

2012-05-31 Thread aaron morton
-Is there any other way to extract the content of an SSTable, for example by writing a Java program instead of using sstable2json? Look at the code in sstable2json and copy it :) -I tried to get tombstones using the Thrift API, but it seems not to be possible, is that right? When I try, the program throws

Re: Renaming a keyspace in 1.1

2012-05-31 Thread aaron morton
Not directly.
* stop the cluster
* rename the /var/lib/cassandra/data/mykeyspace directory
* start the cluster
* create the keyspace with the new name
* drop the keyspace with the old name
Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

Re: tokens and RF for multiple phases of deployment

2012-05-31 Thread aaron morton
Could you provide some guidance on how to assign the tokens in these growing deployment phases? Background: http://www.datastax.com/docs/1.0/install/cluster_init#calculating-tokens-for-a-multi-data-center-cluster Start with tokens for a 4 node cluster. Add the next 4 between each of
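
A minimal sketch of the token arithmetic, assuming RandomPartitioner (tokens in the range 0 to 2**127) and the evenly spaced formula i * 2**127 / N from the linked DataStax page. The helper names initial_tokens and bisect_tokens are hypothetical. Placing each new node halfway between two existing ones gives the same layout as computing tokens for 8 nodes from scratch, so the ring stays balanced after each phase.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens lie in [0, 2**127)


def initial_tokens(node_count):
    """Evenly spaced tokens: i * 2**127 / N (integer division)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def bisect_tokens(tokens):
    """One new token halfway between each existing token and the next (wrapping)."""
    ordered = sorted(tokens)
    new_tokens = []
    for i, token in enumerate(ordered):
        nxt = ordered[(i + 1) % len(ordered)]
        if nxt <= token:  # wrap past the end of the ring
            nxt += RING_SIZE
        new_tokens.append(((token + nxt) // 2) % RING_SIZE)
    return new_tokens


phase1 = initial_tokens(4)      # first 4 nodes
phase2 = bisect_tokens(phase1)  # next 4 nodes, one between each existing pair
print(sorted(phase1 + phase2) == initial_tokens(8))  # True
```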

Re: commitlog_sync_batch_window_in_ms change in 0.7

2012-05-31 Thread aaron morton
Agree. Just happy to see people upgrade to something 1.x. -A On 31/05/2012, at 8:24 AM, Rob Coli wrote: On Tue, May 29, 2012 at 10:29 PM, Pierre Chalamet pie...@chalamet.net wrote: You'd better use

Re: java.net.SocketTimeoutException while Trying to Drop a Collection

2012-05-31 Thread aaron morton
There are two types of timeouts. The Thrift TimedOutException occurs when the coordinator times out waiting for the nodes required by the consistency level (CL) to respond; that error is transmitted back to the client and raised there. What you are seeing is a client-side socket timeout while waiting for the coordinator to respond. See the
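
To illustrate the client-side case, here is a stdlib-only Python sketch (not Hector or Thrift code): a local listener stands in for a coordinator that never answers, and the client's own socket timeout fires no matter what the server is doing. The function name and the 0.2 s timeout are arbitrary choices for the demo.

```python
import socket


def demo_client_socket_timeout(timeout_s=0.2):
    """Show that a socket timeout is raised by the client, not sent by the server."""
    # Stand-in "coordinator": accepts TCP connections (via the backlog) but never replies.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname(), timeout=timeout_s)
    try:
        client.sendall(b"request")
        client.recv(1024)  # blocks until the client-side timeout expires
        return "response"
    except socket.timeout:
        return "client-side socket timeout"
    finally:
        client.close()
        server.close()


print(demo_client_socket_timeout())
```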

Re: will compaction delete empty rows after all columns expired?

2012-05-31 Thread aaron morton
You can set gc_grace_seconds to a small value and force a major compaction after the row has expired. After that, please check whether the row still exists. There are some downsides to major compactions (there have been some recent discussions). You can provoke (some) minor compactions by:
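
The timing behind this advice can be sketched as follows, assuming the usual rules: an expired (TTL) column is treated like a tombstone from its expiry time, and compaction may only drop it once gc_grace_seconds have passed after that. The functions are hypothetical, all times are in seconds, and the exact boundary may differ by a second from the real implementation.

```python
DEFAULT_GC_GRACE_S = 864000  # default gc_grace_seconds: 10 days


def purgeable_at(write_time_s, ttl_s, gc_grace_s):
    """Earliest time compaction may drop an expired column entirely (approximate)."""
    expires_at = write_time_s + ttl_s  # column stops being returned to reads
    return expires_at + gc_grace_s     # the resulting tombstone must also outlive gc_grace


def can_purge(now_s, write_time_s, ttl_s, gc_grace_s):
    return now_s >= purgeable_at(write_time_s, ttl_s, gc_grace_s)


# A 1-hour TTL column written at t=0 is still kept 2 days later with the default grace...
print(can_purge(2 * 86400, 0, 3600, DEFAULT_GC_GRACE_S))  # False
# ...but with gc_grace_seconds lowered to 60 it can be purged shortly after expiry.
print(can_purge(3700, 0, 3600, 60))  # True
```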

Re: About Composite range queries

2012-05-31 Thread aaron morton
It is hashed once. To the partitioner it's just some bytes; other parts of the code care about its structure.
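
A sketch of why there is one hash per key, assuming RandomPartitioner (token = absolute value of the MD5 digest read as a signed big-endian integer) and the CompositeType byte layout (per component: 2-byte big-endian length, the bytes, one end-of-component byte). The whole serialized key goes through MD5 once; the components are never hashed separately. The helper names are hypothetical.

```python
import hashlib
import struct


def composite_bytes(*components):
    """Serialize string components in a CompositeType-style layout."""
    out = b""
    for part in components:
        raw = part.encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw + b"\x00"  # length, value, end-of-component
    return out


def random_partitioner_token(key_bytes):
    """One MD5 over the whole key, read as a signed integer, absolute value."""
    digest = hashlib.md5(key_bytes).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


keys = [("A", "B", "C"), ("A", "D", "C"), ("A", "E", "X"), ("A", "R", "X")]
tokens = {k: random_partitioner_token(composite_bytes(*k)) for k in keys}
print(len(set(tokens.values())))  # 4: one token per composite key, not one per component
```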

How can we use composite indexes and secondary indexes together

2012-05-31 Thread Nury Redjepow
We want to use Cassandra to store complex data, but we can't figure out how to organize the indexes. Our table (column family) looks like this: Users = { RandomId int, Firstname varchar, Lastname varchar, Age int, Country int, ChildCount int } In our queries we have mandatory fields

Re: About Composite range queries

2012-05-31 Thread Cyril Auburtin
Sorry, but I don't understand. If you hash 4 composite keys, let's say ('A','B','C'), ('A','D','C'), ('A','E','X'), ('A','R','X'), do you get only 4 hashes or more? If it's 4, how come you are able to range query, for example between start_column=('A', 'D') and end_column=('A','E'), and get

RE: nodetool move 0 gets stuck in moving state forever

2012-05-31 Thread Poziombka, Wade L
Let me elaborate a bit. Two-node cluster: node1 has token 0, node2 has token 85070591730234615865843651857942052864. node1 goes down permanently. I do a nodetool move 0 on node2 and monitor with nodetool ring... it seems to stay in the Moving state forever. From: Poziombka, Wade L Sent: Tuesday, May 29, 2012 4:29

Invalid Counter Shard errors?

2012-05-31 Thread Charles Brophy
Hi guys, We're running a three-node cluster of Cassandra 1.1 servers (originally 1.0.7), and immediately after the upgrade the error logs of all three servers began filling up with the following message: ERROR [ReplicateOnWriteStage:177] 2012-05-31 08:17:02,236 CounterContext.java (line 381)

Re: java.net.SocketTimeoutException while Trying to Drop a Collection

2012-05-31 Thread Christof Bornhoevd
Thanks a lot, Aaron, for the very fast response! I have increased the CassandraThriftSocketTimeout from 5000 to 9000. Is this a reasonable setting? configurator.setCassandraThriftSocketTimeout(9000); Cheers, Christof

Re: tokens and RF for multiple phases of deployment

2012-05-31 Thread Chong Zhang
Thanks Aaron. I might use LOCAL_QUORUM to avoid waiting on the ack from DC2. Another question: I set up a new node with token +1 in a new DC and updated a CF to RF {DC1:2, DC2:1}. When I update a column on one node in DC1, it's also updated on the new node in DC2. But all the other
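
A sketch of the "token +1" offset scheme, assuming RandomPartitioner: each data center gets its own evenly spaced tokens, shifted by a small per-DC offset (0 for DC1, 1 for DC2), because no two nodes in the cluster may share a token. dc_tokens is a hypothetical helper; the offset only keeps tokens unique and says nothing about how existing data reaches the new node.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token range


def dc_tokens(node_count, offset):
    """Evenly spaced tokens for one data center, shifted by a per-DC offset."""
    return [(i * RING_SIZE // node_count + offset) % RING_SIZE for i in range(node_count)]


dc1 = dc_tokens(2, 0)  # two nodes in DC1
dc2 = dc_tokens(1, 1)  # one node in DC2 at "token +1"
print(dc1, dc2)
print(len(set(dc1 + dc2)) == len(dc1) + len(dc2))  # True: all tokens unique
```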

newbie question :got error 'org.apache.thrift.transport.TTransportException'

2012-05-31 Thread Chen, Simon
Hi, I am new to Cassandra. I started a Cassandra instance (Cassandra.bat), played with it for a while, and created a keyspace Zodiac. When I killed the Cassandra instance and restarted it, the keyspace was gone, but when I tried to recreate it, I got 'org.apache.thrift.transport.TTransportException'

Re: cassandra read latency help

2012-05-31 Thread Gurpreet Singh
Aaron, Thanks for your email. The test roughly resembles how the actual application will behave. It is going to be a simple key-value store with 500 million keys per node. The traffic will be read-heavy in steady state, and some keys will get a lot more traffic than others. The

Re: cassandra read latency help

2012-05-31 Thread crypto five
You may also consider disabling the key/row cache entirely. 1 million rows * 400 bytes = 400 MB of data, which can easily sit in the filesystem cache, and you will be able to serve your hot keys at thousands of qps without hitting disk at all. Enabling compression can make the situation even better. On Thu, May 31, 2012 at 12:01 PM,
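
The sizing arithmetic above, as a small sketch. It counts only raw row bytes and ignores on-disk overhead (SSTable index entries, column names, timestamps), and the 4 GiB of free RAM is a hypothetical figure for illustration.

```python
def hot_set_bytes(row_count, bytes_per_row):
    """Raw size of the frequently read rows (ignores on-disk overhead)."""
    return row_count * bytes_per_row


def fits_in_page_cache(row_count, bytes_per_row, free_ram_bytes):
    """True if the hot rows could all stay in the OS filesystem cache."""
    return hot_set_bytes(row_count, bytes_per_row) <= free_ram_bytes


size = hot_set_bytes(1_000_000, 400)
print(size / 1_000_000, "MB")  # 400.0 MB
print(fits_in_page_cache(1_000_000, 400, 4 * 1024 ** 3))  # True with 4 GiB free (hypothetical)
```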

Re: cassandra read latency help

2012-05-31 Thread crypto five
But I think it's a bad idea, since the hot data will be spread evenly across multiple SSTables and filesystem pages.

RE: 1.1 not removing commit log files?

2012-05-31 Thread Bryce Godfrey
So this happened to me again, but only when the cluster had a node down for a while. The commit logs then started piling up past the limit I set in the config file and filled the drive. After the node recovered and the hints had replayed, the space was never reclaimed. A flush or drain did

Re: How can we use composite indexes and secondary indexes together

2012-05-31 Thread aaron morton
If you want to do arbitrarily complex online/realtime queries, look at DataStax Enterprise, https://github.com/tjake/Solandra, or straight Solr. Alternatively, denormalise the model to materialise the results when you insert, so your query is a straight lookup. Or do some client-side filtering
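
A minimal in-memory sketch of the denormalisation idea, using plain Python dicts rather than a Cassandra client. It assumes, hypothetically, that Country and Age are the mandatory query fields; each insert also writes the user id into an index row keyed by those fields, so a query is a single lookup followed by optional client-side filtering.

```python
from collections import defaultdict

users = {}                               # stands in for the Users column family
users_by_country_age = defaultdict(set)  # stands in for a hypothetical denormalised index CF


def insert_user(user_id, first, last, age, country, child_count):
    users[user_id] = {"Firstname": first, "Lastname": last, "Age": age,
                      "Country": country, "ChildCount": child_count}
    # Materialise the query result at write time: index row key = (Country, Age)
    users_by_country_age[(country, age)].add(user_id)


def query(country, age, min_children=None):
    ids = users_by_country_age.get((country, age), set())  # straight lookup
    rows = [users[i] for i in sorted(ids)]
    if min_children is not None:  # optional client-side filtering
        rows = [r for r in rows if r["ChildCount"] >= min_children]
    return rows


insert_user(1, "Ann", "Lee", 30, 1, 2)
insert_user(2, "Bob", "Kim", 30, 1, 0)
insert_user(3, "Cat", "Roe", 41, 2, 1)
print([u["Firstname"] for u in query(1, 30)])                  # ['Ann', 'Bob']
print([u["Firstname"] for u in query(1, 30, min_children=1)])  # ['Ann']
```

The cost of this design is extra writes and storage per query pattern, which is usually an acceptable trade in Cassandra for reads that avoid scans.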

Re: Cassandra Data Archiving

2012-05-31 Thread aaron morton
I'm not sure of your needs, but the simplest thing to consider is snapshotting and copying the data off the node. On 1/06/2012, at 12:23 AM, Shubham Srivastava wrote: I need to archive my Cassandra data

Re: About Composite range queries

2012-05-31 Thread aaron morton
If you hash 4 composite keys, let's say ('A','B','C'), ('A','D','C'), ('A','E','X'), ('A','R','X'), you have only 4 hashes or you have more? Four. If it's 4, how come you are able to range query for example between start_column=('A', 'D') and end_column=('A','E') and get this column
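
A sketch of the distinction the answer turns on: the partitioner hashes the row key, but column names inside a row are stored sorted by the comparator (CompositeType compares component by component), so a slice is a walk over that ordering and no column name is hashed. The sketch models names as Python tuples and treats the slice end as inclusive of every column that starts with the end prefix; that inclusivity is an assumption, since real CompositeType slices control it with end-of-component flags.

```python
from bisect import bisect_left


def composite_slice(sorted_names, start, end):
    """Return names n with start <= n and n[:len(end)] <= end, in comparator order."""
    first = bisect_left(sorted_names, start)  # tuples sort component by component
    result = []
    for name in sorted_names[first:]:
        if name[:len(end)] > end:
            break  # every later name is past the end prefix too
        result.append(name)
    return result


columns = sorted([("A", "B", "C"), ("A", "D", "C"), ("A", "E", "X"), ("A", "R", "X")])
print(composite_slice(columns, ("A", "D"), ("A", "E")))
# [('A', 'D', 'C'), ('A', 'E', 'X')]
```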

Re: nodetool move 0 gets stuck in moving state forever

2012-05-31 Thread aaron morton
Look in the logs for errors or warnings, and let us know what version you are using. I'm guessing that node2 still thought node1 was in the cluster when you did the move, which should(?) have errored.

Re: Invalid Counter Shard errors?

2012-05-31 Thread aaron morton
I suggest creating a ticket on https://issues.apache.org/jira/browse/CASSANDRA with the details. If it is an immediate concern, see if you can find someone in the #cassandra chat room http://cassandra.apache.org/

Re: java.net.SocketTimeoutException while Trying to Drop a Collection

2012-05-31 Thread aaron morton
The default value for rpc_timeout_in_ms is 10000 (10 seconds). You want the client socket timeout to be higher than rpc_timeout, otherwise the client will give up before the server.
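
A trivial check of that rule, comparing the 9000 ms client setting from earlier in this thread against the 10-second default. The function is hypothetical, and how large a margin to leave above rpc_timeout is a judgment call.

```python
DEFAULT_RPC_TIMEOUT_MS = 10_000  # rpc_timeout_in_ms default in cassandra.yaml (1.0/1.1)


def who_gives_up_first(client_socket_timeout_ms, rpc_timeout_ms=DEFAULT_RPC_TIMEOUT_MS):
    """Which side's timeout fires first when the coordinator is slow."""
    if client_socket_timeout_ms <= rpc_timeout_ms:
        # A tie is a race, so treat it as unsafe too: the client raises a socket
        # timeout and never sees the server's TimedOutException.
        return "client"
    # The coordinator reports TimedOutException to a client that is still listening.
    return "server"


print(who_gives_up_first(9000))   # client
print(who_gives_up_first(12000))  # server
```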

Re: tokens and RF for multiple phases of deployment

2012-05-31 Thread aaron morton
The ring (2 in DC1, 1 in DC2) looks OK, but the load on the new node in DC2 is almost 0%. Yeah, that's the way it will look. But all the other rows are not in the new node. Do I need to copy the data files from a node in DC1 to the new node? How did you add the node? (see

Re: newbie question :got error 'org.apache.thrift.transport.TTransportException'

2012-05-31 Thread aaron morton
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-4219?attachmentOrder=desc Drop back to 1.0.10 and have a play. Good luck.

Re: 1.1 not removing commit log files?

2012-05-31 Thread aaron morton
Could be this: https://issues.apache.org/jira/browse/CASSANDRA-4201 But that talks about segments not being cleared at startup; it does not explain why they were allowed to get past the limit in the first place. Can you share some logs from the time the commit log got out of control? Cheers

RE: Cassandra Data Archiving

2012-05-31 Thread Harshvardhan Ojha
Problem statement: We are keeping daily generated data (user-generated content) in Cassandra, but our application only uses the last 15 days of data. So how can we archive data older than 15 days to reduce the load on the Cassandra ring? Note: we can't apply TTL, as this data may be needed in
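
One way to find what to archive, sketched under a hypothetical schema in which each row key carries its day (e.g. content:2012-05-31): compute the cutoff date and split the keys into those the application still reads and those that could be exported and then deleted. The key format and the helper are assumptions, not something from this thread, and the export and delete steps are not shown.

```python
from datetime import date, timedelta


def split_by_age(row_keys, today, keep_days=15):
    """Split day-bucketed row keys into (hot, archive) around a cutoff date."""
    cutoff = today - timedelta(days=keep_days)
    hot, archive = [], []
    for key in row_keys:
        day = date.fromisoformat(key.split(":", 1)[1])  # hypothetical "<prefix>:<YYYY-MM-DD>" key
        (archive if day < cutoff else hot).append(key)
    return hot, archive


keys = ["content:2012-05-31", "content:2012-05-16", "content:2012-05-15", "content:2012-04-01"]
hot, archive = split_by_age(keys, today=date(2012, 5, 31))
print(hot)      # ['content:2012-05-31', 'content:2012-05-16']
print(archive)  # ['content:2012-05-15', 'content:2012-04-01']
```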

Re: Cassandra Data Archiving

2012-05-31 Thread Zhu Han
On Fri, Jun 1, 2012 at 12:28 PM, Harshvardhan Ojha harshvardhan.o...@makemytrip.com wrote: Problem statement: We are keeping daily generated data(user generated content) in Cassandra, but our application is using only 15 days old data. So how can we archive data older than 15 days so

Re: Cassandra Data Archiving

2012-05-31 Thread samal
I believe you are talking about HDD space consumed by user-generated data that is no longer required after 15 days, or may still be required. The first option is TTL, which you don't want to use. The second, as Aaron pointed out, is snapshotting the data, but the data still exists in the cluster; the snapshot is only used for backup. I think of like