Re: About Composite range queries

2012-05-31 Thread Cyril Auburtin
Thx for the answer
1 more thing, a Composite key is not hashed only once I guess?
Is it hashed once per part of the composite?
So this means there are twice or three times (or more) as many keys as for normal column
keys, is it true?
On 31 May 2012 at 02:59, aaron morton aa...@thelastpickle.com wrote:

 Composite Columns compare each part in turn, so the values are ordered as
 you've shown them.

 However the rows are not ordered according to key value. They are ordered
 using the random token generated by the partitioner see
 http://wiki.apache.org/cassandra/FAQ#range_rp

 What is the real advantage compared to super column families?

 They are faster.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 29/05/2012, at 10:08 PM, Cyril Auburtin wrote:

 How is it done in Cassandra to be able to range query on a composite key?

 key1 = (A:A:C), (A:B:C), (A:C:C), (A:D:C), (B,A,C)

 like get_range (key1, start_column=(A,), end_column=(A, C)); will
 return [ (A:B:C), (A:C:C) ] (in pycassa)

 I mean does the composite implementation add much overhead to make it work?
 Does it need to add other Column families, to be able to range query
 between composites simple keys (first, second and third part of the
 composite)?

 What is the real advantage compared to super column families?

 key1 = A: (A,C), (B,C), (C,C), (D,C)  , B: (A,C)

 thx





cassandra-hadoop mapper

2012-05-31 Thread murat migdisoglu
Hi,

I'm working on some use cases to understand how cassandra-hadoop
integration works.

I have a very basic scenario: I have a column family that keeps the session
id and some bson data that contains the username in two separate columns. I
want to go through all rows and dump the row to a file when the username
matches certain criteria. And I don't need any Reducer or Combiner
for now.

After I've written the following very simple Hadoop job, I see from the
logs that my mapper function is called once per row. Is that normal? If
that is the case, doing such a search operation on a big dataset would take
hours if not days... Besides that, I see many small output files being
created on HDFS.

I guess I need a better understanding of how splitting the job into tasks
works exactly.


@Override
public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
        throws IOException, InterruptedException
{
    String rowkey = ByteBufferUtil.string(key);
    String ip = context.getConfiguration().get("IP");
    IColumn column = columns.get(sourceColumn);
    if (column == null)
        return;
    ByteBuffer byteBuffer = column.value();
    ByteBuffer bb2 = byteBuffer.duplicate();

    DataConvertor convertor = fromBson(byteBuffer, DataConvertor.class);
    String username = convertor.getUsername();
    BytesWritable value = new BytesWritable();
    if (username != null && username.equals(cip)) {
        byte[] arr = convertToByteArray(bb2);
        value.set(new BytesWritable(arr));
        Text tkey = new Text(rowkey);
        context.write(tkey, value);
    } else {
        log.info("ip not match [" + ip + "]");
    }
}

Thanks in advance
Kind Regards


-- 
Find a job you enjoy, and you'll never work a day in your life.
Confucius


Re: cassandra-hadoop mapper

2012-05-31 Thread Filippo Diotalevi
Hi,
yes, the work can be split between different mappers, but each one will process
one row at a time. In fact, the method

  public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
 Context context)

processes 1 row, with the specified ByteBuffer key and the list of columns
SortedMap<ByteBuffer, IColumn> columns.


That doesn't mean you will make millions of requests to Cassandra to retrieve
one row at a time though. Requests are batched, and the parameter
cassandra.range.batch.size
determines "the number of rows to request with each get range slices request"
(as per the javadoc).

Performance-wise, that shouldn't be a problem… the operation you are doing is
very simple, and Cassandra will be fast to retrieve such short rows.
In any case, your business logic works well in parallel, so you can split the 
job between many concurrent mappers and distribute the work among them.
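
For example, a minimal sketch of raising that parameter when building the job. It only uses the standard Hadoop Configuration/Job API plus the property name quoted above; the value 4096 and the job name are purely illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobSetupSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "cassandra.range.batch.size" is the parameter described above;
        // a larger value means fewer, bigger get_range_slices requests.
        conf.set("cassandra.range.batch.size", "4096");
        Job job = new Job(conf, "session-dump");
        // ... set the input format, mapper class and output path as in the original job
    }
}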

--  
Filippo



On Thursday, 31 May 2012 at 09:59, murat migdisoglu wrote:

  
 Hi,  
  
 I'm working on some use cases to understand how cassandra-hadoop integration 
 works.  
  
 I have a very basic scenario: I have a column family that keeps the session 
 id and some bson data that contains the username in two separate columns. I 
 want to go through all rows and dump the row to a file when the username
 matches certain criteria. And I don't need any Reducer or Combiner for
 now.
  
 After I've written the following very simple Hadoop job, I see from the logs
 that my mapper function is called once per row. Is that normal? If that is
 the case, doing such a search operation on a big dataset would take hours if
 not days... Besides that, I see many small output files being created on HDFS.
 
 
 I guess I need a better understanding of how splitting the job into tasks
 works exactly.
  
  
 @Override
 public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
         throws IOException, InterruptedException
 {
     String rowkey = ByteBufferUtil.string(key);
     String ip = context.getConfiguration().get("IP");
     IColumn column = columns.get(sourceColumn);
     if (column == null)
         return;
     ByteBuffer byteBuffer = column.value();
     ByteBuffer bb2 = byteBuffer.duplicate();
 
     DataConvertor convertor = fromBson(byteBuffer, DataConvertor.class);
     String username = convertor.getUsername();
     BytesWritable value = new BytesWritable();
     if (username != null && username.equals(cip)) {
         byte[] arr = convertToByteArray(bb2);
         value.set(new BytesWritable(arr));
         Text tkey = new Text(rowkey);
         context.write(tkey, value);
     } else {
         log.info("ip not match [" + ip + "]");
     }
 }
  
 Thanks in advance
 Kind Regards
  
  
 --  
 Find a job you enjoy, and you'll never work a day in your life.
 Confucius  
  



Re: Retrieving old data version for a given row

2012-05-31 Thread aaron morton
 -Is there any other way to extract the content of an SSTable, writing a
 java program for example instead of using sstable2json?
Look at the code in sstable2json and copy it :)
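
One way to do that from Java is simply to drive the class behind bin/sstable2json yourself; a rough sketch (it assumes the Cassandra jars and a readable cassandra.yaml are available to the JVM, and the sstable path is a placeholder):

import org.apache.cassandra.tools.SSTableExport;

public class DumpSSTable {
    public static void main(String[] args) throws Exception {
        // SSTableExport is the class bin/sstable2json wraps; like the command
        // line tool it takes the path to a -Data.db file as its first argument.
        SSTableExport.main(new String[] { "/var/lib/cassandra/data/MyKeyspace/MyCF-hc-1-Data.db" });
    }
}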

 -I tried to get tombstones using the thrift API, but it seems to be not
 possible, is that right? When I try, the program throws an exception.
No. 
Tombstones are not returned from the API (see ColumnFamilyStore.getColumnFamily()).
You can see them if you use sstable2json.

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/05/2012, at 9:53 PM, Felipe Schmidt wrote:

 I have further questions:
 -Is there any other way to extract the content of an SSTable, writing a
 java program for example instead of using sstable2json?
 -I tried to get tombstones using the thrift API, but it seems to be not
 possible, is that right? When I try, the program throws an exception.
 
 thanks in advance
 
 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)
 
 
 
 
 2012/5/24 aaron morton aa...@thelastpickle.com:
 Ok... it's really strange to me that Cassandra doesn't support data
 versioning, because all other key-value databases support it (at least
 those I know of).
 
 You can design it into your data model if you need it.
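
For example, a rough Hector-based sketch of the column-level approach Dave Brosius suggests below, with the version id folded into the column name; the cluster, keyspace, column family and values are all made up for illustration:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class VersionedColumnSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

        // Keep every version by appending a version id to the column name,
        // e.g. "price:00000001", "price:00000002", ...  Zero padding keeps the
        // versions in comparator order, so the latest is easy to slice out.
        long versionId = 2;
        String columnName = String.format("price:%08d", versionId);

        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.addInsertion("item-42", "Items",
                HFactory.createStringColumn(columnName, "19.99"));
        mutator.execute();
    }
}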
 
 
 I have one remaining question:
 -in the case that I have more than 1 SSTable in the disk for the same
 column but with different data versions, is it possible to make a
 
 query to get the old version instead of the newest one?
 
 No.
 There is only ever 1 value for a column.
 The older copies of the column in the SSTables are artefacts of immutable
 on disk structures.
 If you want to see what's inside an SSTable use bin/sstable2json
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 24/05/2012, at 9:42 PM, Felipe Schmidt wrote:
 
 Ok... it's really strange to me that Cassandra doesn't support data
 versioning, because all other key-value databases support it (at least
 those I know of).
 
 I have one remaining question:
 -in the case that I have more than 1 SSTable in the disk for the same
 column but with different data versions, is it possible to make a
 query to get the old version instead of the newest one?
 
 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)
 
 
 
 
 2012/5/16 Dave Brosius dbros...@mebigfatguy.com:
 
 You're in for a world of hurt going down that rabbit hole. If you truly
 
 want version data then you should think about changing your keying to
 
 perhaps be a composite key where key is of form
 
 
 NaturalKey/VersionId
 
 
 Or if you want the versioning at the column level, use composite columns
 
 with ColumnName/VersionId format
 
 
 
 
 
 On 05/16/2012 10:16 AM, Felipe Schmidt wrote:
 
 
 That was very helpful, thank you very much!
 
 
 I still have some questions:
 
 -is it possible to make Cassandra keep old value data after flushing?
 
 The same question for the memTable, before flushing. Seems to me that
 
 when I update some tuple, the old data will be overwritten in the memTable,
 
 even before flushing.
 
 -is it possible to scan values from the memtable, maybe using the
 
 so-called Thrift API? Using the client-api I can just see the newest
 
 data version, I can't see what's really happening with the memTable.
 
 
 I ask that because what I'll try to do is a Change Data Capture for
 
 Cassandra and the answers will define what kind of approaches I'm able
 
 to use.
 
 
 Thanks in advance.
 
 
 Regards,
 
 Felipe Mathias Schmidt
 
 (Computer Science UFRGS, RS, Brazil)
 
 
 
 2012/5/14 aaron morton aa...@thelastpickle.com:
 
 
 Cassandra does not provide access to multiple versions of the same
 
 column.
 
 It is essentially an implementation detail.
 
 
 All mutations are written to the commit log in a binary format, see the
 
 o.a.c.db.RowMutation.getSerializedBuffer() (If you want to tail it for
 
 analysis you may want to change commitlog_sync in cassandra.yaml)
 
 
 Here is post about looking at multiple versions columns in an
 
 sstable http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/
 
 
 Remember that not all versions of a column are written to disk
 
  (see http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/).
 
 Also
 
 compaction will compress multiple versions of the same column from
 
 multiple
 
 files into a single version in a single file .
 
 
 Hope that helps.
 
 
 
 -
 
 Aaron Morton
 
 Freelance Developer
 
 @aaronmorton
 
 http://www.thelastpickle.com
 
 
 On 14/05/2012, at 9:50 PM, Felipe Schmidt wrote:
 
 
 Yes, I need this information just for academic purposes.
 
 
 So, to read old data values, I tried to open the Commitlog using tail
 
 -f and also the log files viewer of Ubuntu, but I cannot see much
 
 information inside the log!
 
 Is there any other way to open this log? I didn't find any Cassandra
 
 API for this purpose.
 
 
 Thanks everybody in advance.
 
 
 Regards,
 
 Felipe Mathias 

Re: Renaming a keyspace in 1.1

2012-05-31 Thread aaron morton
Not directly. 

* stop the cluster
* rename the /var/lib/cassandra/data/mykeyspace directory
* start the cluster
* create the keyspace with new name
* drop the keyspace with the old name


Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/05/2012, at 11:13 PM, Oleg Dulin wrote:

 Is it possible ? How ?
 
 



Re: tokens and RF for multiple phases of deployment

2012-05-31 Thread aaron morton

 Could you provide some guide on how to assign the tokens in this growing 
 deployment phases? 

background 
http://www.datastax.com/docs/1.0/install/cluster_init#calculating-tokens-for-a-multi-data-center-cluster

Start with tokens for a 4 node cluster. Add the next 4 between each of 
the ranges. Add 8 in the new DC to have the same tokens as the first DC +1
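
For reference, the evenly spaced RandomPartitioner tokens are just i * 2**127 / N; a small plain-Java sketch that prints them for the 4 node case, plus the "+1" variant for the second DC:

import java.math.BigInteger;

public class TokenCalc {
    public static void main(String[] args) {
        int nodeCount = 4;                                   // the initial 4-node DC
        BigInteger range = BigInteger.valueOf(2).pow(127);   // RandomPartitioner token space
        for (int i = 0; i < nodeCount; i++) {
            // token_i = i * 2^127 / N
            BigInteger token = range.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(nodeCount));
            // the second DC uses the same token + 1, as described above
            System.out.println("node " + i + ": " + token
                    + "   (DC2: " + token.add(BigInteger.ONE) + ")");
        }
    }
}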

 Also if we use the same RF (3) in both DC, and use EACH_QUORUM for write and 
 LOCAL_QUORUM for read, can the read also reach to the 2nd cluster?
No. It will fail if there are not enough nodes available in the first DC. 

 We'd like to keep both write and read on the same cluster.
Writes go to all replicas. Using EACH_QUORUM means the client in the first DC 
will be waiting for the quorum from the second DC to ack the write. 


Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/05/2012, at 3:20 AM, Chong Zhang wrote:

 Hi all,
 
 We are planning to deploy a small cluster with 4 nodes in one DC first, and 
 will expand that to 8 nodes, then add another DC with 8 nodes for fail over 
 (not active-active), so all the traffic will go to the 1st cluster, and 
 switch to 2nd cluster if the whole 1st cluster is down or on maintenance. 
 
 Could you provide some guide on how to assign the tokens in this growing 
 deployment phases? I looked at some docs but not very clear on how to assign 
 tokens on the fail-over case.
 Also if we use the same RF (3) in both DC, and use EACH_QUORUM for write and 
 LOCAL_QUORUM for read, can the read also reach to the 2nd cluster? We'd like 
 to keep both write and read on the same cluster.
 
 Thanks in advance,
 Chong



Re: commitlog_sync_batch_window_in_ms change in 0.7

2012-05-31 Thread aaron morton
Agree. 
Just happy to see people upgrade to something 1.X
A

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/05/2012, at 8:24 AM, Rob Coli wrote:

 On Tue, May 29, 2012 at 10:29 PM, Pierre Chalamet pie...@chalamet.net wrote:
 You'd better use version 1.0.9 (using this one in production) or 1.0.10.
 
 1.1 is still a bit young to be ready for prod unfortunately.
 
 OP described himself as "experimenting", which I inferred to mean
 not-production. I agree with others, 1.0.x is what I'd currently
 recommend for production. :)
 
 =Rob
 
 -- 
 =Robert Coli
 AIM&GTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb



Re: java.net.SocketTimeoutException while Trying to Drop a Collection

2012-05-31 Thread aaron morton
There are two types of timeouts. The thrift TimedOutException occurs when the 
coordinator times out waiting for the CL level nodes to respond. The error is 
transmitted back to the client and raised.  

This is a client side socket timeout waiting for the coordinator to respond. 
See the CassandraHostConfigurator.setCassandraThriftSocketTimeout() setting. 
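
For example, a minimal Hector sketch of setting it (the 10 second value is only illustrative):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class ClientTimeoutSketch {
    public static void main(String[] args) {
        CassandraHostConfigurator configurator = new CassandraHostConfigurator("localhost:9160");
        // Client-side socket timeout (in ms) while waiting for the coordinator to reply.
        configurator.setCassandraThriftSocketTimeout(10000);
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", configurator);
    }
}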

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/05/2012, at 11:44 AM, Christof Bornhoevd wrote:

 Hello,
  
 We are using Cassandra 1.0.8 with Hector 1.0-5 on both Windows and Linux. In 
 our development/test environment we always recreate the schema in Cassandra 
 (first dropping all ColumnFamilies then recreating them) and then seeding the 
 test data. We simply use cluster.dropColumnFamily(keyspace.getKeyspaceName(), 
 collectionName); to drop ColumnFamilies. The client is using 
 ThriftFramedTransport (configurator.setUseThriftFramedTransport(true);).
  
 Every so often we run into the following exception (with different 
 ColumnFamilies):
  
 Caused by: me.prettyprint.hector.api.exceptions.HectorTransportException: 
 org.apache.thrift.transport.TTransportException: 
 java.net.SocketTimeoutException: Read timed out
 at 
 me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
 at 
 me.prettyprint.cassandra.service.AbstractCluster$7.execute(AbstractCluster.java:279)
 at 
 me.prettyprint.cassandra.service.AbstractCluster$7.execute(AbstractCluster.java:266)
 at 
 me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
 at 
 me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
 at 
 me.prettyprint.cassandra.service.AbstractCluster.dropColumnFamily(AbstractCluster.java:283)
 at 
 me.prettyprint.cassandra.service.AbstractCluster.dropColumnFamily(AbstractCluster.java:261)
 at 
 com.supervillains.plouton.cassandradatastore.CassandraDataStore.deleteCollection(CassandraDataStore.java:195)
 ... 57 more
  
 Is this problem related to 
 https://issues.apache.org/jira/browse/CASSANDRA-3551 (which should have been 
 fixed with Cassandra 1.0.6) or could there be anything we do wrong here?
  
 Thanks in advance for any kind help!
 Chris



Re: will compaction delete empty rows after all columns expired?

2012-05-31 Thread aaron morton
 You can set gc_grace_secs to a small value and force a major compaction 
 after the row is expired. After that, please check whether the row still 
 exists.
There are some downsides to major compactions. (There have been some recent 
discussions).

You can provoke (some) minor compactions by:
* setting the min_compaction_threshold to 2 (not sure if nodetool in 0.7 
supports this, you may need to make a schema change)
* using nodetool flush
 
If you have some larger sstables that do not get compacted try the 
userDefinedCompaction() method on the CompactionManager MBean via JMX (i may 
have gotten the names wrong there in 0.7). 
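
A rough sketch of invoking that MBean from plain JMX; the host, port, keyspace and file name are placeholders, and as noted above the operation name may differ slightly in 0.7 (check the CompactionManager MBean in jconsole first):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UserDefinedCompactionSketch {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = connector.getMBeanServerConnection();

        ObjectName compactionManager =
                new ObjectName("org.apache.cassandra.db:type=CompactionManager");
        // Operation name and signature assumed from the description above;
        // in some versions it is forceUserDefinedCompaction(keyspace, dataFiles).
        mbs.invoke(compactionManager, "forceUserDefinedCompaction",
                new Object[] { "MyKeyspace", "MyCF-f-123-Data.db" },
                new String[] { "java.lang.String", "java.lang.String" });
        connector.close();
    }
}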

 So if I understand... the empty row will only be removed after gc_grace if 
 enough compactions have occurred so that all the column tombstones for the 
 empty row are in a single SSTable file?
We need to know that all the fragments of the row are contained in the 
sstables in the compaction task. They don't have to be in the same SSTable. 

You need tombstones to stop columns written previously from appearing in the 
results. If we purge the tombstone and a previous column value is in another 
sstable the delete will be undone. 

If you cannot compact the tombstones away let us know. 

Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/05/2012, at 2:16 PM, Zhu Han wrote:

 On Thu, May 31, 2012 at 9:31 AM, Curt Allred c...@mediosystems.com wrote:
 No, these were not wide rows.  They are rows that formerly had one or 2 
 columns. The columns are deleted but the empty rows don't go away, even after 
 gc_grace_secs.
 
 
 The empty row goes away only during a compaction after the gc_grace_secs.
 
 You can set gc_grace_secs to a small value and force a major compaction 
 after the row is expired. After that, please check whether the row still 
 exists.
  
 
  
 
 So if I understand... the empty row will only be removed after gc_grace if 
 enough compactions have occurred so that all the column tombstones for the 
 empty row are in a single SSTable file?
 
 
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 
 
  
 
 Minor compaction will remove the tombstones if the row only exists in the 
 sstable being compacted. 
 
  
 
 Are these very wide rows that are constantly written to ? 
 
  
 
 Cheers
 
  p.s. cassandra 1.0 really does rock. 
 
 



Re: About Composite range queries

2012-05-31 Thread aaron morton
It is hashed once. 

To the partitioner it's just some bytes. Other parts of the code care about its 
structure. 
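
As an illustration: with the RandomPartitioner the token is a single MD5 over the serialized key bytes, however many parts those bytes happen to encode. A plain-Java sketch (not Cassandra's own code):

import java.math.BigInteger;
import java.security.MessageDigest;

public class OneHashPerKey {
    public static void main(String[] args) throws Exception {
        // The partitioner treats the whole (possibly composite) key as opaque
        // bytes and produces one token for it.
        byte[] keyBytes = "A:B:C".getBytes("UTF-8");   // stand-in for a serialized composite key
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        BigInteger token = new BigInteger(md5.digest(keyBytes)).abs();
        System.out.println("one key -> one token: " + token);
    }
}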

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/05/2012, at 7:00 PM, Cyril Auburtin wrote:

 Thx for the answer
 1 more thing, a Composite key is not hashed only once I guess?
 Is it hashed once per part of the composite?
 So this means there are twice or three times (or more) as many keys as for normal column 
 keys, is it true?
 
 On 31 May 2012 at 02:59, aaron morton aa...@thelastpickle.com wrote:
 Composite Columns compare each part in turn, so the values are ordered as 
 you've shown them. 
 
 However the rows are not ordered according to key value. They are ordered 
 using the random token generated by the partitioner see 
 http://wiki.apache.org/cassandra/FAQ#range_rp
 
 What is the real advantage compared to super column families?
 They are faster. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 29/05/2012, at 10:08 PM, Cyril Auburtin wrote:
 
 How is it done in Cassandra to be able to range query on a composite key?
 
 key1 = (A:A:C), (A:B:C), (A:C:C), (A:D:C), (B,A,C)
 
 like get_range (key1, start_column=(A,), end_column=(A, C)); will return 
 [ (A:B:C), (A:C:C) ] (in pycassa)
 
 I mean does the composite implementation add much overhead to make it work?
 Does it need to add other Column families, to be able to range query between 
 composites simple keys (first, second and third part of the composite)?
 
 What is the real advantage compared to super column families?
 
 key1 = A: (A,C), (B,C), (C,C), (D,C)  , B: (A,C)
 
 thx
 



How can we use composite indexes and secondary indexes together

2012-05-31 Thread Nury Redjepow
We want to use cassandra to store complex data. But we can't figure out how to 
organize indexes.
Our table (column family) looks like this:
Users = { RandomId int, Firstname varchar, Lastname varchar, Age int, Country 
int, ChildCount int }
In our queries we have mandatory fields (Firstname, Lastname, Age) and extra 
search options (Country, ChildCount). How do we organize indexes to make this kind 
of query fast?
First I thought it would be natural to make a composite index on 
(Firstname, Lastname, Age) and add separate secondary indexes on the remaining fields 
(Country and ChildCount). But I can't insert rows into the table after creating 
secondary indexes. And also, I can't query the table.
I'm using cassandra 1.1.0, and cqlsh with --cql3 option.
Any other suggestions to solve our problem (complex queries with mandatory and 
additional options) are welcome. The main point is, how can we join data in 
cassandra? If I make a few index column families, I need to intersect the values 
to get rows that pass all search criteria? Or should I use something based on 
Hadoop (Pig, Hive) to make such queries?

Respectfully, Nury


Re: About Composite range queries

2012-05-31 Thread Cyril Auburtin
but sorry, I don't understand

If you hash 4 composite keys, let's say
('A','B','C'), ('A','D','C'), ('A','E','X'), ('A','R','X'), do you have only 4
hashes or more?

If it's 4, how come you are able to range query for example between
start_column=('A', 'D') and end_column=('A','E') and get this column
('A','D','C')

the composites are like chapters between the whole keys set, there must be
intermediate keys added?


2012/5/31 aaron morton aa...@thelastpickle.com

 it is hashed once.

 To the partitioner it's just some bytes. Other parts of the code care about
 its structure.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 31/05/2012, at 7:00 PM, Cyril Auburtin wrote:

 Thx for the answer
 1 more thing, a Composite key is not hashed only once I guess?
 Is it hashed once per part of the composite?
 So this means there are twice or three times (or more) as many keys as for normal
 column keys, is it true?
 On 31 May 2012 at 02:59, aaron morton aa...@thelastpickle.com wrote:

 Composite Columns compare each part in turn, so the values are ordered as
 you've shown them.

 However the rows are not ordered according to key value. They are ordered
 using the random token generated by the partitioner see
 http://wiki.apache.org/cassandra/FAQ#range_rp

 What is the real advantage compared to super column families?

 They are faster.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 29/05/2012, at 10:08 PM, Cyril Auburtin wrote:

 How is it done in Cassandra to be able to range query on a composite key?

 key1 = (A:A:C), (A:B:C), (A:C:C), (A:D:C), (B,A,C)

 like get_range (key1, start_column=(A,), end_column=(A, C)); will
 return [ (A:B:C), (A:C:C) ] (in pycassa)

 I mean does the composite implementation add much overhead to make it
 work?
 Does it need to add other Column families, to be able to range query
 between composites simple keys (first, second and third part of the
 composite)?

 What is the real advantage compared to super column families?

 key1 = A: (A,C), (B,C), (C,C), (D,C)  , B: (A,C)

 thx






RE: nodetool move 0 gets stuck in moving state forever

2012-05-31 Thread Poziombka, Wade L
Let me elaborate a bit.

two node cluster
node1 has token 0 
node2 has token 85070591730234615865843651857942052864

node1 goes down permanently.

do a nodetool move 0 on node2.

monitor with ring... is in Moving state forever it seems.



From: Poziombka, Wade L 
Sent: Tuesday, May 29, 2012 4:29 PM
To: user@cassandra.apache.org
Subject: nodetool move 0 gets stuck in moving state forever

If the node with token 0 dies and we just want it gone from the cluster, we 
would do a nodetool move 0. Then, monitoring with nodetool ring, it seems to be 
stuck on Moving forever.

Any ideas? 


Invalid Counter Shard errors?

2012-05-31 Thread Charles Brophy
Hi guys,

We're running a three node cluster of cassandra 1.1 servers, originally
1.0.7 and immediately after the upgrade the error logs of all three servers
began filling up with the following message:

ERROR [ReplicateOnWriteStage:177] 2012-05-31 08:17:02,236
CounterContext.java (line 381) invalid counter shard detected;
(3438afc0-7e71-11e1--da5a9d01e7f7, 3, 4) and
(3438afc0-7e71-11e1--da5a9d01e7f7, 3, 7) differ only in count; will
pick highest to self-heal; this indicates a bug or corruption generated a
bad counter shard

ERROR [ValidationExecutor:20] 2012-05-31 08:17:01,570 CounterContext.java
(line 381) invalid counter shard detected;
(343cf580-7e71-11e1--ebc411012bff, 14, 27) and
(343cf580-7e71-11e1--ebc411012bff, 14, 21) differ only in count; will
pick highest to self-heal; this indicates a bug or corruption generated a
bad counter shard

The counts change but the errors are constant. What is the best course of
action? Google only turns up the source code for these errors.

Thanks!
Charles


Re: java.net.SocketTimeoutException while Trying to Drop a Collection

2012-05-31 Thread Christof Bornhoevd
Thanks a lot Aaron for the very fast response!

I have increased the CassandraThriftSocketTimeout from 5000 to 9000. Is
this a reasonable setting?

configurator.setCassandraThriftSocketTimeout(9000);
Cheers,
Christof

2012/5/31 aaron morton aa...@thelastpickle.com

 There are two times of timeouts. The thrift TimedOutException occurs when
 the coordinator times out waiting for the CL level nodes to respond. The
 error is transmitted back to the client and raised.

 This is a client side socket timeout waiting for the coordinator to
 respond. See the
 CassandraHostConfigurator.setCassandraThriftSocketTimeout() setting.

 Cheers

-
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 31/05/2012, at 11:44 AM, Christof Bornhoevd wrote:

  Hello,


 We are using Cassandra 1.0.8 with Hector 1.0-5 on both Windows and Linux. In
 our development/test environment we always recreate the schema in Cassandra
 (first dropping all ColumnFamilies then recreating them) and then seeding
 the test data. We simply use 
 cluster.dropColumnFamily(keyspace.getKeyspaceName(),
 collectionName); to drop ColumnFamilies. The client is using
 ThriftFramedTransport (configurator.setUseThriftFramedTransport(true);).


 Every so often we run into the following exception (with different
 ColumnFamilies):


 Caused by: me.prettyprint.hector.api.exceptions.HectorTransportException: 
 org.apache.thrift.transport.TTransportException:
 java.net.SocketTimeoutException: Read timed out
 at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
 at me.prettyprint.cassandra.service.AbstractCluster$7.execute(AbstractCluster.java:279)
 at me.prettyprint.cassandra.service.AbstractCluster$7.execute(AbstractCluster.java:266)
 at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
 at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
 at me.prettyprint.cassandra.service.AbstractCluster.dropColumnFamily(AbstractCluster.java:283)
 at me.prettyprint.cassandra.service.AbstractCluster.dropColumnFamily(AbstractCluster.java:261)
 at com.supervillains.plouton.cassandradatastore.CassandraDataStore.deleteCollection(CassandraDataStore.java:195)
 ... 57 more


 Is this problem related to
 https://issues.apache.org/jira/browse/CASSANDRA-3551
 (which should have been fixed with Cassandra 1.0.6) or could there be anything
 we do wrong here?


 Thanks in advance for any kind help!
 Chris





Re: tokens and RF for multiple phases of deployment

2012-05-31 Thread Chong Zhang
Thanks Aaron.

I might use LOCAL_QUORUM to avoid the waiting on the ack from DC2.

Another question, after I setup a new node with token +1 in a new DC,  and
updated a CF with RF {DC1:2, DC2:1}. When i update a column on one node in
DC1, it's also updated in the new node in DC2. But all the other rows are
not in the new node. Do I need to copy the data files from a node in DC1 to
the new node?

The ring (2 in DC1, 1 in DC2) looks OK, but the load on the new node in DC2
is almost 0%.

Address       DC    Rack    Status  State    Load        Owns     Token
                                                                  85070591730234615865843651857942052864
10.10.10.1    DC1   RAC1    Up      Normal   313.99 MB   50.00%   0
10.10.10.3    DC2   RAC1    Up      Normal   7.07 MB     0.00%    1
10.10.10.2    DC1   RAC1    Up      Normal   288.91 MB   50.00%   85070591730234615865843651857942052864

Thanks,
Chong

On Thu, May 31, 2012 at 5:48 AM, aaron morton aa...@thelastpickle.comwrote:


 Could you provide some guide on how to assign the tokens in this growing
 deployment phases?


 background
 http://www.datastax.com/docs/1.0/install/cluster_init#calculating-tokens-for-a-multi-data-center-cluster

 Start with tokens for a 4 node cluster. Add the next 4 between
 each of the ranges. Add 8 in the new DC to have the same tokens as the
 first DC +1

 Also if we use the same RF (3) in both DC, and use EACH_QUORUM for write
 and LOCAL_QUORUM for read, can the read also reach to the 2nd cluster?

 No. It will fail if there are not enough nodes available in the first DC.

 We'd like to keep both write and read on the same cluster.

 Writes go to all replicas. Using EACH_QUORUM means the client in the first
 DC will be waiting for the quorum from the second DC to ack the write.


 Cheers
   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 31/05/2012, at 3:20 AM, Chong Zhang wrote:

 Hi all,

 We are planning to deploy a small cluster with 4 nodes in one DC first,
 and will expand that to 8 nodes, then add another DC with 8 nodes for fail
 over (not active-active), so all the traffic will go to the 1st cluster,
 and switch to 2nd cluster if the whole 1st cluster is down or
 on maintenance.

 Could you provide some guide on how to assign the tokens in this growing
 deployment phases? I looked at some docs but not very clear on how to
 assign tokens on the fail-over case.
 Also if we use the same RF (3) in both DC, and use EACH_QUORUM for write
 and LOCAL_QUORUM for read, can the read also reach to the 2nd cluster?
 We'd like to keep both write and read on the same cluster.

 Thanks in advance,
 Chong





newbie question :got error 'org.apache.thrift.transport.TTransportException'

2012-05-31 Thread Chen, Simon
Hi,
  I am new to Cassandra.
 I have started a Cassandra instance (Cassandra.bat), played with it for a 
while, created a keyspace Zodiac.
When I killed the Cassandra instance and restarted it, the keyspace was gone, but when I 
tried to recreate it,
I got an 'org.apache.thrift.transport.TTransportException' error. What have I done 
wrong here?

Following are screen shots:

C:\cassandra-1.1.0>bin\cassandra-cli -host localhost -f C:\NoSqlProjects\dropZ.txt
Starting Cassandra Client
Connected to: ssc2Cluster on localhost/9160
Line 1 => Keyspace 'Zodiac' not found.

C:\cassandra-1.1.0>bin\cassandra-cli -host localhost -f C:\NoSqlProjects\usageDB.txt
Starting Cassandra Client
Connected to: ssc2Cluster on localhost/9160
Line 1 => org.apache.thrift.transport.TTransportException

Following is part of server error message:

INFO 11:09:56,761 Node localhost/127.0.0.1 state jump to normal
INFO 11:09:56,761 Bootstrap/Replace/Move completed! Now serving reads.
INFO 11:09:56,761 Will not load MX4J, mx4j-tools.jar is not in the classpath
INFO 11:09:56,781 Binding thrift service to localhost/127.0.0.1:9160
INFO 11:09:56,781 Using TFastFramedTransport with a max frame size of 15728640 
bytes.
INFO 11:09:56,791 Using synchronous/threadpool thrift server on 
localhost/127.0.0.1 : 9160
INFO 11:09:56,791 Listening for thrift clients...
INFO 11:20:06,044 Enqueuing flush of 
Memtable-schema_keyspaces@1062244145(184/230 serialized/live bytes, 4 ops)
INFO 11:20:06,054 Writing Memtable-schema_keyspaces@1062244145(184/230 
serialized/live bytes, 4 ops)
INFO 11:20:06,074 Completed flushing 
c:\cassandra_data\data\system\schema_keyspaces\system-schema_keyspaces-hc-62-Data.db (240 bytes)
ERROR 11:20:06,134 Exception in thread Thread[MigrationStage:1,5,main]
java.lang.AssertionError
   at org.apache.cassandra.db.DefsTable.updateKeyspace(DefsTable.java:441)
   at org.apache.cassandra.db.DefsTable.mergeKeyspaces(DefsTable.java:339)
   at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:269)
   at 
org.apache.cassandra.service.MigrationManager$1.call(MigrationManager.java:214)
   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
   at java.util.concurrent.FutureTask.run(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
ERROR 11:20:06,134 Error occurred during processing of message.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.AssertionError

usageDB.txt:

create keyspace Zodiac
with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options = {replication_factor:1};

use Zodiac;

create column family ServiceUsage
with comparator = UTF8Type
and default_validation_class = UTF8Type
and key_validation_class = LongType
AND column_metadata = [
  {column_name: 'TASK_ID', validation_class:  IntegerType}
  {column_name: 'USAGE_COUNT', validation_class:  IntegerType}
  {column_name: 'USAGE_TYPE', validation_class: UTF8Type}
   ];




From: Chong Zhang [mailto:chongz.zh...@gmail.com]
Sent: Thursday, May 31, 2012 8:47 AM
To: user@cassandra.apache.org
Subject: Re: tokens and RF for multiple phases of deployment

Thanks Aaron.

I might use LOCAL_QUORUM to avoid the waiting on the ack from DC2.

Another question, after I setup a new node with token +1 in a new DC,  and 
updated a CF with RF {DC1:2, DC2:1}. When i update a column on one node in DC1, 
it's also updated in the new node in DC2. But all the other rows are not in the 
new node. Do I need to copy the data files from a node in DC1 to the new node?

The ring (2 in DC1, 1 in DC2) looks OK, but the load on the new node in DC2 is 
almost 0%.

Address       DC    Rack    Status  State    Load        Owns     Token
                                                                  85070591730234615865843651857942052864
10.10.10.1    DC1   RAC1    Up      Normal   313.99 MB   50.00%   0
10.10.10.3    DC2   RAC1    Up      Normal   7.07 MB     0.00%    1
10.10.10.2    DC1   RAC1    Up      Normal   288.91 MB   50.00%   85070591730234615865843651857942052864

Thanks,
Chong

On Thu, May 31, 2012 at 5:48 AM, aaron morton aa...@thelastpickle.com wrote:

Could you provide some guide on how to assign the tokens in this growing 
deployment phases?

background 
http://www.datastax.com/docs/1.0/install/cluster_init#calculating-tokens-for-a-multi-data-center-cluster

Start with tokens for a 4 node cluster. Add the next 4 between each of 
the ranges. Add 8 in the new DC to have the same tokens as the first DC +1

Also if we use the same RF (3) in both DC, and use EACH_QUORUM for write and 
LOCAL_QUORUM for read, can the read also reach to the 2nd cluster?
No. It will fail if there are not enough nodes available in 

Re: cassandra read latency help

2012-05-31 Thread Gurpreet Singh
Aaron,
Thanks for your email. The test kinda resembles how the actual application
will be.
It is going to be a simple key-value store with 500 million keys per node.
The traffic will be read heavy in steady state, and there will be some keys
that will have a lot more traffic than others. The expected hot rows are
estimated to be anywhere between 50  to 1 million keys.

I have already populated this test system with 500 million keys, compacted
it all to 1 file to check the size of the bloom filter and the index.

This is how i am estimating my memory for 500 million keys. plz correct me
if i am wrong or if i am missing any step.

bloom filter: 1 gig
index samples: Index file is 8.5 gig. I believe this index file is for all
keys. Index interval is 128. Hence in RAM, this would be (8.5g / 128)*10
(factor for datastructure overhead) = 664 mb (lets say 1 gig)

key cache size (3 million): 3 gigs
memtable_total_space_mb : 2 gigs

This totals 7 gig.
my heap size is 8 gigs.
Is there anything else that i am missing here?
When i do top right now, it shows java as 96% memory, that's a concern
because there is no write load. Should i be looking at any other number
here?
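
For what it's worth, the estimate above works out roughly as follows (a throwaway arithmetic sketch using only the figures quoted in this thread):

public class HeapEstimateSketch {
    public static void main(String[] args) {
        // all figures in GB, taken from the estimate above
        double bloomFilter  = 1.0;
        double indexSamples = 8.5 / 128 * 10;   // ~0.66: sampled every 128 keys, ~10x structure overhead
        double keyCache     = 3.0;              // ~3 million entries
        double memtables    = 2.0;              // memtable_total_space_mb
        double total = bloomFilter + indexSamples + keyCache + memtables;
        System.out.printf("estimated steady-state usage: %.1f GB of an 8 GB heap%n", total);
    }
}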

Off heap row cache: 500,000 - 750,000 ~ 3 and 5 gigs (avg row size =
250-500 bytes)

My test system has 16 gigs RAM, production system will mostly have 32 gigs
RAM and 12 spindles instead of 6 that i am testing with.

I changed the underlying filesystem from xfs to ext2, and i am seeing
better results, though not the best.
The cfstats latency is down to 20 ms for 35 qps read load. row cache hit
rate is 0.21, key cache = 0.75.
Measuring from the client side, i am seeing roughly 10-15 ms per key; i
would want even less though, any tips would greatly help.
In production, i am hoping the row cache hit rate will be higher.


The biggest thing that is affecting my system right now is the "Invalid
frame size of 0" error that the cassandra server seems to be printing. It's
causing read timeouts every minute or 2 minutes. I haven't been able to
figure out a way to fix this one. I see someone else also reported seeing
this, but not sure where the problem is: hector, cassandra or thrift.

Thanks
Gurpreet






On Wed, May 30, 2012 at 4:38 PM, aaron morton aa...@thelastpickle.comwrote:

 80 ms per request

 sounds high.

 I'm doing some guessing here, i am guessing memory usage is the problem..

 * I assume you are no longer seeing excessive GC activity.
 * The key cache will not get used when you hit the row cache. I would
 disable the row cache if you have a random workload, which it looks like
 you do.
 * 500 million is a lot of keys to have on a single node. At the default
 index sample of every 128 keys it will have about 4 million samples, which
 is probably taking up a lot of memory.

 Is this testing a real world scenario or an abstract benchmark ? IMHO you
 will get more insight from testing something that resembles your
 application.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 26/05/2012, at 8:48 PM, Gurpreet Singh wrote:

 Hi Aaron,
 Here is the latest on this..
 i switched to a node with 6 disks and running some read tests, and i am
 seeing something weird.

 setup:
 1 node, cassandra 1.0.9, 8 cpu, 16 gig RAM, 6 7200 rpm SATA data disks
 striped 512 kb, commitlog mirrored.
 1 keyspace with just 1 column family
 random partitioner
 total number of keys: 500 million (the keys are just longs from 1 to 500
 million)
 avg key size: 8 bytes
 bloom filter size: 1 gig
 total disk usage: 70 gigs compacted 1 sstable
 mean compacted row size: 149 bytes
 heap size: 8 gigs
 keycache size: 2 million (takes around 2 gigs in RAM)
 rowcache size: 1 million (off-heap)
 memtable_total_space_mb : 2 gigs

 test:
 Trying to do 5 reads per second. Each read is a multigetslice query for
 just 1 key, 2 columns.

 observations:
 row cache hit rate: 0.4
 key cache hit rate: 0.0 (this will increase later on as system moves to
 steady state)
 cfstats - 80 ms

 iostat (every 5 seconds):

 r/s : 400
 %util: 20%  (all disks are at equal utilization)
 await: 65-70 ms (for each disk)
 svctm : 2.11 ms (for each disk)
 r-kB/s - 35000

 why this is weird is because..
 5 reads per second is causing a latency of 80 ms per request (according to
 cfstats). isnt this too high?
 35 MB/s is being read from the disk. That is again very weird. This number
 is way too high, avg row size is just 149 bytes. Even index reads should
 not cause this high data being read from the disk.

 what i understand is that each read request translates to 2 disk accesses
 (because there is only 1 sstable). 1 for the index, 1 for the data. At such
 a low reads/second, why is the latency so high?

 would appreciate help debugging this issue.
 Thanks
 Gurpreet


 On Tue, May 22, 2012 at 2:46 AM, aaron morton aa...@thelastpickle.comwrote:

 With

 heap size = 4 gigs

 I would check for GC activity in the logs and consider setting it to 8
 given you 

Re: cassandra read latency help

2012-05-31 Thread crypto five
You may also consider disabling the key/row cache altogether.
1mm rows * 400 bytes = 400MB of data, which can easily be in the fs cache, and you
will access your hot keys with thousands of qps without hitting disk at all.
Enabling compression can make the situation even better.

On Thu, May 31, 2012 at 12:01 PM, Gurpreet Singh
gurpreet.si...@gmail.comwrote:

 Aaron,
 Thanks for your email. The test kinda resembles how the actual application
 will be.
 It is going to be a simple key-value store with 500 million keys per node.
 The traffic will be read heavy in steady state, and there will be some keys
 that will have a lot more traffic than others. The expected hot rows are
 estimated to be anywhere between 50  to 1 million keys.

 I have already populated this test system with 500 million keys, compacted
 it all to 1 file to check the size of the bloom filter and the index.

 This is how i am estimating my memory for 500 million keys. plz correct me
 if i am wrong or if i am missing any step.

 bloom filter: 1 gig
 index samples: Index file is 8.5 gig. I believe this index file is for all
 keys. Index interval is 128. Hence in RAM, this would be (8.5g / 128)*10
 (factor for datastructure overhead) = 664 mb (lets say 1 gig)

 key cache size (3 million): 3 gigs
 memtable_total_space_mb : 2 gigs

 This totals 7 gig.
 my heap size is 8 gigs.
 Is there anything else that i am missing here?
 When i do top right now, it shows java as 96% memory, thats a concern
 because there is no write load. Should i be looking at any other number
 here?

 Off heap row cache: 500,000 - 750,000 ~ 3 and 5 gigs (avg row size =
 250-500 bytes)

 My test system has 16 gigs RAM, production system will mostly have 32 gigs
 RAM and 12 spindles instead of 6 that i am testing with.

 I changed the underneath filesystem from xfs to ext2, and i am seeing
 better results, though not the best.
 The cfstats latency is down to 20 ms for 35 qps read load. row cache hit
 rate is 0.21, key cache = 0.75.
 Measuring from the client side, i am seeing roughly 10-15 ms per key, i
 would want even lesser though, any tips would greatly help.
 In production,  i am hoping the row cache hit rate will be higher.


 The biggest thing that is affecting my system right now is the Invalid
 frame size of 0 error that cassandra server seems to be printing. Its
 causing read timeouts every minute or 2 minutes. I havent been able to
 figure out a way to fix this one. I see someone else also reported seeing
 this, but not sure where the problem is hector, cassandra or thrift.

 Thanks
 Gurpreet






 On Wed, May 30, 2012 at 4:38 PM, aaron morton aa...@thelastpickle.comwrote:

 80 ms per request

 sounds high.

 I'm doing some guessing here, i am guessing memory usage is the problem..

 * I assume you are no longer seeing excessive GC activity.
 * The key cache will not get used when you hit the row cache. I would
 disable the row cache if you have a random workload, which it looks like
 you do.
 * 500 million is a lot of keys to have on a single node. At the default
 index sample of every 128 keys it will have about 4 million samples, which
 is probably taking up a lot of memory.

 Is this testing a real world scenario or an abstract benchmark ? IMHO you
 will get more insight from testing something that resembles your
 application.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 26/05/2012, at 8:48 PM, Gurpreet Singh wrote:

 Hi Aaron,
 Here is the latest on this..
 i switched to a node with 6 disks and running some read tests, and i am
 seeing something weird.

 setup:
 1 node, cassandra 1.0.9, 8 cpu, 16 gig RAM, 6 7200 rpm SATA data disks
 striped 512 kb, commitlog mirrored.
 1 keyspace with just 1 column family
 random partitioner
 total number of keys: 500 million (the keys are just longs from 1 to 500
 million)
 avg key size: 8 bytes
 bloom filter size: 1 gig
 total disk usage: 70 gigs compacted 1 sstable
 mean compacted row size: 149 bytes
 heap size: 8 gigs
 keycache size: 2 million (takes around 2 gigs in RAM)
 rowcache size: 1 million (off-heap)
 memtable_total_space_mb : 2 gigs

 test:
 Trying to do 5 reads per second. Each read is a multigetslice query for
 just 1 key, 2 columns.

 observations:
  row cache hit rate: 0.4
 key cache hit rate: 0.0 (this will increase later on as system moves to
 steady state)
 cfstats - 80 ms

 iostat (every 5 seconds):

 r/s : 400
 %util: 20%  (all disks are at equal utilization)
 await: 65-70 ms (for each disk)
 svctm : 2.11 ms (for each disk)
 r-kB/s - 35000

 why this is weird is because..
 5 reads per second is causing a latency of 80 ms per request (according
 to cfstats). isnt this too high?
 35 MB/s is being read from the disk. That is again very weird. This
 number is way too high, avg row size is just 149 bytes. Even index reads
 should not cause this high data being read from the disk.

 what i understand is that each read request translates to 2 disk accesses
 

Re: cassandra read latency help

2012-05-31 Thread crypto five
But I think it's a bad idea, since hot data will be evenly distributed
between multiple sstables and filesystem pages.

On Thu, May 31, 2012 at 1:08 PM, crypto five cryptof...@gmail.com wrote:

 You may also consider disabling key/row cache at all.
 1mm rows * 400 bytes = 400MB of data, can easily be in fs cache, and you
 will access your hot keys with thousands of qps without hitting disk at all.
 Enabling compression can make situation even better.


 On Thu, May 31, 2012 at 12:01 PM, Gurpreet Singh gurpreet.si...@gmail.com
  wrote:

 Aaron,
 Thanks for your email. The test kinda resembles how the actual
 application will be.
 It is going to be a simple key-value store with 500 million keys per
 node. The traffic will be read heavy in steady state, and there will be
 some keys that will have a lot more traffic than others. The expected hot
 rows are estimated to be anywhere between 50  to 1 million keys.

 I have already populated this test system with 500 million keys,
 compacted it all to 1 file to check the size of the bloom filter and the
 index.

 This is how i am estimating my memory for 500 million keys. plz correct
 me if i am wrong or if i am missing any step.

 bloom filter: 1 gig
 index samples: Index file is 8.5 gig. I believe this index file is for
 all keys. Index interval is 128. Hence in RAM, this would be (8.5g /
 128)*10 (factor for datastructure overhead) = 664 mb (lets say 1 gig)

 key cache size (3 million): 3 gigs
 memtable_total_space_mb : 2 gigs

 This totals 7 gig.
 my heap size is 8 gigs.
 Is there anything else that i am missing here?
 When i do top right now, it shows java as 96% memory, thats a concern
 because there is no write load. Should i be looking at any other number
 here?

 Off heap row cache: 500,000 - 750,000 ~ 3 and 5 gigs (avg row size =
 250-500 bytes)

 My test system has 16 gigs RAM, production system will mostly have 32
 gigs RAM and 12 spindles instead of 6 that i am testing with.

 I changed the underneath filesystem from xfs to ext2, and i am seeing
 better results, though not the best.
 The cfstats latency is down to 20 ms for 35 qps read load. row cache hit
 rate is 0.21, key cache = 0.75.
 Measuring from the client side, i am seeing roughly 10-15 ms per key, i
 would want even lesser though, any tips would greatly help.
 In production,  i am hoping the row cache hit rate will be higher.


 The biggest thing that is affecting my system right now is the Invalid
 frame size of 0 error that cassandra server seems to be printing. Its
 causing read timeouts every minute or 2 minutes. I havent been able to
 figure out a way to fix this one. I see someone else also reported seeing
 this, but not sure where the problem is hector, cassandra or thrift.

 Thanks
 Gurpreet






 On Wed, May 30, 2012 at 4:38 PM, aaron morton aa...@thelastpickle.comwrote:

 80 ms per request

 sounds high.

 I'm doing some guessing here, i am guessing memory usage is the problem..

 * I assume you are no longer seeing excessive GC activity.
 * The key cache will not get used when you hit the row cache. I would
 disable the row cache if you have a random workload, which it looks like
 you do.
 * 500 million is a lot of keys to have on a single node. At the default
 index sample of every 128 keys it will have about 4 million samples, which
 is probably taking up a lot of memory.

 Is this testing a real world scenario or an abstract benchmark ? IMHO
 you will get more insight from testing something that resembles your
 application.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 26/05/2012, at 8:48 PM, Gurpreet Singh wrote:

 Hi Aaron,
 Here is the latest on this..
 i switched to a node with 6 disks and running some read tests, and i am
 seeing something weird.

 setup:
 1 node, cassandra 1.0.9, 8 cpu, 16 gig RAM, 6 7200 rpm SATA data disks
 striped 512 kb, commitlog mirrored.
 1 keyspace with just 1 column family
 random partitioner
 total number of keys: 500 million (the keys are just longs from 1 to 500
 million)
 avg key size: 8 bytes
 bloom filter size: 1 gig
 total disk usage: 70 gigs compacted 1 sstable
 mean compacted row size: 149 bytes
 heap size: 8 gigs
 keycache size: 2 million (takes around 2 gigs in RAM)
 rowcache size: 1 million (off-heap)
 memtable_total_space_mb : 2 gigs

 test:
 Trying to do 5 reads per second. Each read is a multigetslice query for
 just 1 key, 2 columns.

 observations:
  row cache hit rate: 0.4
 key cache hit rate: 0.0 (this will increase later on as system moves to
 steady state)
 cfstats - 80 ms

 iostat (every 5 seconds):

 r/s : 400
 %util: 20%  (all disks are at equal utilization)
 await: 65-70 ms (for each disk)
 svctm : 2.11 ms (for each disk)
 r-kB/s - 35000

 why this is weird is because..
 5 reads per second is causing a latency of 80 ms per request (according
 to cfstats). isnt this too high?
 35 MB/s is being read from the disk. That is again very weird. This
 number 

RE: 1.1 not removing commit log files?

2012-05-31 Thread Bryce Godfrey
So this happened to me again, but it was only when the cluster had a node down 
for a while.  Then the commit logs started piling up past the limit I set in 
the config file, and filled the drive.
After the node recovered and hints had replayed, the space was never reclaimed.  
A flush or drain did not reclaim the space or delete any log files either.

Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation (http://www.azaleos.com/)

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Tuesday, May 22, 2012 1:10 PM
To: user@cassandra.apache.org
Subject: RE: 1.1 not removing commit log files?

The nodes appear to be holding steady at the 8G that I set it to in the config 
file now.  I'll keep an eye on them.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, May 22, 2012 4:08 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

4096 is also the internal hard coded default for commitlog_total_space_in_mb

If you are seeing more that 4GB of commit log files let us know.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/05/2012, at 6:35 AM, Bryce Godfrey wrote:

Thanks, I'll give it a try.

-Original Message-
From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Monday, May 21, 2012 2:12 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

commitlog_total_space_in_mb: 4096

By default this line is commented out in 1.0.x if I remember correctly. I guess it is 
the same in 1.1. You really should uncomment it, or your commit logs will entirely 
fill up your disk as happened to me a while ago.

Alain

2012/5/21 Pieter Callewaert pieter.callewa...@be-mobile.be:
Hi,



In 1.1 the commitlog files are pre-allocated with files of 128MB.
(https://issues.apache.org/jira/browse/CASSANDRA-3411) This should
however not exceed your commitlog size in Cassandra.yaml.



commitlog_total_space_in_mb: 4096



Kind regards,

Pieter Callewaert



From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Monday, 21 May 2012 9:52
To: user@cassandra.apache.org
Subject: 1.1 not removing commit log files?



The commit log drives on my nodes keep slowly filling up.  I don't see
any errors in my logs that are indicating any issues that I can map to
this issue.



Is this how 1.1 is supposed to work now?  Previous versions seemed to
keep this drive at a minimum as it flushed.



/dev/mapper/mpathf 25G   21G  4.2G  83% /opt/cassandra/commitlog





Re: How can we use composite indexes and secondary indexes together

2012-05-31 Thread aaron morton
If you want to do arbitrary complex online / realtime queries look at Data Stax 
Enterprise, or https://github.com/tjake/Solandra or straight Solr. 

Alternatively denormalise the model to materialise the results when you insert 
so your query is a straight lookup. Or do some client side filtering / 
aggregation.
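
As a rough Hector-based sketch of that denormalisation (the column families, keys and values are all invented for illustration): on insert, write the normal user row and also a lookup row keyed by the mandatory fields, so the later query becomes a single row read.

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class DenormalisedInsertSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);
        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());

        String userId = "42";
        // 1. the normal Users row
        m.addInsertion(userId, "Users", HFactory.createStringColumn("Firstname", "Ada"));
        m.addInsertion(userId, "Users", HFactory.createStringColumn("Lastname", "Lovelace"));
        m.addInsertion(userId, "Users", HFactory.createStringColumn("Age", "36"));
        m.addInsertion(userId, "Users", HFactory.createStringColumn("Country", "UK"));

        // 2. a materialised lookup row keyed by the mandatory fields, one column
        //    per matching user id; the query is then a straight row read
        String lookupKey = "Ada:Lovelace:36";
        m.addInsertion(lookupKey, "UsersByNameAge", HFactory.createStringColumn(userId, ""));

        m.execute();
    }
}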

If you want to do the queries offline, you can use Pig or Hive with Hadoop over 
Cassandra. The Apache Cassandra distro includes the pig support, hive is coming 
(i think) and there are Hadoop interfaces.  You can also look at Data Stax 
Enterprise. 

 
Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/05/2012, at 11:07 PM, Nury Redjepow wrote:

 We want to use cassandra to store complex data. But we can't figure out, how 
 to organize indexes.
 
 Our table (column family) looks like this:
 
 Users = { RandomId int, Firstname varchar, Lastname varchar, Age int, Country 
 int, ChildCount int }
 
 In our queries we have mandatory fields (Firstname,Lastname,Age) and extra 
 search options (Country,ChildCount). How do we organize index to make this 
 kind of queries fast?
 
 First I thought, it would be natural to make composite index on 
 (Firstname,Lastname,Age) and add separate secondary index on remaining fields 
 (Country and ChildCount). But I can't insert rows into table after creating 
 secondary indexes. And also, I can't query the table.
 
 I'm using cassandra 1.1.0, and cqlsh with --cql3 option.
 
 Any other suggestions to solve our problem (complex queries with mandatory 
 and additional options) are welcome.
 
 The main point is, how can we join data in cassandra. If I make few index 
 column families, I need to intersect the values, to get rows that pass all 
 search criteria??? Or should I use something based on Hadoop (Pig,Hive) to 
 make such queries?
 
 Respectfully, Nury
 
 
 
 



Re: Cassandra Data Archiving

2012-05-31 Thread aaron morton
I'm not sure of your needs, but the simplest thing to consider is snapshotting 
and copying off node. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 12:23 AM, Shubham Srivastava wrote:

 I need to archive my Cassandra data into another permanent storage.
  
 Two intent
  
 1. To shed the unused data from the live data.
  
 2. To use the archived data for getting some analytics out, or as a potential 
 source for a data warehouse.
  
 Any recommendations for the same in terms of strategies or tools to use.
  
 Regards,
 Shubham Srivastava | Technical Lead - Technology Development
 +91 124 4910 548   |  MakeMyTrip.com, 243 SP Infocity, Udyog Vihar Phase 1, 
 Gurgaon, Haryana - 122 016, India
  



Re: About Composite range queries

2012-05-31 Thread aaron morton
 If you hash 4 composite keys, let's say ('A','B','C'), ('A','D','C'), 
 ('A','E','X'), ('A','R','X'), you have only 4 hashes or you have more?
Four

 If it's 4, how come you are able to range query for example between 
 start_column=('A', 'D') and end_column=('A','E') and get this column 
 ('A','D','C')

That's a slice query against columns; the column names are not hashed. They are 
sorted according to the comparator, which can be different to the raw byte order.

A range query is against rows. Row keys are hashed (using the RandomPartitioner) 
to create tokens, and rows are stored in token order. 
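
A small pycassa sketch of the difference (keyspace and column family names are hypothetical; the CF is assumed to use a CompositeType comparator):

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = ColumnFamily(pool, 'MyCompositeCF')

    # Column slice within one row: start/finish are compared with the comparator,
    # so composite column names come back in sorted order.
    cols = cf.get('key1', column_start=('A', 'D'), column_finish=('A', 'E'))

    # Row range across the CF: keys are hashed by the RandomPartitioner,
    # so rows come back in token order, not key order.
    rows = list(cf.get_range())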

 the composites are like chapters between the whole keys set, there must be 
 intermediate keys added?

Not sure what you mean. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 12:52 AM, Cyril Auburtin wrote:

 but sorry, I dont undertand
 
 If you hash 4 composite keys, let's say ('A','B','C'), ('A','D','C'), 
 ('A','E','X'), ('A','R','X'), you have only 4 hashes or you have more?
 
 If it's 4, how come you are able to range query for example between 
 start_column=('A', 'D') and end_column=('A','E') and get this column 
 ('A','D','C')
 
 the composites are like chapters between the whole keys set, there must be 
 intermediate keys added?
 
 
 2012/5/31 aaron morton aa...@thelastpickle.com
 It is hashed once. 
 
 To the partitioner it's just some bytes. Other parts of the code care about 
 its structure. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 31/05/2012, at 7:00 PM, Cyril Auburtin wrote:
 
 Thx for the answer
 1 more thing, a Composite key is not hashed only once I guess?
 It's hashed the number of part the composite have?
 So this means there are twice or 3 or ... as many keys as for normal column 
 keys, is it true?
 
 Le 31 mai 2012 02:59, aaron morton aa...@thelastpickle.com a écrit :
 Composite Columns compare each part in turn, so the values are ordered as 
 you've shown them. 
 
 However the rows are not ordered according to key value. They are ordered 
 using the random token generated by the partitioner see 
 http://wiki.apache.org/cassandra/FAQ#range_rp
 
 What is the real advantage compared to super column families?
 They are faster. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 29/05/2012, at 10:08 PM, Cyril Auburtin wrote:
 
 How is it done in Cassandra to be able to range query on a composite key?
 
 key1 = (A:A:C), (A:B:C), (A:C:C), (A:D:C), (B,A,C)
 
 like get_range (key1, start_column=(A,), end_column=(A, C)); will return 
 [ (A:B:C), (A:C:C) ] (in pycassa)
 
 I mean does the composite implementation add much overhead to make it work?
 Does it need to add other Column families, to be able to range query 
 between composites simple keys (first, second and third part of the 
 composite)?
 
 What is the real advantage compared to super column families?
 
 key1 = A: (A,C), (B,C), (C,C), (D,C)  , B: (A,C)
 
 thx
 
 
 



Re: nodetool move 0 gets stuck in moving state forever

2012-05-31 Thread aaron morton
Look in the logs for errors or warnings. Also let us know what version you are 
using.  

Am guessing that node 2 still thought that node 1 was in the cluster when you 
did the move. Which should(?) have errored. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 1:50 AM, Poziombka, Wade L wrote:

 Let me elaborate a bit.
 
 two node cluster
 node1 has token 0 
 node2 has token 85070591730234615865843651857942052864
 
  node1 goes down permanently.
 
 do a nodetool move 0 on node2.
 
  monitor with nodetool ring... it is in the Moving state forever, it seems.
 
 
 
 From: Poziombka, Wade L 
 Sent: Tuesday, May 29, 2012 4:29 PM
 To: user@cassandra.apache.org
 Subject: nodetool move 0 gets stuck in moving state forever
 
  If the node with token 0 dies and we just want it gone from the cluster, we 
  would do a nodetool move 0.  Then, monitoring with nodetool ring, it seems to 
  be stuck on Moving forever.
 
 Any ideas? 



Re: Invalid Counter Shard errors?

2012-05-31 Thread aaron morton
I suggest creating a ticket on https://issues.apache.org/jira/browse/CASSANDRA 
with the details.

If it is an immediate concern see if you can find someone in the #cassandra 
chat room http://cassandra.apache.org/

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 3:20 AM, Charles Brophy wrote:

 Hi guys,
 
 We're running a three node cluster of cassandra 1.1 servers, originally 1.0.7 
 and immediately after the upgrade the error logs of all three servers began 
 filling up with the following message:
 
 ERROR [ReplicateOnWriteStage:177] 2012-05-31 08:17:02,236 CounterContext.java 
 (line 381) invalid counter shard detected; 
 (3438afc0-7e71-11e1--da5a9d01e7f7, 3, 4) and 
 (3438afc0-7e71-11e1--da5a9d01e7f7, 3, 7) differ only in count; will pick 
 highest to self-heal; this indicates a bug or corruption generated a bad 
 counter shard
 
 ERROR [ValidationExecutor:20] 2012-05-31 08:17:01,570 CounterContext.java 
 (line 381) invalid counter shard detected; 
 (343cf580-7e71-11e1--ebc411012bff, 14, 27) and 
 (343cf580-7e71-11e1--ebc411012bff, 14, 21) differ only in count; will 
 pick highest to self-heal; this indicates a bug or corruption generated a bad 
 counter shard
 
 The counts change but the errors are constant. What is the best course of 
 action? Google only turns up the source code for these errors.
 
 Thanks!
 Charles
 
 



Re: java.net.SocketTimeoutException while Trying to Drop a Collection

2012-05-31 Thread aaron morton
The default value for rpc_timeout is 10,000 ms (10 seconds). 

You want the socket timeout to be higher than the rpc_timeout otherwise the 
client will give up before the server. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 3:26 AM, Christof Bornhoevd wrote:

 Thanks a lot Aaron for the very fast response!
  
 I have increased the CassandraThriftSocketTimeout from 5000 to 9000. Is this 
 a reasonable setting?
  configurator.setCassandraThriftSocketTimeout(9000);
 Cheers,
 Christof
 
 2012/5/31 aaron morton aa...@thelastpickle.com
  There are two types of timeouts. The thrift TimedOutException occurs when the 
 coordinator times out waiting for the CL level nodes to respond. The error is 
 transmitted back to the client and raised.  
 
 This is a client side socket timeout waiting for the coordinator to respond. 
 See the CassandraHostConfigurator.setCassandraThriftSocketTimeout() setting. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 31/05/2012, at 11:44 AM, Christof Bornhoevd wrote:
 
 Hello,
  
 We are using Cassandra 1.0.8 with Hector 1.0-5 on both Windows and Linux. In 
 our development/test environment we always recreate the schema in Cassandra 
 (first dropping all ColumnFamilies then recreating them) and then seeding 
 the test data. We simply use 
 cluster.dropColumnFamily(keyspace.getKeyspaceName(), collectionName); to 
 drop ColumnFamilies. The client is using ThriftFramedTransport 
 (configurator.setUseThriftFramedTransport(true);).
  
 Every so often we run into the following exception (with different 
 ColumnFamilies):
  
 Caused by: me.prettyprint.hector.api.exceptions.HectorTransportException: 
 org.apache.thrift.transport.TTransportException: 
 java.net.SocketTimeoutException: Read timed out
 at 
 me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:33)
 at 
 me.prettyprint.cassandra.service.AbstractCluster$7.execute(AbstractCluster.java:279)
 at 
 me.prettyprint.cassandra.service.AbstractCluster$7.execute(AbstractCluster.java:266)
 at 
 me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
 at 
 me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
 at 
 me.prettyprint.cassandra.service.AbstractCluster.dropColumnFamily(AbstractCluster.java:283)
 at 
 me.prettyprint.cassandra.service.AbstractCluster.dropColumnFamily(AbstractCluster.java:261)
 at 
 com.supervillains.plouton.cassandradatastore.CassandraDataStore.deleteCollection(CassandraDataStore.java:195)
 ... 57 more
  
 Is this problem related to 
 https://issues.apache.org/jira/browse/CASSANDRA-3551 (which should have been 
 fixed with Cassandra 1.0.6) or could there be anything we do wrong here?
  
 Thanks in advance for any kind help!
 Chris
 
 



Re: tokens and RF for multiple phases of deployment

2012-05-31 Thread aaron morton
 The ring (2 in DC1, 1 in DC2) looks OK, but the load on the new node in DC2 
 is almost 0%.
Yeah, that's the way it will look. 

 But all the other rows are not in the new node. Do I need to copy the data 
 files from a node in DC1 to the new node?

How did you add the node ? (see 
http://www.datastax.com/docs/1.0/operations/cluster_management#adding-nodes-to-a-cluster)

If in doubt, run nodetool repair on the new node. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 3:46 AM, Chong Zhang wrote:

 Thanks Aaron.
 
 I might use LOCAL_QUORUM to avoid the waiting on the ack from DC2.
 
 Another question, after I set up a new node with token +1 in a new DC, and 
 updated a CF with RF {DC1:2, DC2:1}. When I update a column on one node in 
 DC1, it's also updated in the new node in DC2. But all the other rows are not 
 in the new node. Do I need to copy the data files from a node in DC1 to the 
 new node?
 
 The ring (2 in DC1, 1 in DC2) looks OK, but the load on the new node in DC2 
 is almost 0%.
 
 Address       DC    Rack   Status  State    Load        Owns     Token
                                                                  85070591730234615865843651857942052864
 10.10.10.1    DC1   RAC1   Up      Normal   313.99 MB   50.00%   0
 10.10.10.3    DC2   RAC1   Up      Normal   7.07 MB     0.00%    1
 10.10.10.2    DC1   RAC1   Up      Normal   288.91 MB   50.00%   85070591730234615865843651857942052864
 
 Thanks,
 Chong
 
 On Thu, May 31, 2012 at 5:48 AM, aaron morton aa...@thelastpickle.com wrote:
 
 Could you provide some guidance on how to assign the tokens in these growing 
 deployment phases? 
 
 background 
 http://www.datastax.com/docs/1.0/install/cluster_init#calculating-tokens-for-a-multi-data-center-cluster
 
 Start with tokens for a 4 node cluster. Add the next 4 between each of the 
 existing ranges. Add 8 in the new DC with the same tokens as the first DC, each offset by +1.
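 
 (For illustration only, the usual RandomPartitioner token arithmetic as a small Python sketch; token space 0..2**127, with the +1 offset for the second DC as above.)
 
     RANGE = 2 ** 127
 
     def tokens(node_count, offset=0):
         # Evenly spaced tokens for one data centre, optionally offset so a
         # second DC never reuses a token already taken in the first.
         return [i * RANGE // node_count + offset for i in range(node_count)]
 
     print(tokens(4))             # initial 4-node DC
     print(tokens(8))             # after growing to 8 nodes (move the existing 4)
     print(tokens(8, offset=1))   # second DC: same spacing, shifted by +1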
 
 Also if we use the same RF (3) in both DC, and use EACH_QUORUM for write and 
 LOCAL_QUORUM for read, can the read also reach to the 2nd cluster?
 No. It will fail if there are not enough nodes available in the first DC. 
 
 We'd like to keep both write and read on the same cluster.
 Writes go to all replicas. Using EACH_QUORUM means the client in the first DC 
 will be waiting for the quorum from the second DC to ack the write. 
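 
 (As an illustration, per-column-family consistency levels in pycassa look roughly like this; the keyspace, host and CF names are hypothetical.)
 
     import pycassa
     from pycassa.pool import ConnectionPool
     from pycassa.columnfamily import ColumnFamily
 
     pool = ConnectionPool('MyKeyspace', ['dc1-node1:9160'])
     cf = ColumnFamily(pool, 'MyCF',
                       write_consistency_level=pycassa.ConsistencyLevel.EACH_QUORUM,
                       read_consistency_level=pycassa.ConsistencyLevel.LOCAL_QUORUM)
 
     cf.insert('row1', {'col1': 'value1'})   # waits for a quorum in every DC
     print(cf.get('row1'))                   # waits only for the local DC's quorum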
 
 
 Cheers
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 31/05/2012, at 3:20 AM, Chong Zhang wrote:
 
 Hi all,
 
 We are planning to deploy a small cluster with 4 nodes in one DC first, and 
 will expand that to 8 nodes, then add another DC with 8 nodes for fail over 
 (not active-active), so all the traffic will go to the 1st cluster, and 
 switch to 2nd cluster if the whole 1st cluster is down or on maintenance. 
 
 Could you provide some guidance on how to assign the tokens in these growing 
 deployment phases? I looked at some docs but it's not very clear how to assign 
 tokens in the fail-over case.
 Also if we use the same RF (3) in both DC, and use EACH_QUORUM for write and 
 LOCAL_QUORUM for read, can the read also reach to the 2nd cluster? We'd like 
 to keep both write and read on the same cluster.
 
 Thanks in advance,
 Chong
 
 



Re: newbie question :got error 'org.apache.thrift.transport.TTransportException'

2012-05-31 Thread aaron morton
Sounds like 
https://issues.apache.org/jira/browse/CASSANDRA-4219?attachmentOrder=desc

Drop back to 1.0.10 and have a play. 

Good luck. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 6:38 AM, Chen, Simon wrote:

 Hi,
   I am new to Cassandra.
  I have started a Cassandra instance (Cassandra.bat), played with it for a 
 while, created a keyspace Zodiac.
  When I killed the Cassandra instance and restarted it, the keyspace was gone, 
  but when I tried to recreate it,
  I got an 'org.apache.thrift.transport.TTransportException' error. What have I 
  done wrong here?
  
 Following are screen shots:
  
  C:\cassandra-1.1.0>bin\cassandra-cli -host localhost -f C:\NoSqlProjects\dropZ.txt
  Starting Cassandra Client
  Connected to: ssc2Cluster on localhost/9160
  Line 1 => Keyspace 'Zodiac' not found.
   
  C:\cassandra-1.1.0>bin\cassandra-cli -host localhost -f C:\NoSqlProjects\usageDB.txt
  Starting Cassandra Client
  Connected to: ssc2Cluster on localhost/9160
  Line 1 => org.apache.thrift.transport.TTransportException
  
 Following is part of server error message:
  
 INFO 11:09:56,761 Node localhost/127.0.0.1 state jump to normal
 INFO 11:09:56,761 Bootstrap/Replace/Move completed! Now serving reads.
 INFO 11:09:56,761 Will not load MX4J, mx4j-tools.jar is not in the classpath
 INFO 11:09:56,781 Binding thrift service to localhost/127.0.0.1:9160
 INFO 11:09:56,781 Using TFastFramedTransport with a max frame size of 
 15728640 bytes.
 INFO 11:09:56,791 Using synchronous/threadpool thrift server on 
 localhost/127.0.0.1 : 9160
 INFO 11:09:56,791 Listening for thrift clients...
 INFO 11:20:06,044 Enqueuing flush of 
 Memtable-schema_keyspaces@1062244145(184/230 serialized/live bytes, 4 ops)
 INFO 11:20:06,054 Writing Memtable-schema_keyspaces@1062244145(184/230 
 serialized/live bytes, 4 ops)
  INFO 11:20:06,074 Completed flushing 
  c:\cassandra_data\data\system\schema_keyspaces\system-schema_keyspaces-hc-62-Data.db (240 bytes)
  ERROR 11:20:06,134 Exception in thread Thread[MigrationStage:1,5,main]
  java.lang.AssertionError
        at org.apache.cassandra.db.DefsTable.updateKeyspace(DefsTable.java:441)
        at org.apache.cassandra.db.DefsTable.mergeKeyspaces(DefsTable.java:339)
        at org.apache.cassandra.db.DefsTable.mergeSchema(DefsTable.java:269)
        at org.apache.cassandra.service.MigrationManager$1.call(MigrationManager.java:214)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
  ERROR 11:20:06,134 Error occurred during processing of message.
  java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError
  
 usageDB.txt:
  
 create keyspace Zodiac
 with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
 and strategy_options = {replication_factor:1};
  
 use Zodiac;
  
 create column family ServiceUsage
 with comparator = UTF8Type
 and default_validation_class = UTF8Type
 and key_validation_class = LongType
  AND column_metadata = [
    {column_name: 'TASK_ID', validation_class: IntegerType},
    {column_name: 'USAGE_COUNT', validation_class: IntegerType},
    {column_name: 'USAGE_TYPE', validation_class: UTF8Type}
 ];
  
  
  
  

Re: 1.1 not removing commit log files?

2012-05-31 Thread aaron morton
Could be this 
https://issues.apache.org/jira/browse/CASSANDRA-4201

But that talks about segments not being cleared at startup. Does not explain 
why they were allowed to get past the limit in the first place. 

Can you share some logs from the time the commit log got out of control ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 9:34 AM, Bryce Godfrey wrote:

 So this happened to me again, but it was only when the cluster had a node 
 down for a while.  Then the commit logs started piling up past the limit I 
 set in the config file, and filled the drive. 
  After the node recovered and hints had replayed, the space was never 
  reclaimed.  A flush or drain did not reclaim the space or delete any 
  log files either.
  
 Bryce Godfrey | Sr. Software Engineer | Azaleos Corporation
  
 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com] 
 Sent: Tuesday, May 22, 2012 1:10 PM
 To: user@cassandra.apache.org
 Subject: RE: 1.1 not removing commit log files?
  
 The nodes appear to be holding steady at the 8G that I set it to in the 
 config file now.  I’ll keep an eye on them.
  
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Tuesday, May 22, 2012 4:08 AM
 To: user@cassandra.apache.org
 Subject: Re: 1.1 not removing commit log files?
  
 4096 is also the internal hard coded default for commitlog_total_space_in_mb
  
  If you are seeing more than 4GB of commit log files, let us know. 
  
 Cheers
  
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
  
 On 22/05/2012, at 6:35 AM, Bryce Godfrey wrote:
  
 
 Thanks, I'll give it a try.
 
 -Original Message-
 From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] 
 Sent: Monday, May 21, 2012 2:12 AM
 To: user@cassandra.apache.org
 Subject: Re: 1.1 not removing commit log files?
 
 commitlog_total_space_in_mb: 4096
 
  By default this line is commented out in 1.0.x, if I remember well. I guess it is 
  the same in 1.1. You really should uncomment it, or your commit logs 
  will entirely fill up your disk, as happened to me a while ago.
 
 Alain
 
 2012/5/21 Pieter Callewaert pieter.callewa...@be-mobile.be:
 
 Hi,
  
  
  
 In 1.1 the commitlog files are pre-allocated with files of 128MB.
 (https://issues.apache.org/jira/browse/CASSANDRA-3411) This should
 however not exceed your commitlog size in Cassandra.yaml.
  
  
  
 commitlog_total_space_in_mb: 4096
  
  
  
 Kind regards,
  
 Pieter Callewaert
  
  
  
 From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
  Sent: Monday, 21 May 2012 9:52
 To: user@cassandra.apache.org
 Subject: 1.1 not removing commit log files?
  
  
  
 The commit log drives on my nodes keep slowly filling up.  I don't see
 any errors in my logs that are indicating any issues that I can map to
 this issue.
  
  
  
 Is this how 1.1 is supposed to work now?  Previous versions seemed to
 keep this drive at a minimum as it flushed.
  
  
  
 /dev/mapper/mpathf 25G   21G  4.2G  83% /opt/cassandra/commitlog
  
  



RE: Cassandra Data Archiving

2012-05-31 Thread Harshvardhan Ojha
Problem statement:
We are keeping daily generated data (user generated content) in Cassandra, but 
our application only uses the last 15 days of data. So how can we archive data 
older than 15 days so that we can reduce the load on the Cassandra ring?

Note: we can't apply TTL, as this data may be needed in the future.


From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Friday, June 01, 2012 6:57 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra Data Archiving

I'm not sure on your needs, but the simplest thing to consider is snapshotting 
and copying off node.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 12:23 AM, Shubham Srivastava wrote:


I need to archive my Cassandra data into another  permanent storage .

Two intent

1.To shed the unused data from the Live data.

2.To use the archived data for getting some analytics out or a potential source 
of DataWarehouse.

Any recommendations for the same in terms of strategies or tools to use.

Regards,
Shubham Srivastava | Technical Lead - Technology Development

+91 124 4910 548   |  MakeMyTrip.com, 243 SP Infocity, 
Udyog Vihar Phase 1, Gurgaon, Haryana - 122 016, India








Re: Cassandra Data Archiving

2012-05-31 Thread Zhu Han
On Fri, Jun 1, 2012 at 12:28 PM, Harshvardhan Ojha 
harshvardhan.o...@makemytrip.com wrote:

  Problem statement:

 We are keeping daily generated data(user generated content)  in
 Cassandra, but our application is using only 15 days old data. So how can
 we archive data older than 15 days so that we can reduce load on
 Cassandra ring.


Can you put the new data to a different column family?


 


 Note : we can’t apply TTL, as this data may be needed in future.




Re: Cassandra Data Archiving

2012-05-31 Thread samal
I believe you are talking about the HDD space consumed by user-generated
data which is no longer required after 15 days, but may be required again later.
The first option is TTL, which you don't want to use. The second, as Aaron pointed
out, is snapshotting the data, but the data still exists in the cluster and the
snapshot is only useful as a backup.

I would think of using column family buckets: 15 days per bucket, two buckets a
month.

Create a new CF every 15th day with a timestamp marker, e.g.
trip_offer_cf_[ts - ts % (86400*15)], and cache the CF name in the app for 15 days
(a small sketch of the name calculation follows below). After the 15th day the old
CF bucket is read only (no writes go into it), so you can snapshot that old bucket's
data and delete the CF a few days later. This keeps the CF count fixed.
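
For illustration only, a minimal Python sketch of the bucket-name calculation described above (the trip_offer_cf_ prefix just follows the example naming):

    import time

    BUCKET_SECONDS = 86400 * 15   # 15-day buckets

    def bucket_cf_name(ts=None):
        ts = int(ts if ts is not None else time.time())
        return 'trip_offer_cf_%d' % (ts - ts % BUCKET_SECONDS)

    print(bucket_cf_name())                               # bucket to write to now
    print(bucket_cf_name(time.time() - BUCKET_SECONDS))   # previous, now read-only bucket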

current cf count=n,
bucket cf count= b*n

Use a separate cluster for analytics on the old data.

/Samal

On Fri, Jun 1, 2012 at 9:58 AM, Harshvardhan Ojha 
harshvardhan.o...@makemytrip.com wrote:

  Problem statement:

 We are keeping daily generated data(user generated content)  in
 Cassandra, but our application is using only 15 days old data. So how can
 we archive data older than 15 days so that we can reduce load on
 Cassandra ring.


 Note : we can’t apply TTL, as this data may be needed in future.
