RE: Unbalanced ring with C* 2.0.3 and vnodes after adding additional nodes

2013-12-20 Thread Andreas Finke
Hi Aaron,

I assume that by seed list you mean the seed_provider setting in cassandra.yaml.
The current setting for vm1-vm6 is:

seed_provider = vm1,vm2,vm3,vm4

This setting was also in place when vm5 and vm6 were added. I checked the read 
repair metrics, and the mean is about 20/s on vm5 and vm6.

I tried to investigate the actual distribution of tokens again and ran the 
following on vm1 (ip_of stands for looking up the IP address of a node):

1. nodetool describering marketdata > /tmp/ring.txt
2. for node in vm1 vm2 vm3 vm4 vm5 vm6 ; do grep "$(ip_of "$node")" /tmp/ring.txt | wc -l; done

This prints the number of times each node was listed as an endpoint:

vm1: 303
vm2: 312
vm3: 332
vm4: 311
vm5: 901
vm6: 913
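
The same count can be sketched in Python. This is only an illustration: the sample text below is a made-up, simplified stand-in for describering output, and the IP addresses are hypothetical. Unlike a plain grep, the pattern refuses to let 10.0.0.1 match inside 10.0.0.10.

```python
import re


def count_endpoint_lines(ring_text, node_ips):
    """Count, per node, the lines of ring output that mention its IP."""
    counts = {name: 0 for name in node_ips}
    for line in ring_text.splitlines():
        for name, ip in node_ips.items():
            # Lookarounds stop 10.0.0.1 from also matching 10.0.0.10.
            pattern = r"(?<![\d.])" + re.escape(ip) + r"(?![\d.])"
            if re.search(pattern, line):
                counts[name] += 1
    return counts


# Hypothetical sample: two token ranges in a describering-like layout.
sample = (
    "TokenRange(start_token:1, end_token:2, endpoints:[10.0.0.5, 10.0.0.1])\n"
    "TokenRange(start_token:2, end_token:3, endpoints:[10.0.0.5, 10.0.0.10])\n"
)
ips = {"vm1": "10.0.0.1", "vm5": "10.0.0.5"}  # hypothetical addresses
print(count_endpoint_lines(sample, ips))
```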

So this shows that we are really unbalanced. 
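
As a quick sanity check on those numbers: they add up to 3072, which is what 6 nodes x 256 vnodes x RF 2 would give, assuming each line of the describering output is one token range. A small sketch of the resulting replica share per node (the function name is just for illustration):

```python
# Endpoint counts as reported above.
counts = {"vm1": 303, "vm2": 312, "vm3": 332, "vm4": 311, "vm5": 901, "vm6": 913}


def replica_shares(counts):
    """Return each node's fraction of all listed replicas."""
    total = sum(counts.values())
    return {node: n / total for node, n in counts.items()}


total = sum(counts.values())  # 6 nodes * 256 vnodes * RF 2
shares = replica_shares(counts)
even = 1 / len(counts)        # share per node in a perfectly even ring
for node, share in shares.items():
    print(f"{node}: {share:.1%} of replicas (even split: {even:.1%})")
```

With an even split every node would hold about one sixth of the replicas; vm5 and vm6 each hold well over a quarter.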

1. Is there any way to fix this on a running production cluster?
2. Our backup plan is to snapshot all data, bring up a completely fresh 6-node 
cluster and stream the data in using sstableloader. Do you see any objections 
to that plan?

Thanks in advance!

Andi

From: Aaron Morton [aa...@thelastpickle.com]
Sent: Wednesday, December 18, 2013 3:14 AM
To: Cassandra User
Subject: Re: Unbalanced ring with C* 2.0.3 and vnodes after adding additional 
nodes

 Node: 4 CPU, 6 GB RAM, virtual appliance

 Cassandra: 3 GB Heap, vnodes 256
FWIW that’s a very low powered node.

 Maybe we missed a necessary step during or after the cluster expansion. 
 We are open to any ideas.
Were the nodes in the seed list when they joined the cluster? If so they do 
not bootstrap.
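
A deliberately simplified model of that rule, as a sketch only: the function name is hypothetical, and the seed list is the one reported elsewhere in this thread (vm1 to vm4). A node that finds itself in the seed list skips bootstrap, as does any node with auto_bootstrap turned off.

```python
def will_bootstrap(node, seeds, auto_bootstrap=True):
    """Return True if a joining node is expected to stream data on startup.

    Simplified model: seeds never bootstrap, and auto_bootstrap=False
    disables bootstrap for everyone else.
    """
    return auto_bootstrap and node not in seeds


seeds = ["vm1", "vm2", "vm3", "vm4"]  # seed list reported in this thread
for node in ("vm5", "vm6"):
    print(node, "bootstraps:", will_bootstrap(node, seeds))
```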

The extra writes in nodes 5 and 6 could be from Read Repair writing to them.

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/12/2013, at 11:49 pm, Andreas Finke andreas.fi...@solvians.com wrote:

 Hi,

 After adding 2 more nodes to a cluster that previously had 4 nodes, we are 
 experiencing high load on both new nodes. After some investigation we found 
 the following:

 - High cpu load on vm5+6
 - Higher data load on vm5+6
 - Write requests are evenly distributed to all 6 nodes by our client 
 application (opscenter - metrics - WriteRequests)
 - Local writes are more than twice as high on vm5+6 (vm1-4: ~2800/s, vm5-6: ~6800/s)
 - Nodetool output:

 UN  vm1  9.51 GB   256  20,7%  13fa7bb7-19cb-44f5-af83-71a72e04993a  X1
 UN  vm2  9.41 GB   256  20,0%  b71c2d3d-4721-4dde-a418-802f1af4b7a1  D1
 UN  vm3  9.37 GB   256  18,9%  8ce4c419-d79c-4ef1-b3fd-8936bff3e44f  X1
 UN  vm4  9.23 GB   256  19,5%  17974f20-5756-4eba-a377-52feed3a1b10  D1
 UN  vm5  15.95 GB  256  10,7%  0c6db9ea-4c60-43f6-a12e-51a7d76f8e80  X1
 UN  vm6  14.86 GB  256  10,2%  f64d1909-dd96-442b-b602-efee29eee0a0  D1



 Although ownership is lower on vm5-6 (which is already wrong), their data 
 load is much higher.



 Some cluster facts:



 Node: 4 CPU, 6 GB RAM, virtual appliance

 Cassandra: 3 GB Heap, vnodes 256

 Schema: Replication strategy network, RF:2



 Does anyone have an idea what could be causing the imbalance? Maybe we 
 missed a necessary step during or after the cluster expansion. We are 
 open to any ideas.



 Regards

 Andi




Data File Mechanism

2013-12-20 Thread Bonnet Jonathan .
Hello,

  I would like to know whether it is possible to choose how a *.db file
grows and what its size limit is, and how the data files work in general.

   Is there only one *.db file per column family on a node (apart from the
index, filter, statistics and summary files), or does Cassandra add another
one when the CF grows?

Regards,

Bonnet Jonathan.   



RE: Data File Mechanism

2013-12-20 Thread Andreas Finke
Hi Bonnet,

according to

http://www.datastax.com/documentation/cql/3.1/webhelp/index.html#cql/cql_reference/cql_storage_options_c.html#concept_ds_xnr_4mw_xj__moreCompaction

there is the setting

sstable_size_in_mb // The target size for SSTables that use the leveled compaction strategy.

for LeveledCompactionStrategy.
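
For illustration, here is a small sketch that builds the CQL statement for this setting. The keyspace name, table name and the value 160 are placeholders, not recommendations, and the helper function itself is hypothetical.

```python
def leveled_compaction_cql(keyspace, table, sstable_size_in_mb):
    """Build an ALTER TABLE statement that sets the LCS target SSTable size."""
    if sstable_size_in_mb <= 0:
        raise ValueError("sstable_size_in_mb must be positive")
    return (
        f"ALTER TABLE {keyspace}.{table} WITH compaction = "
        f"{{'class': 'LeveledCompactionStrategy', "
        f"'sstable_size_in_mb': {sstable_size_in_mb}}};"
    )


# Placeholder names and size, chosen only for the example.
statement = leveled_compaction_cql("my_keyspace", "my_table", 160)
print(statement)
```

The resulting statement can be pasted into cqlsh. Note that this setting only applies to tables using leveled compaction.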

Regards
Andi

From: Bonnet Jonathan. [jonathan.bon...@externe.bnpparibas.com]
Sent: Friday, December 20, 2013 11:07 AM
To: user@cassandra.apache.org
Subject: Data File Mechanism

Hello,

  I would like to know whether it is possible to choose how a *.db file
grows and what its size limit is, and how the data files work in general.

   Is there only one *.db file per column family on a node (apart from the
index, filter, statistics and summary files), or does Cassandra add another
one when the CF grows?

Regards,

Bonnet Jonathan.



Re: MUTATION messages dropped

2013-12-20 Thread Ken Hancock
I ended up changing memtable_flush_queue_size to be large enough to contain
the biggest flood I saw.

I monitored tpstats over time using a collection script and an analysis
script that I wrote to figure out what my largest peaks were.  In my case,
all my mutation drops correlated with hitting the maximum
memtable_flush_queue_size, and the mutation drops stopped as soon as the
queue size dropped below the max.

I threw the scripts up on github in case they're useful...

https://github.com/hancockks/tpstats
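
For readers who just want the idea, here is a minimal sketch of pulling the "All time blocked" counter for one pool out of nodetool tpstats output. It is not the code from the repository above; the function name is hypothetical and the sample text is a hand-written approximation of the tpstats layout, with made-up numbers apart from the 6498 reported later in this thread.

```python
def all_time_blocked(tpstats_text, pool="FlushWriter"):
    """Return the last column ("All time blocked") for the given pool, or None."""
    for line in tpstats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == pool:
            return int(fields[-1])
    return None


# Hand-written sample approximating the tpstats table layout.
sample = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0        1234567         0                 0
FlushWriter                       0         0           4321         0              6498
"""
print(all_time_blocked(sample))
```

Running this periodically and recording the value over time shows whether the counter keeps climbing.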




On Fri, Dec 20, 2013 at 1:08 AM, Alexander Shutyaev shuty...@gmail.com wrote:

 Thanks for your answers.

 *srmore*,

 We are using v2.0.0. As for GC, I guess it does not correlate in our case,
 because we had Cassandra running for 9 days under production load with no
 dropped messages, and there must have been a lot of GCs during that time.

 *Ken*,

 I've checked the values you indicated. Here they are:

 node1 6498
 node2 6476
 node3 6642

 I guess this is not good :) What can we do to fix this problem?


 2013/12/19 Ken Hancock ken.hanc...@schange.com

 We had issues where the flushes of several column families would align
 and then block writes for a very brief period. If that happened when a
 bunch of writes came in, we'd see a spike in Mutation drops.

 Check nodetool tpstats for FlushWriter all time blocked.


 On Thu, Dec 19, 2013 at 7:12 AM, Alexander Shutyaev shuty...@gmail.com wrote:

 Hi all!

 We've had a problem with Cassandra recently. We had 2 one-minute periods
 when we got a lot of timeouts on the client side (the only timeouts during
 the 9 days we have been using Cassandra in production). In the logs we found
 corresponding messages saying something about MUTATION messages dropped.

 Now, the official FAQ [1] says that this is an indicator that the load
 is too high. We checked our monitoring and found that the 1-minute
 average CPU load had a local peak at the time of the problem, but it was
 about 0.8 against the usual 0.2, which I guess is nothing for a 2-core
 virtual machine. We also checked Java threads: there was no peak there and
 their count was a reasonable ~240-250.

 Can anyone give us a hint - what should we monitor to see this high
 load and what should we tune to make it acceptable?

 Thanks in advance,
 Alexander

 [1] http://wiki.apache.org/cassandra/FAQ#dropped_messages




 --
 Ken Hancock | System Architect, Advanced Advertising
 SeaChange International
 50 Nagog Park
 Acton, Massachusetts 01720
 ken.hanc...@schange.com | www.schange.com |
 NASDAQ:SEAC http://www.schange.com/en-US/Company/InvestorRelations.aspx

 Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com
 | Skype: hancockks | Yahoo IM: hancockks
 | LinkedIn: http://www.linkedin.com/in/kenhancock

 This e-mail and any attachments may contain
 information which is SeaChange International confidential. The information
 enclosed is intended only for the addressees herein and may not be copied
 or forwarded without permission from SeaChange International.







[RELEASE] Apache Cassandra 1.2.13 released

2013-12-20 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.13.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problem.

Enjoy!

[1]: http://goo.gl/toiZIY (CHANGES.txt)
[2]: http://goo.gl/IE0xk4 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Multi-Column Slice Query w/ Partial Component of Composite Key

2013-12-20 Thread Josh Dzielak
Is there a way to include *multiple* column names in a slice query where only 
one component of the composite column name key needs to match?  

For example, if this was a single row -

username:0 | username:1 | city:0 | city:1 | other:0 | other:1
-----------+------------+--------+--------+---------+--------
bob        | ted        | sf     | nyc    | foo     | bar

I can do a slice with username:0 and city:1 or any fully identified column 
names. I also can do a range query w/ first component equal to username, and 
set the bounds for the second component of the key to +/- infinity (or \u0 
to \u for utf8), and get all columns back that start with username.

But what if I want to get all usernames and all cities? Without composite keys 
this would be easy - just slice on a collection of column names - [username, 
city]. With composite column names it would have to look something like 
[username:*, city:*], where * represents a wildcard or a range.
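
To make the shape of the request concrete, here is a purely in-memory sketch. It does not use Thrift or CQL; it only models a row as a list of (composite name, value) pairs sorted by name, and the function name is hypothetical. It shows that [username:*, city:*] amounts to the union of two separate contiguous runs in sort order, with the "other" columns sitting between them, so one contiguous range cannot cover both without also returning "other".

```python
from bisect import bisect_left


def slice_by_prefixes(columns, prefixes):
    """Return the columns whose first name component is in prefixes.

    columns must be a list of ((component1, component2), value) pairs
    sorted by composite name, mimicking column order within a row.
    """
    names = [name for name, _ in columns]
    result = []
    for prefix in prefixes:
        # (prefix,) sorts just before every (prefix, x), so this finds
        # the start of the contiguous run for that prefix.
        i = bisect_left(names, (prefix,))
        while i < len(names) and names[i][0] == prefix:
            result.append(columns[i])
            i += 1
    return result


# The example row from above, in sorted column order.
row = sorted([
    (("username", 0), "bob"), (("username", 1), "ted"),
    (("city", 0), "sf"), (("city", 1), "nyc"),
    (("other", 0), "foo"), (("other", 1), "bar"),
])
print(slice_by_prefixes(row, ["username", "city"]))
```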

My questions –

1) Is this supported in the Thrift interface or CQL?
2) If not, is there clever data modeling or indexing that could accomplish this 
use case? 1 single-row round-trip to get these columns?
3) Are there plans to support this in the future? Generally, what is the future 
of composite columns in a CQL world?

Thanks!
Josh



Astyanax - multiple key search with pagination

2013-12-20 Thread Parag Patel
Hi,

I'm using Astyanax and trying to search for multiple keys with pagination.  
I tried .getKeySlice with a list of primary keys, but it doesn't allow 
pagination.  Does anyone know how to tackle this issue with Astyanax?
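
One generic client-side approach, sketched here in Python rather than Java and not tied to any Astyanax API, is to split the key list into pages yourself and issue one multi-key query per page. Both function names are hypothetical, and fetch_rows is only a stand-in for whatever multi-key call the client offers.

```python
def paged_keys(keys, page_size):
    """Yield consecutive slices of keys, each at most page_size long."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(keys), page_size):
        yield keys[start:start + page_size]


def fetch_rows(keys):
    """Hypothetical stand-in for a multi-key query against the cluster."""
    return {key: f"row-{key}" for key in keys}


all_keys = ["a", "b", "c", "d", "e"]
pages = [fetch_rows(page) for page in paged_keys(all_keys, 2)]
print(pages)
```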

Parag