Problems with adding datacenter and schema version disagreement

2014-03-11 Thread olek.stas...@gmail.com
Hi All, I've faced an issue with Cassandra 2.0.5. I have a 6-node cluster with the random partitioner, still using tokens instead of vnodes. Because we're changing hardware, we decided to migrate the cluster to 6 new machines and change the partitioning options to vnodes rather than token-based. I've followed

Re: Problems with adding datacenter and schema version disagreement

2014-03-11 Thread Duncan Sands
Hi Aleksander, this may be related to CASSANDRA-6799 and CASSANDRA-6700 (if it is caused by CASSANDRA-6700 then you are in luck: it is fixed in 2.0.6). Best wishes, Duncan. On 11/03/14 13:30, olek.stas...@gmail.com wrote: Hi All, I've faced an issue with cassandra 2.0.5. I've 6 node cluster

Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram

2014-03-11 Thread Takenori Sato
In addition to the suggestions by Jonathan, you can run a user defined compaction against a particular set of SSTable files, where you want to remove tombstones. But to do that, you need to find such an optimal set. Here you can find a couple of helpful tools.

Re: Problems with adding datacenter and schema version disagreement

2014-03-11 Thread olek.stas...@gmail.com
I plan to install 2.0.6 as soon as it is available in the DataStax RPM repo. But how to deal with schema inconsistency on such a scale? Best regards, Aleksander 2014-03-11 13:40 GMT+01:00 Duncan Sands duncan.sa...@gmail.com: Hi Aleksander, this may be related to CASSANDRA-6799 and CASSANDRA-6700

Re: Problems with adding datacenter and schema version disagreement

2014-03-11 Thread Duncan Sands
On 11/03/14 14:00, olek.stas...@gmail.com wrote: I plan to install 2.0.6 as soon as it is available in the DataStax RPM repo. But how to deal with schema inconsistency on such a scale? Does it get better if you restart all the nodes? In my case restarting just some of the nodes didn't help,

Re: Problems with adding datacenter and schema version disagreement

2014-03-11 Thread olek.stas...@gmail.com
Didn't help :) Thanks and regards, Aleksander 2014-03-11 14:14 GMT+01:00 Duncan Sands duncan.sa...@gmail.com: On 11/03/14 14:00, olek.stas...@gmail.com wrote: I plan to install 2.0.6 as soon as it is available in the DataStax RPM repo. But how to deal with schema inconsistency on such a scale?

Re: Cassandra DSC 2.0.5 not starting - * could not access pidfile for Cassandra

2014-03-11 Thread Ken Hancock
I was always kind of annoyed that the datastax RPMs made me force-install other java distros to satisfy RPM dependencies. Then I did some research and while I'm still annoyed, at least I'm sympathetic. See http://www.rudder-project.org/redmine/issues/2941 for the mess that has been created

How expensive are additional keyspaces?

2014-03-11 Thread Martin Meyer
Hey all - My company is working on introducing a configuration service system to provide config data to several of our applications, to be backed by Cassandra. We're already using Cassandra for other services, and at the moment our pending design just puts all the new tables (9 of them, I believe)

Re: How to add a new DC to cluster in Cassandra 2.x

2014-03-11 Thread Cyril Scetbon
Can someone from DataStax confirm this point? If it's true, is it the same for a decommission? I mean, if we decommission a node with old data (in case it has been down for more than max_hint_window_in_ms and not repaired), will we finally have a situation where old data has been spread and

Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
The biggest expense of them is that you need to be authenticated to a keyspace to perform an operation. Thus connection pools are bound to keyspaces. Switching a keyspace is an RPC operation. In the Thrift client, if you have 100 keyspaces you need 100 connection pools, which starts to be a pain

Re: How expensive are additional keyspaces?

2014-03-11 Thread Keith Wright
Does this hold true for the native protocol? I’ve noticed that you can create a session object in the DataStax driver without specifying a keyspace, and so long as you include the keyspace in all queries instead of just the table name, it works fine. In that case, I assume there’s only one
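
A minimal sketch of the fully qualified naming described in the message above, in plain Python with no driver involved. The helper names `qualified` and `select_all` and the keyspace and table names are hypothetical; the only point illustrated is that each statement carries its own keyspace, so nothing depends on a session-level keyspace.

```python
import re

# Plain (unquoted) CQL identifiers: a letter followed by letters, digits or underscores.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def qualified(keyspace: str, table: str) -> str:
    """Return 'keyspace.table', rejecting anything that is not a plain identifier."""
    for name in (keyspace, table):
        if not _IDENT.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    return f"{keyspace}.{table}"


def select_all(keyspace: str, table: str) -> str:
    """Build a SELECT that names its keyspace explicitly."""
    return f"SELECT * FROM {qualified(keyspace, table)}"


if __name__ == "__main__":
    # Two statements for two (hypothetical) keyspaces; neither relies on a
    # "current keyspace" having been set beforehand.
    print(select_all("config_service", "settings"))
    print(select_all("billing", "invoices"))
```

The strings produced this way could be handed to whatever client is in use; whether a single pool then suffices is the open question raised in the thread, not something this sketch demonstrates.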

Re: How expensive are additional keyspaces?

2014-03-11 Thread Jeremiah D Jordan
The use of more than one keyspace is not uncommon. Using 100's of them is. That being said, different keyspaces let you specify different replication and different authentication. If you are not going to be doing one of those things, then there really is no point to multiple keyspaces. If

Re: How expensive are additional keyspaces?

2014-03-11 Thread Jeremiah D Jordan
Also, in terms of overhead, server side the overhead is pretty much all at the Column Family (CF)/Table level, so 100 keyspaces, 1 CF each, is the same as 1 keyspace, 100 CF's. -Jeremiah On Mar 11, 2014, at 10:36 AM, Jeremiah D Jordan jeremiah.jor...@gmail.com wrote: The use of more than

Re: DSE Hadoop support for provisioning hardware

2014-03-11 Thread Jonathan Lacefield
Hello, Not sure this question is appropriate for the Open Source C* users group. If you would like, please email me directly to discuss DataStax specific items. Thanks, Jonathan jlacefi...@datastax.om Jonathan Lacefield Solutions Architect, DataStax (404) 822 3487

Re: DSE Hadoop support for provisioning hardware

2014-03-11 Thread Jeremiah D Jordan
Ariel, DSE lets you specify an Analytics virtual data center. You can then replicate your keyspaces over to that data center, and run your Analytics jobs against it, and as long as they are using the LOCAL_ consistency levels, they won't be hitting your real time nodes, and vice versa. So the

Re: Cassandra DSC 2.0.5 not starting - * could not access pidfile for Cassandra

2014-03-11 Thread Michael Shuler
On 03/11/2014 08:47 AM, Ken Hancock wrote: See http://www.rudder-project.org/redmine/issues/2941 for the mess that has been created regarding java JRE dependencies. That's a good example of the cluster.. so many thanks to the Oracle legal department for disallowing redistribution.. FWIW, I

Re: Authoritative failed write: Using paxos to cancel failed quorum writes

2014-03-11 Thread Tupshin Harper
OK, cool. I can think of no such reason. -Tupshin On Mar 11, 2014 10:27 AM, Wayne Schroeder wschroe...@pinsightmedia.com wrote: I think it will work just fine. I was just asking for opinions on if there was some reason it would not work that I was not thinking of. On Mar 10, 2014, at

Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
The mathematical overhead is one thing. I would guess if you tried some design with 10,000 keyspaces and then you ran into a bug/performance problem the first thing someone would say to you is WTF do you have that many keyspaces :) Don't let that be you. On Tue, Mar 11, 2014 at 11:38 AM,

Re: DSE Hadoop support for provisioning hardware

2014-03-11 Thread Ariel Weisberg
Hi, Thanks that answers my question. Ariel On Tue, Mar 11, 2014, at 11:48 AM, Jeremiah D Jordan wrote: Ariel, DSE lets you specify an Analytics virtual data center. You can then replicate your keyspaces over to that data center, and run your Analytics jobs against it, and as long as they

Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
This mistake is not a Thrift limitation. In 0.6.X you could switch keyspaces without calling setKeyspace(String); methods specified the keyspace in every operation. This mirrors the StorageProxy class. In 0.7.X setKeyspace() was created and the keyspace was removed from all these Thrift methods.

Re: How expensive are additional keyspaces?

2014-03-11 Thread Peter Lin
I couldn't resist responding. Having done some experiments with lots of keyspaces, and having purposely created lots of keyspaces versus 1 keyspace, the only good reasons I see for many keyspaces are: 1. each keyspace needs a different replication factor. Even in this case, I personally can't justify having

Re: How expensive are additional keyspaces?

2014-03-11 Thread Peter Lin
If I have time this summer, I may work on that, since I like having Thrift. On Tue, Mar 11, 2014 at 12:05 PM, Edward Capriolo edlinuxg...@gmail.com wrote: This mistake is not a Thrift limitation. In 0.6.X you could switch keyspaces without calling setKeyspace(String); methods specified the

Re: How expensive are additional keyspaces?

2014-03-11 Thread Edward Capriolo
So in the 0.6.X days a signature of a get looked something like this: get(String keyspace, ColumnPath cp, String rowkey). Besides changes from String -> ByteBuffer, the keyspace was pulled out of the arguments. I think the better, more flexible way to do this would be: struct GetRequest { 1:
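
To illustrate the request-object idea, here is a small Python sketch. The field list of `GetRequest` is cut off in the message above, so the fields below (`keyspace`, `column_path`, `row_key`) are assumptions taken from the 0.6.X signature, not the actual struct proposed, and `FakeStore` is a hypothetical in-memory stand-in for a server.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class GetRequest:
    # Assumed fields, mirroring get(String keyspace, ColumnPath cp, String rowkey).
    keyspace: str
    column_path: str
    row_key: bytes


class FakeStore:
    """Hypothetical in-memory store keyed by (keyspace, column_path, row_key)."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str, bytes], bytes] = {}

    def put(self, keyspace: str, column_path: str, row_key: bytes, value: bytes) -> None:
        self._data[(keyspace, column_path, row_key)] = value

    def get(self, request: GetRequest) -> Optional[bytes]:
        # The keyspace travels inside the request, so no setKeyspace()-style
        # state has to be kept between calls.
        key = (request.keyspace, request.column_path, request.row_key)
        return self._data.get(key)


if __name__ == "__main__":
    store = FakeStore()
    store.put("ks_a", "users:name", b"k1", b"alice")
    print(store.get(GetRequest("ks_a", "users:name", b"k1")))
    print(store.get(GetRequest("ks_b", "users:name", b"k1")))
```

A request object of this kind can grow optional fields later without changing the method signature, which seems to be the flexibility the message is after.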

Fwd: NetworkTopologyStrategy ring distribution across 2 DC

2014-03-11 Thread Ramesh Natarajan
Hi, I have 14 Cassandra nodes, running as 2 data centers using PropertyFileSnitch as follows: 192.168.1.101=DC1:RAC1 192.168.1.102=DC1:RAC1 192.168.1.103=DC1:RAC1 192.168.1.104=DC1:RAC1 192.168.1.105=DC1:RAC1 192.168.1.106=DC1:RAC1 192.168.1.107=DC1:RAC1 192.168.1.108=DC2:RAC1
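
As a rough illustration of the `address=DC:RACK` format used by PropertyFileSnitch (normally kept in cassandra-topology.properties), the Python sketch below parses such lines and groups nodes by data center. It uses only the eight entries visible in the message above, since the rest of the list is cut off; `parse_topology` and `nodes_by_dc` are hypothetical helper names, not part of Cassandra.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Only the entries visible in the message; the remaining nodes are not shown there.
TOPOLOGY = """\
192.168.1.101=DC1:RAC1
192.168.1.102=DC1:RAC1
192.168.1.103=DC1:RAC1
192.168.1.104=DC1:RAC1
192.168.1.105=DC1:RAC1
192.168.1.106=DC1:RAC1
192.168.1.107=DC1:RAC1
192.168.1.108=DC2:RAC1
"""


def parse_topology(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each address to its (datacenter, rack) pair."""
    nodes: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        address, _, location = line.partition("=")
        dc, _, rack = location.partition(":")
        nodes[address.strip()] = (dc.strip(), rack.strip())
    return nodes


def nodes_by_dc(nodes: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group addresses by data center name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for address, (dc, _rack) in nodes.items():
        grouped[dc].append(address)
    return dict(grouped)


if __name__ == "__main__":
    for dc, addresses in sorted(nodes_by_dc(parse_topology(TOPOLOGY)).items()):
        print(dc, len(addresses))
```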

Re: need help with Cassandra 1.2 Full GCing -- output of jmap histogram

2014-03-11 Thread Oleg Dulin
Good news is that since I lowered gc_grace period it collected over 100Gigs of tombstones and seems much happier now. Oleg On 2014-03-10 13:33:43 +, Jonathan Lacefield said: Hello,   You have several options:   1) going forward lower gc_grace_seconds

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Shao-Chuan Wang
Hi, I just received this email from Jonathan on the dev mailing list regarding the deprecation of Thrift in 2.1. In fact, we migrated from the Thrift client to the native one several months ago; however, in the Cassandra.hadoop, there are still a lot of dependencies on the Thrift interface, for example

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Tyler Hobbs
On Tue, Mar 11, 2014 at 2:41 PM, Shao-Chuan Wang shaochuan.w...@bloomreach.com wrote: So, does anyone know how to describe the splits and the local rings using the native protocol? For a ring description, you would do something like select peer, tokens from system.peers. I'm
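
As a sketch of what could be done with the result of that query, the Python below turns (peer, tokens) rows into a sorted ring and the token ranges it implies. The rows are made-up sample data, not output from a real cluster, and `build_ring` and `ranges` are hypothetical helper names. Note that system.peers lists only the other nodes; the queried node's own tokens live in system.local and would have to be merged in for a complete ring.

```python
from typing import Dict, Iterable, List, Tuple


def build_ring(rows: Dict[str, Iterable[str]]) -> List[Tuple[int, str]]:
    """Flatten {peer: tokens} into (token, peer) pairs sorted by token.

    Tokens are converted with int() because they arrive as text.
    """
    return sorted(
        (int(token), peer) for peer, tokens in rows.items() for token in tokens
    )


def ranges(ring: List[Tuple[int, str]]) -> List[Tuple[int, int, str]]:
    """Return (start, end, owner) triples, each meaning the range (start, end].

    A token closes the range that begins at its predecessor; the first
    token's range wraps around from the last token in the ring.
    """
    result = []
    for i, (end, owner) in enumerate(ring):
        start = ring[i - 1][0]  # for i == 0 this wraps to the last token
        result.append((start, end, owner))
    return result


if __name__ == "__main__":
    # Made-up rows standing in for the (peer, tokens) result set.
    sample = {
        "10.0.0.1": ["0", "100"],
        "10.0.0.2": ["50"],
        "10.0.0.3": ["150"],
    }
    for start, end, owner in ranges(build_ring(sample)):
        print(f"({start}, {end}] -> {owner}")
```

This covers only the ring description; it does not replace describe_splits, which is the harder part of the question.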

Regarding Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Shao-Chuan Wang
(Moving this migration-related question off the dev mailing list, and changed the title a bit) Thanks Tyler, If describe_splits is not feasible through the native protocol, then how do we migrate org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat to use the native protocol? Do we have any plans

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Peter Lin
My biased opinion: even though some Cassandra developers want to abandon Thrift, I see benefits in continuing to improve it. The great thing about open source is that as long as some people want to keep working on it and improve it, it can happen. I plan to do my best to keep Thrift going,

Speed of sstableloader

2014-03-11 Thread Donald Smith
I tested bulk loading in Cassandra with CQLSSTableWriter and sstableloader. It turns out that writing 1 million rows with sstableloader took over twice as long as inserting regularly with batch CQL statements from Java (cassandra-driver-core, version 2.0.0). Specifically, the call to

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
Peter, My advice: do not bother. I have become very active recently in attempting to add features to Thrift. I had 4 open tickets I was actively working on. (I even found two bugs in Cassandra in the process.) People were aware of this and still called this vote. Several commit people have

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Steven A Robenalt
Okay, I'm officially lost on this thread. If you plan on forking Cassandra to preserve and continue to enhance the Thrift interface, you would also want to add a bunch of relational features to CQL as part of that same fork? On Tue, Mar 11, 2014 at 6:20 PM, Edward Capriolo

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Steven A Robenalt
I should add that I'm not trying to ignite a flame war. Just trying to understand your intentions. On Tue, Mar 11, 2014 at 6:50 PM, Steven A Robenalt srobe...@stanford.edu wrote: Okay, I'm officially lost on this thread. If you plan on forking Cassandra to preserve and continue to enhance the

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Peter Lin
Sorry for the confusion. I'm not trying to add relational features like constraints, etc. What I want to do is make writing reports easier than the ugly mess that it is today. Anyone who's used the various reporting tools for Hadoop knows how ugly and painful it is. Without features like exist, or,

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
one of the things I'd like to see happen is for Cassandra to support queries with disjunction, exist, subqueries, joins and like. In theory CQL could support these features in the future. Cassandra would need a new query compiler and query planner. I don't see how the current design could do these

Re: NetworkTopologyStrategy ring distribution across 2 DC

2014-03-11 Thread Tyler Hobbs
On Tue, Mar 11, 2014 at 1:37 PM, Ramesh Natarajan rames...@gmail.com wrote: Note: Ownership information does not include topology; for complete information, specify a keyspace Also the owns column is 0% for the second DC. Is this normal? Yes. Without a keyspace specified, the Owns column

Re: How to guarantee consistency between counter and materialized view?

2014-03-11 Thread Robert Coli
On Tue, Mar 11, 2014 at 4:30 PM, ziju feng pkdog...@gmail.com wrote: Is there any way to guarantee a counter's value no. =Rob

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Peter Lin
I have no problem maintaining my own fork :) or joining others forking Cassandra. I'd be happy to work with you or anyone else to add features to Thrift. That's the great thing about open source. Each person can scratch a technical itch and do what they love. I see lots of potential for Cassandra

How to guarantee consistency between counter and materialized view?

2014-03-11 Thread ziju feng
Hi all, Is there any way to guarantee a counter's value in materialized views, which could be some other column families with different row keys and with counter's value de-normalized, in sync with the value in its counter column family? Since batch can only work as either non-counter or

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Peter Lin
I'll give you a concrete example. One of the things we often need to do is a keyword search on unstructured text. What we did in our tooling is combine Solr with Cassandra, but we put an Object API in front of it. The API is inspired by JPA, but designed specifically to fit our needs. the

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
If I were to run a fork it would do one thing: Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo (http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf) and

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
Peter, Solr is deeply integrated into DSE. Seemingly this cannot efficiently be done client side (CQL/Thrift, whatever), but the Solandra approach was to embed Solr in Cassandra. I think that is actually the future of client dev: allowing users to embed custom server-side logic into their own API.