Re: Changing compression_parameters of existing CF
Thank you Aaron. It seems to be a bug in the CQL 2.x path, since through cassandra-cli it works fine. Created issue CASSANDRA-4996.

27.11.2012, 11:34, aaron morton <aa...@thelastpickle.com>:
> > Is this expected behaviour, or is it a bug?
> It exhibits bug-like qualities. Can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA ?
> Thanks
> - Aaron Morton, Freelance Cassandra Developer, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 27/11/2012, at 2:40 AM, Шамим <sre...@yandex.ru> wrote:

Hello users,

I am seeing very strange behaviour when changing the compression_parameters of an existing CF: after changing the compaction strategy, the compression strategy reverts to SnappyCompressor. This is on version 1.1.5 [cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 2.0.0 | Thrift protocol 19.32.0].

I have one column family with the following parameters:

cqlsh> DESCRIBE COLUMNFAMILY auditlog_01;

CREATE TABLE auditlog_01 (lid text PRIMARY KEY, dscn text, asid text, soapa text, sysn text, msgs double, leid bigint, prc text, aeid bigint, adt timestamp, name text, asn text, msg text, msgid text, msgt text) WITH comment='' AND comparator=text AND read_repair_chance=0.10 AND gc_grace_seconds=864000 AND default_validation=text AND min_compaction_threshold=4 AND max_compaction_threshold=32 AND replicate_on_write='true' AND compaction_strategy_class='SizeTieredCompactionStrategy' AND compaction_strategy_options:sstable_size_in_mb='5' AND compression_parameters:sstable_compression='SnappyCompressor';

First I change the compression strategy to DeflateCompressor:

cqlsh:p00smev_archKS> ALTER TABLE auditlog_01 WITH compression_parameters:sstable_compression = 'DeflateCompressor' AND compression_parameters:chunk_length_kb = 64;

DESCRIBE COLUMNFAMILY now shows the change took effect; the output is the same as above except that it ends with ... AND compression_parameters:chunk_length_kb='64' AND compression_parameters:sstable_compression='DeflateCompressor';

So the compression strategy is successfully changed to DeflateCompressor. But when I then try to change the compaction strategy, the compression strategy reverts to SnappyCompressor:

cqlsh:p00smev_archKS> ALTER TABLE auditlog_01 WITH compaction_strategy_class='LeveledCompactionStrategy' AND compaction_strategy_options:sstable_size_in_mb=5;

DESCRIBE COLUMNFAMILY now again shows compaction_strategy_class='SizeTieredCompactionStrategy' and compression_parameters:sstable_compression='SnappyCompressor';

Is this expected behaviour, or is it a bug?
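[Editorial note] Since the thread reports that the same change does work through cassandra-cli, the CLI equivalent would look roughly like the following. This is a sketch from memory of the 1.1-era cassandra-cli syntax; the exact attribute names may differ slightly by version, so check `help update column family;` in your CLI.

```
[default@p00smev_archKS] UPDATE COLUMN FAMILY auditlog_01
    WITH compression_options = {sstable_compression: DeflateCompressor, chunk_length_kb: 64};
```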
for a school project
Hello, I have a question about Codd's rules and Cassandra, because I'm doing a school project with my colleagues. I know that Cassandra is a NoSQL type of DBMS, but which of Codd's 12 rules can be applied to it? Thank you very much! :)
Re: for a school project
Rules that apply:
2 - guaranteed access
3 - systematic treatment of nulls (though different from an RDBMS due to the inherent sparse nature of rows)
4 - online catalog (not really true until Cassandra 1.2 and CQL 3)
5 - comprehensive data sub-language (only if you remove the word "relational")
6 - high-level insert, update, delete (again, only if you remove the relational requirement)
11 - distribution independence (very fundamental to Cassandra)

While Cassandra does nominally adhere to about seven of these rules, keep in mind that the rules were designed with a relational model at their core, so many are totally impossible for, or simply irrelevant to, Cassandra.

-Tupshin

On Nov 27, 2012 7:02 AM, <davuk...@veleri.hr> wrote:
> Hello, I have a question about Codd's rules and Cassandra [...]
Re: Other problem in update
I am just taking a stab at this one. UUIDs interact with the system time, and maybe your real-time OS is doing something funky there. The other option, which seems more likely, is that your unit tests are not cleaning up their data directory and there is some corrupt data in there.

On Tue, Nov 27, 2012 at 7:40 AM, Everton Lima <peitin.inu...@gmail.com> wrote:

People, when I try to execute my program, which uses EmbeddedCassandraService with Cassandra version 1.1.2, on the OpenSuse Real Time operating system, it throws the following exception:

[27/11/12 10:27:28,314 BRST] ERROR service.CassandraDaemon: Exception in thread Thread[MutationStage:20,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.utils.UUIDGen.decompose(UUIDGen.java:96)
    at org.apache.cassandra.cql.jdbc.JdbcUUID.decompose(JdbcUUID.java:55)
    at org.apache.cassandra.db.marshal.UUIDType.decompose(UUIDType.java:187)
    at org.apache.cassandra.db.RowMutation.hintFor(RowMutation.java:107)
    at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:582)
    at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:557)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

When I run the same program on Ubuntu 12.04, it starts without errors. Could someone help me?

-- Everton Lima Aleixo, Bacharel em Ciência da Computação, Universidade Federal de Goiás
How to determine compaction bottlenecks
Setup: C* 1.1.6, 6 nodes (Linux, 64 GB RAM, 16-core CPU, 2x512 GB SSD), RF=3, 1.65 TB total used.

Background: the client app is off, so no reads or writes are happening. I am doing some cluster maintenance requiring node repairs and upgradesstables.

I've been trying to figure out what is making compactions run so slowly. Watching the logs, throughput seems to average 3-4 MB/s. That just seems so slow for this setup, given that there is zero external load on the cluster. As far as I can tell:

1. Not I/O bound, according to iostat data.
2. The CPU seems to be idling too.
3. From my understanding, I am using all the correct compaction settings for this setup:

snapshot_before_compaction: false
in_memory_compaction_limit_in_mb: 256
multithreaded_compaction: true
compaction_throughput_mb_per_sec: 128
compaction_preheat_key_cache: true

Some other thoughts:

- I turned on DEBUG logging for the Throttle class and played with the live compaction_throughput_mb_per_sec setting. I can see it performing the throttling if I set the value low (say 4), but at anything over 8 it is apparently running wide open. (Side note: although the math in the Throttle class adds up, overall the throttling seems to be very, very conservative.)
- I accidentally turned on DEBUG for the entire ...compaction.* package, which unintentionally created a lot of I/O from the ParallelCompactionIterable class, and the disk/OS handled that just fine.

Perhaps I just don't fully grasp what is going on or have the correct expectations. I am OK with things being slow if the hardware is working hard, but that does not seem to be the case. Anyone have some insight? Thanks
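[Editorial note] The throttling behaviour described above ("sleeps when the target is low, runs wide open above it") is characteristic of average-rate throttles. A toy Python sketch of that style of throttle, similar in spirit to (but not copied from) Cassandra 1.1's compaction Throttle class; the class name and clock-injection parameters are illustrative, not Cassandra's:

```python
import time

class Throttle:
    """Toy average-rate throttle: after each chunk of work, sleep just
    long enough that the observed average rate stays at or below the
    target. If the workload is already slower than the target (as in
    the 3-4 MB/s compactions above vs. a 128 MB/s cap), the throttle
    never sleeps and the limit has no effect."""

    def __init__(self, target_bytes_per_sec, now=time.monotonic):
        self.target = target_bytes_per_sec
        self.now = now              # injectable clock, for testing
        self.started = None
        self.total_bytes = 0

    def throttle(self, chunk_bytes, sleep=time.sleep):
        if self.started is None:
            self.started = self.now()
        self.total_bytes += chunk_bytes
        elapsed = self.now() - self.started
        # Time the work *should* have taken at the target rate.
        expected = self.total_bytes / self.target
        if expected > elapsed:
            sleep(expected - elapsed)
```

Under this model, setting the cap below the hardware's natural compaction rate (e.g. 4 MB/s) produces visible sleeps, while any cap above it (8+) appears to do nothing, which matches the observation in the message; it also suggests the bottleneck is elsewhere (per-row deserialization cost, for example), not the throttle.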
Re: Other problem in update
Unless I'm misreading the git history, the stack trace you referenced isn't from 1.1.2. In particular, the writeHintForMutation method in StorageProxy.java wasn't added to the codebase until September 9th (https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commitdiff;h=b38ca2879cf1cbf5de17e1912772b6588eaa7de6), and wasn't part of any release until 1.2.0-beta1.

-Tupshin

On Tue, Nov 27, 2012 at 7:40 AM, Everton Lima <peitin.inu...@gmail.com> wrote: [quoted stack trace trimmed]
Java high-level client
Hi,

I'm aware that this has been a frequent question, but answers are still hard to find: what's an appropriate Java high-level client? I actually believe that the lack of a single maintained Java API packaged with Cassandra is quite an issue. The way the situation is right now, new users have to pick more or less randomly one of the available options from the Cassandra Wiki and find a suitable solution for their individual requirements through trial implementations. This can cause a lot of wasted time (and frustration).

Personally, I played with Hector before figuring out that it seems to require an outdated Thrift version. Downgrading to Thrift 0.6 is not an option for me, though, because I use Thrift 0.9.0 in other classes of the same project. So I've had a look at Kundera and at Easy-Cassandra. Both seem to lack real documentation beyond the examples available in their GitHub repositories, right?

Can more experienced users recommend either one of the two, or some of the other options listed on the Cassandra Wiki? I know that this strongly depends on individual requirements, but all I need are simple requests for very basic queries. So I would like to emphasize the importance of clear documentation and of a stable, well-maintained API. Any hints?

Thanks!
Carsten

-- Institut für Deutsche Sprache | http://www.ids-mannheim.de Projekt KorAP | http://korap.ids-mannheim.de Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de Korpusanalyseplattform der nächsten Generation Next Generation Corpus Analysis Platform
Re: Java high-level client
I use hector-client master, which is pretty stable right now. It uses the latest Thrift, so you can use Hector with Thrift 0.9.0. That's assuming you don't mind using the active development branch.

On Tue, Nov 27, 2012 at 10:36 AM, Carsten Schnober <schno...@ids-mannheim.de> wrote: [full question quoted above, trimmed]
Re: Java high-level client
I am biased of course, but you can find plenty of documentation on PlayOrm here: https://github.com/deanhiller/playorm/wiki It uses Astyanax underneath because of its better node awareness. Also feel free to post questions on Stack Overflow, as we monitor it heavily and are notified every hour of new posts with playorm tags. A full feature list is located here: https://github.com/deanhiller/playorm#playorm-feature-list

Later, Dean

On 11/27/12 8:36 AM, Carsten Schnober <schno...@ids-mannheim.de> wrote: [full question quoted above, trimmed]
Re: Hector (was: Java high-level client)
On 27.11.2012 16:40, Peter Lin wrote:

Hi Peter, thanks for your prompt reply!

> I use hector-client master, which is pretty stable right now.

Please excuse my ignorance, but just to be sure I'd like to ask: does hector-client master differ from the Hector client linked from the Cassandra Wiki (http://hector-client.github.com/hector/build/html/index.html)?

> It uses the latest thrift, so you can use hector with thrift 0.9.0. That's assuming you don't mind using the active development branch.

In fact, I would prefer a stable version, but I could live with that for the time being. However, the homepage does not speak very clearly about versions, and the most recent one I could find in the Git repository is 1.0-5. It does not look like a development branch, but it's hard to tell. Regarding Thrift, I've just assumed that this is where the problem lies, because I get an IncompatibleClassChangeError exception upon querying the database. According to [1], this could be due to different Thrift versions.

Best,
Carsten

[1] https://groups.google.com/forum/?fromgroups=#!topic/hector-users/wrab7Yxms18

-- Institut für Deutsche Sprache | http://www.ids-mannheim.de Projekt KorAP | http://korap.ids-mannheim.de Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de Korpusanalyseplattform der nächsten Generation Next Generation Corpus Analysis Platform
Re: Hector (was: Java high-level client)
I could be wrong, but the most recent release is against Cassandra 1.0.x; master tracks against Cassandra 1.1.x. I've contributed a few patches related to CQL3 over the last few weeks, and master seems stable to me. For the record, I don't work for DataStax, so it's just my opinion. I needed the functionality, so I made the changes and contributed them back to Hector.

On Tue, Nov 27, 2012 at 11:06 AM, Carsten Schnober <schno...@ids-mannheim.de> wrote: [quoted message trimmed]
Re: Java high-level client
Netflix has a great client: https://github.com/Netflix/astyanax

On 11/27/12 7:40 AM, Peter Lin <wool...@gmail.com> wrote: [quoted thread trimmed]

'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions. Visit http://barracudanetworks.com/facebook
Re: Java high-level client
> So I've had a look at Kundera and at Easy-Cassandra. Both seem to lack real documentation beyond the examples available in their GitHub repositories, right?

Could you please share what exactly you are looking for in the documentation that is not present? I suggest you join http://groups.google.com/group/kundera-discuss/subscribe to discuss this further.

/Vivek

On Tue, Nov 27, 2012 at 9:40 PM, Michael Kjellman <mkjell...@barracuda.com> wrote: [quoted thread trimmed]
Re: for a school project
Please see:
http://en.wikipedia.org/wiki/Codd's_12_rules
http://wiki.apache.org/cassandra/DataModel

That should get you going for your school report. If you have more specific questions about terms on the wiki, please feel free to ask.

On 11/27/12 4:02 AM, <davuk...@veleri.hr> wrote: [original question trimmed]
counters + replication = awful performance?
Hi,

I have a serious problem with counter performance and I can't seem to figure it out. Basically I'm building a system for accumulating some statistics on the fly via Cassandra distributed counters. For this I need counter updates to work really fast, and herein lies my problem: as soon as I enable replication_factor = 2, the performance goes down the drain. This happens in my tests using both 1.0.x and 1.1.6. Let me elaborate:

I have two boxes (virtual servers on top of physical servers rented specifically for this purpose, i.e. it's not a cloud, nor is it shared; virtual servers are managed by our admins as a way to limit damage, I suppose :)). The Cassandra partitioner is set to ByteOrderedPartitioner because I want to be able to do some range queries.

First, I set up Cassandra individually on each box (not in a cluster) and test counter increment performance (exclusively increments, no reads). For the tests I use code that is intended to somewhat resemble the expected load pattern; in particular, the majority of increments create new counters, with some updating (adding to) already existing counters. In this test each single node exhibits respectable performance: something on the order of 70k (seventy thousand) increments per second.

I then join both of these nodes into a single cluster (using SimpleSnitch and SimpleStrategy, nothing fancy yet) and run the same test using replication_factor=1. The performance is on the order of 120k increments per second, which seems to be a reasonable increase over the single-node performance.

HOWEVER, I then rerun the same test on the two-node cluster using replication_factor=2, which is the least I'll need in actual production for redundancy purposes. And the performance I get is absolutely horrible: much, MUCH worse than even single-node performance, something on the order of less than 25k increments per second. In addition to clients not being able to push updates fast enough, I also see a lot of 'messages dropped' messages in the Cassandra log under this load.

Could anyone advise what could be causing such a drastic performance drop under replication_factor=2? I was expecting something on the order of single-node performance, not approximately 3x less.

When testing replication_factor=2 on 1.1.6 I can see that CPU usage goes through the roof. On 1.0.x I think it looked more like disk overload, but I'm not sure (being on a virtual server I apparently can't see true iostats). I do have the Cassandra data on a separate disk; the commit log and caches are currently on the same disk as the system. I experimented with commit log flush modes and even with disabling the commit log entirely, but it doesn't seem to have a noticeable impact on performance under replication_factor=2.

Any suggestions and hints will be much appreciated :) And please let me know if I need to share additional information about the configuration I'm running on.

Best regards,
Sergey

-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
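[Editorial note] A back-of-envelope model may help frame the expectation. In the 1.x counter design, a replicated increment (with replicate_on_write) is applied on one replica and the counter is then read back and shipped to the other replicas, so each increment costs roughly one local write, one read, and (RF-1) replica writes instead of a single write. A rough Python sketch; the function name and the relative cost weights are assumptions for illustration, not measured Cassandra internals:

```python
def relative_counter_throughput(rf, read_cost=1.5, write_cost=1.0):
    """Crude estimate of counter increments/sec relative to RF=1,
    assuming each replicated increment costs one local write, one
    read-back of the counter, and (rf - 1) replica writes.
    read_cost and write_cost are hypothetical relative weights."""
    if rf <= 1:
        return 1.0
    replicated = write_cost + read_cost + (rf - 1) * write_cost
    return write_cost / replicated
```

With these (assumed) weights, RF=2 comes out to 1/3.5, about 0.29 of the RF=1 rate, which is in the same ballpark as the reported drop from ~120k to ~25k increments/sec (~0.21) before accounting for any CPU or network contention between the two replicas. The read-before-replicate step would also be consistent with the observed CPU spike.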
Re: Java high-level client
Hector does not require an outdated version of Thrift; you are likely using an outdated version of Hector. Here is the long and short of it: if the Thrift API changes, then Hector can have compatibility issues. This happens from time to time. The main methods like get() and insert() have remained the same, but the CFMetaData objects have changed (this causes the incompatible-class errors you are seeing). CQL has a different version of the same problem: the CQL syntax is versioned. For example, if you try to execute a CQL3 query as a CQL2 query, it will likely fail. In the end your code still has to be version-aware. With Hector you get a compile-time problem; with pure CQL you get a runtime problem.

I have always held the opinion that the project should have shipped Hector with Cassandra; this would have made it obvious which version is likely to work. The new CQL transport client is not being shipped with Cassandra either, so you will still have to match up the versions. Although they should be largely compatible, at some point in the near or far future one of the clients probably won't work with one of the servers.

Edward

On Tue, Nov 27, 2012 at 11:10 AM, Michael Kjellman <mkjell...@barracuda.com> wrote: [quoted thread trimmed]
Re: counters + replication = awful performance?
Hi Sergey,

I know I've had similar issues with counters which were bottlenecked by network throughput. You might be seeing a problem with throughput between the clients and Cassandra, or between the two Cassandra nodes. It might not be your case, but that was what happened to me :-)

Juan

On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.li...@gmail.com> wrote: [original message quoted in full above, trimmed]

-- Learn More: SQI (Social Quality Index) - A Universal Measure of Social Quality
Re: counters + replication = awful performance?
Hi Juan, thanks for your input! In my case, however, I doubt this is the issue -- clients are able to push many more updates than I need to saturate the replication_factor=2 case (e.g. I'm doing as many as 6x more increments when testing the 2-node cluster with replication_factor=1), so bandwidth between clients and server should be sufficient. Bandwidth between nodes in the cluster should also be quite sufficient since they are both in the same DC. But it is something to check, thanks! Best regards, Sergey

Juan Valencia wrote: Hi Sergey, I know I've had similar issues with counters which were bottle-necked by network throughput. You might be seeing a problem with throughput between the clients and Cass or between the two Cass nodes. It might not be your case, but that was what happened to me :-) Juan

-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7583996.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: counters + replication = awful performance?
Are you writing with QUORUM consistency or ONE?

On 11/27/12 9:52 AM, Sergey Olefir solf.li...@gmail.com wrote: Hi Juan, thanks for your input! In my case, however, I doubt this is the case -- clients are able to push many more updates than I need to saturate the replication_factor=2 case.

'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions. Visit http://barracudanetworks.com/facebook
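Michael's question matters because the quorum size is derived from the replication factor, and with only two nodes QUORUM and ALL coincide. A rough sketch of the arithmetic (hypothetical helper, not part of any client library):

```python
# Hypothetical sketch: how many replica acknowledgements a coordinator
# waits for at each write consistency level, for the 2-node RF=2 cluster
# discussed in this thread.

def acks_required(replication_factor: int, level: str) -> int:
    """Replica acknowledgements a coordinator waits for before returning."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unknown consistency level: %s" % level)

# With RF=2, QUORUM is 2 // 2 + 1 = 2: every write waits on both nodes,
# so one slow replica stalls the client, whereas ONE returns early.
print(acks_required(2, "ONE"))     # 1
print(acks_required(2, "QUORUM"))  # 2
```

Note the practical consequence for a 2-node cluster: QUORUM gives no latency advantage over ALL, which is why writing at ONE is the usual choice in this setup.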
Re: counters + replication = awful performance?
I'm using ONE like this (Hector):

ConfigurableConsistencyLevel consistencyLevel = new ConfigurableConsistencyLevel();
consistencyLevel.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
consistencyLevel.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);

-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7583998.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Java high-level client
FYI, we are using Hector 1.0-5, which comes with cassandra-thrift 1.09 - libthrift 0.6.1. It can work with Cassandra 1.1.6. Totally agree it's a pain to deal with different versions of libthrift. We use scribe for logging; it's a bit messy over there. Thanks. -Wei From: Edward Capriolo edlinuxg...@gmail.com To: user@cassandra.apache.org Sent: Tuesday, November 27, 2012 8:57 AM Subject: Re: Java high-level client Hector does not require an outdated version of Thrift; you are likely using an outdated version of Hector. Here is the long and short of it: if the Thrift API changes, then Hector can have compatibility issues. This happens from time to time. The main methods like get() and insert() have remained the same, but the CFMetaData objects have changed (this causes the incompatible-class errors you are seeing). CQL has a different version of the same problem: the CQL syntax is versioned. For example, if you try to execute a CQL3 query as a CQL2 query it will likely fail. In the end your code still has to be version-aware. With Hector you get a compile-time problem; with pure CQL you get a runtime problem. I have always had the opinion that the project should have shipped Hector with Cassandra; this would have made it obvious which version is likely to work. The new CQL transport client is not being shipped with Cassandra either, so you will still have to match up the versions. Although they should be largely compatible, some time in the near or far future one of the clients probably won't work with one of the servers. Edward On Tue, Nov 27, 2012 at 11:10 AM, Michael Kjellman mkjell...@barracuda.com wrote: Netflix has a great client: https://github.com/Netflix/astyanax On 11/27/12 7:40 AM, Peter Lin wool...@gmail.com wrote: I use hector-client master, which is pretty stable right now. It uses the latest Thrift, so you can use Hector with thrift 0.9.0. That's assuming you don't mind using the active development branch.
On Tue, Nov 27, 2012 at 10:36 AM, Carsten Schnober schno...@ids-mannheim.de wrote: Hi, I'm aware that this has been a frequent question, but answers are still hard to find: what's an appropriate Java high-level client? I actually believe that the lack of a single maintained Java API packaged with Cassandra is quite an issue. The way the situation is right now, new users have to pick more or less randomly one of the available options from the Cassandra Wiki and find a suitable solution for their individual requirements through trial implementations. This can cause a lot of wasted time (and frustration). Personally, I've played with Hector before figuring out that it seems to require an outdated Thrift version. Downgrading to Thrift 0.6 is not an option for me though, because I use Thrift 0.9.0 in other classes of the same project. So I've had a look at Kundera and at Easy-Cassandra. Both seem to lack real documentation beyond the examples available in their GitHub repositories, right? Can more experienced users recommend either one of the two, or some of the other options listed on the Cassandra Wiki? I know that this strongly depends on individual requirements, but all I need are simple requests for very basic queries. So I would like to emphasize the importance of clear documentation and a stable, well-maintained API. Any hints? Thanks! Carsten -- Institut für Deutsche Sprache | http://www.ids-mannheim.de Projekt KorAP | http://korap.ids-mannheim.de Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de Korpusanalyseplattform der nächsten Generation Next Generation Corpus Analysis Platform
Re: Pagination
Do you really require page numbers? I usually find them annoying while paging through a forum, especially if it is quite active. Threads from the bottom of the page get bumped to the next page, so you end up seeing the same content again. I much prefer the first page being the current N results, and the next page being the next N results after the last-updated time of the last thread on the page. It is also much easier to model with Cassandra. On Tue, Nov 27, 2012 at 12:19 PM, Sam Hodgson hodgson_...@hotmail.com wrote: Hi All, Wondering if anyone has any good solutions to pagination? In particular enumerating the number of pages and linking to each page, a common feature in forums. This code is untested (using phpcassa) and may need tweaking to get the correct range of records, or maybe it's completely wrong! However, it shows the concept of taking a page number and then pulling out the range of posts belonging to that page. $cf_threads looks like: thread_ID => (timestamp => post_id)

if ($page > 1) {
    $ranger = $pagenumber * 20;
    $low_ranger = $ranger - 20;
    $arr_range = $cf_threads->get($thread_id, $columns=null, $column_start='', $column_finish='', $column_reversed=True, $limit=$ranger);
    $arr_page = array_slice($arr_range, $low_ranger, 20, TRUE);
} else {
    $arr_page = $cf_threads->get($thread_id, $columns=null, $column_start='', $column_finish='', $column_reversed=True, $limit=20);
}

I think this should be OK? The only concern is if there are some really long threads, when I'm having to pull the entire CF. Another idea involved a schema change and using a super CF to include a page number as follows: Thread_ID => (PageNumber => (timestamp => Post_ID)) Probably more efficient, but generally page numbers go backwards, i.e. page 1 has the newest content, so this would complicate things when writing data and cause load if logic were included to reorganise page numbers etc. Cheers Sam http://Newsarc.net -- Derek Williams
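The cost Sam worries about can be seen in plain Python. This sketch (my translation of the phpcassa logic above, over an in-memory list rather than a real column family) fetches `page * 20` columns and keeps only the last 20, so the amount fetched grows linearly with the page number:

```python
# Sketch of the slice-based paging from the phpcassa snippet, over an
# already-sorted newest-first list of (timestamp, post_id) pairs.
# Rendering page N fetches N*20 columns and discards all but 20 of them.

PAGE_SIZE = 20

def page_of_posts(columns, page_number):
    """columns: newest-first list of (timestamp, post_id) pairs."""
    if page_number > 1:
        upper = page_number * PAGE_SIZE   # fetch everything up to this page
        lower = upper - PAGE_SIZE
        return columns[lower:upper]       # keep only the last PAGE_SIZE
    return columns[:PAGE_SIZE]

posts = [(1000 - i, "post%d" % i) for i in range(100)]
print(len(page_of_posts(posts, 3)))  # 20
print(page_of_posts(posts, 3)[0])    # (960, 'post40')
```

The deeper the page, the more of the row has to come over the wire, which is exactly the long-thread concern raised above.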
RE: Pagination
Well, I know what you mean, and I have been doing that; however I'm currently migrating an old MySQL site onto Cassandra and just trying to keep things consistent on the front end for the owner. I thought I might be missing a trick, but if not then yeah, I may well ditch the page linkage if it starts causing problems. Cheers Sam

Date: Tue, 27 Nov 2012 13:01:48 -0700 Subject: Re: Pagination From: de...@fyrie.net To: user@cassandra.apache.org Do you really require page numbers? I usually find them annoying while paging through a forum, especially if it is quite active.
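Derek's page-number-free alternative maps naturally onto a column slice whose start is the last column the client has already seen. A minimal sketch with made-up data (nothing Cassandra-specific; with phpcassa this would become the `column_start` parameter):

```python
# Sketch of "seek" pagination: no page numbers, each request asks for the
# next N columns strictly older than the last timestamp already shown.

PAGE_SIZE = 3

def next_page(columns, after_ts=None):
    """columns: newest-first (timestamp, post_id) pairs."""
    if after_ts is not None:
        columns = [c for c in columns if c[0] < after_ts]
    return columns[:PAGE_SIZE]

posts = [(t, "p%d" % t) for t in range(10, 0, -1)]
first = next_page(posts)                          # three newest
second = next_page(posts, after_ts=first[-1][0])  # three after the cursor
print(first)   # [(10, 'p10'), (9, 'p9'), (8, 'p8')]
print(second)  # [(7, 'p7'), (6, 'p6'), (5, 'p5')]
```

Each request touches only PAGE_SIZE columns regardless of how deep the reader has paged, at the cost of giving up direct "jump to page N" links.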
Re: Generic questions over Cassandra 1.1/1.2
I'm not sure I always understand what people mean by schema-less exactly, and I'm curious. For 'schema-less', given this --

{{{
cqlsh> use example;
cqlsh:example> CREATE TABLE users (
           ...   user_name varchar,
           ...   password varchar,
           ...   gender varchar,
           ...   session_token varchar,
           ...   state varchar,
           ...   birth_year bigint,
           ...   PRIMARY KEY (user_name)
           ... );
}}}

I expect this would not cause an unknown identifier error --

{{{
INSERT INTO users (user_name, password, extra, moar) VALUES ('bob', 'secret', 'a', 'b');
}}}

but definitions vary. Bill On 26/11/12 09:18, Sylvain Lebresne wrote: On Mon, Nov 26, 2012 at 8:41 AM, aaron morton aa...@thelastpickle.com wrote: Is there any noticeable performance difference between thrift or CQL3? Off the top of my head it's within 5% (maybe 10%) under stress tests. See Eric's talk at the Cassandra SF conference for the exact numbers. Eric's benchmark result was that normal queries were slightly slower, but prepared ones (and in real life, I see no good reason not to prepare statements) were actually slightly faster. CQL 3 requires a schema; however, altering the schema is easier, and in 1.2 it will support concurrent schema modifications. The Thrift API is still schema-less. Sorry to hijack this thread, but I'd be curious (like seriously, I'm not trolling) to understand what you mean by CQL 3 requires a schema but the Thrift API is still schema-less. Basically I'm not sure I always understand what people mean by schema-less exactly and I'm curious. -- Sylvain
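One way to make the distinction under debate concrete is a toy model (plain Python, nothing Cassandra-specific, all names invented for illustration): a schema-full store validates column names at write time, while a schema-less one accepts any column.

```python
# Toy model of "schema-full" vs "schema-less" writes. The defined column
# set mirrors the users table from Bill's cqlsh example above.

DEFINED = {"user_name", "password", "gender", "session_token", "state", "birth_year"}

def insert_with_schema(row: dict) -> dict:
    """Reject any column not declared in the schema (CQL3-style check)."""
    unknown = set(row) - DEFINED
    if unknown:
        raise ValueError("unknown identifier(s): %s" % sorted(unknown))
    return dict(row)

def insert_schemaless(row: dict) -> dict:
    """Accept arbitrary column names (dynamic, Thrift-style writes)."""
    return dict(row)

insert_schemaless({"user_name": "bob", "extra": "a", "moar": "b"})  # accepted
try:
    insert_with_schema({"user_name": "bob", "extra": "a", "moar": "b"})
except ValueError as e:
    print(e)  # unknown identifier(s): ['extra', 'moar']
```

Whether real CQL3 behaves like the first or the second function for Bill's INSERT is exactly the question he poses; this sketch only names the two behaviors.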
Re: counters + replication = awful performance?
The difference between replication factor = 1 and replication factor > 1 is significant. Also, it sounds like your cluster is 2 nodes, so going from RF=1 to RF=2 means double the load on both nodes. You may want to experiment with the very dangerous column family attribute: replicate_on_write -- Replicate every counter update from the leader to the follower replicas. Accepts the values true and false. Edward

On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman mkjell...@barracuda.com wrote: Are you writing with QUORUM consistency or ONE?
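Edward's "double the load" point is simple arithmetic. A back-of-envelope sketch (my numbers, using the 120k/s figure from the thread; this counts only replica writes, not the extra read-before-write work counter replication adds):

```python
# Back-of-envelope: per-node write load for a cluster, assuming writes are
# spread evenly. On 2 nodes, RF=2 means every node processes every write.

def writes_per_node(client_rate, nodes, replication_factor):
    """Replica writes per second each node must absorb."""
    return client_rate * replication_factor / nodes

print(writes_per_node(120_000, 2, 1))  # 60000.0 per node at RF=1
print(writes_per_node(120_000, 2, 2))  # 120000.0 per node at RF=2
```

So even before counter-specific overhead, RF=2 on two nodes doubles what each node does per client increment, which is why RF=2 cannot be expected to match RF=1 throughput on the same hardware.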
Re: selective replication of keyspaces
You can do something like this: divide your nodes up into 4 datacenters: art1, art2, art3, core.

[default@unknown] create keyspace art1 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art1:2,core:2}];
[default@unknown] create keyspace art2 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art2:2,core:2}];
[default@unknown] create keyspace art3 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art3:2,core:2}];
[default@unknown] create keyspace core placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{core:2}];

On Tue, Nov 27, 2012 at 5:02 PM, Artist jer...@simpleartmarketing.com wrote: I have 3 art-servers; each has a Cassandra cluster. Each of the art-servers has config/state information stored in keyspaces respectively called art-server-1-current-state, art-server-2-current-state, art-server-3-current-state. In my core server I have a separate Cassandra cluster. I would like to use Cassandra to replicate the current state of each art-server to the core Cassandra server without sharing that information with any of the other art-servers. Is there a way to replicate the keyspaces to a single Cassandra cluster (my core) without having any peer sharing between the 3 art-servers? - Artist -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/selective-replication-of-keyspaces-tp7584007.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
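To see why Edward's layout satisfies the requirement, here is the placement expressed as plain data (a sketch mirroring the strategy_options above, not an API of any Cassandra client): each art keyspace lives in its own datacenter plus core, so core holds everything while no art datacenter ever holds another art-server's state.

```python
# Per-keyspace replica placement implied by the cassandra-cli commands
# above: keyspace -> {datacenter: replica_count}.

STRATEGY_OPTIONS = {
    "art1": {"art1": 2, "core": 2},
    "art2": {"art2": 2, "core": 2},
    "art3": {"art3": 2, "core": 2},
    "core": {"core": 2},
}

def datacenters_holding(keyspace):
    """Datacenters that store replicas of the given keyspace."""
    return sorted(STRATEGY_OPTIONS[keyspace])

print(datacenters_holding("art1"))  # ['art1', 'core']
print(datacenters_holding("core"))  # ['core']
```

Reading off the table: art2's data never lands in art1 or art3, yet core receives a copy of every keyspace, which is exactly the "replicate to core only, no peer sharing" behavior Artist asked for.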
Re: counters + replication = awful performance?
We're having a similar performance problem. Setting 'replicate_on_write: false' fixes the performance issue in our tests. How dangerous is it? What exactly could go wrong? On 12-11-27 01:44 PM, Edward Capriolo wrote: The difference between Replication factor =1 and replication factor 1 is significant. Also it sounds like your cluster is 2 node so going from RF=1 to RF=2 means double the load on both nodes. You may want to experiment with the very dangerous column family attribute: - replicate_on_write: Replicate every counter update from the leader to the follower replicas. Accepts the values true and false. Edward On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman mkjell...@barracuda.com mailto:mkjell...@barracuda.com wrote: Are you writing with QUORUM consistency or ONE? On 11/27/12 9:52 AM, Sergey Olefir solf.li...@gmail.com mailto:solf.li...@gmail.com wrote: Hi Juan, thanks for your input! In my case, however, I doubt this is the case -- clients are able to push many more updates than I need to saturate replication_factor=2 case (e.g. I'm doing as many as 6x more increments when testing 2-node cluster with replication_factor=1), so bandwidth between clients and server should be sufficient. Bandwidth between nodes in the cluster should also be quite sufficient since they are both in the same DC. But it is something to check, thanks! Best regards, Sergey Juan Valencia wrote Hi Sergey, I know I've had similar issues with counters which were bottle-necked by network throughput. You might be seeing a problem with throughput between the clients and Cass or between the two Cass nodes. It might not be your case, but that was what happened to me :-) Juan On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir lt; solf.lists@ gt; wrote: Hi, I have a serious problem with counters performance and I can't seem to figure it out. Basically I'm building a system for accumulating some statistics on the fly via Cassandra distributed counters. 
For this I need counter updates to work really fast, and herein lies my problem -- as soon as I enable replication_factor = 2, the performance goes down the drain. This happens in my tests using both 1.0.x and 1.1.6. Let me elaborate: I have two boxes (virtual servers on top of physical servers rented specifically for this purpose, i.e. it's not a cloud, nor is it shared; virtual servers are managed by our admins as a way to limit damage, I suppose :)). The Cassandra partitioner is set to ByteOrderedPartitioner because I want to be able to do some range queries. First, I set up Cassandra individually on each box (not in a cluster) and test counter increment performance (exclusively increments, no reads). For tests I use code that is intended to somewhat resemble the expected load pattern -- particularly, the majority of increments create new counters, with some updating (adding) to already existing counters. In this test each single node exhibits respectable performance -- something on the order of 70k (seventy thousand) increments per second. I then join both of these nodes into a single cluster (using SimpleSnitch and SimpleStrategy, nothing fancy yet) and run the same test using replication_factor=1. The performance is on the order of 120k increments per second -- which seems to be a reasonable increase over the single-node performance. HOWEVER, I then rerun the same test on the two-node cluster using replication_factor=2 -- which is the least I'll need in actual production for redundancy purposes. And the performance I get is absolutely horrible -- much, MUCH worse than even single-node performance -- something on the order of less than 25k increments per second. In addition to clients not being able to push updates fast enough, I also see a lot of 'messages dropped' messages in the Cassandra log under this load. Could anyone advise what could be causing such a drastic performance drop under replication_factor=2?
I was expecting something on the order of single-node performance, not approximately 3x less. When testing replication_factor=2 on 1.1.6 I can see that CPU usage goes through the roof. On 1.0.x I think it looked more like disk overload, but I'm not sure (being on a virtual server I apparently can't see true iostats). I do have Cassandra data on a separate disk; commit log and cache are currently on the same disk as the system. I experimented with commit log flush modes and even with disabling the commit log entirely -- but it doesn't seem to have a noticeable impact on performance under replication_factor=2. Any suggestions and hints will be much appreciated.
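The benchmark described above (timing a tight loop of counter increments and dividing by elapsed time) can be sketched as follows. This is a minimal, self-contained sketch: `fake_increment` is a hypothetical stand-in for a real Cassandra client call, and the key pattern only roughly mimics the "mostly new counters, some repeat updates" load described.

```python
import time

def benchmark_increments(do_increment, n=10_000):
    """Time n counter increments and return throughput in increments/sec."""
    start = time.perf_counter()
    for i in range(n):
        # Cycle through 1000 keys: each key is created once, then
        # repeatedly updated, loosely matching the described load mix.
        do_increment(f"counter-{i % 1000}", 1)
    elapsed = time.perf_counter() - start
    return n / elapsed

# Stub standing in for a real Cassandra counter-update call.
counters = {}
def fake_increment(key, delta):
    counters[key] = counters.get(key, 0) + delta

rate = benchmark_increments(fake_increment, n=10_000)
print(f"{rate:.0f} increments/sec")
```

With a real client plugged in for `fake_increment`, the same loop yields the per-node increments/sec figures quoted in the thread.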
Re: counters + replication = awful performance?
I misspoke, really. It is not dangerous; you just have to understand what it means. This jira discusses it: https://issues.apache.org/jira/browse/CASSANDRA-3868 On Tue, Nov 27, 2012 at 6:13 PM, Scott McKay sco...@mailchannels.com wrote: We're having a similar performance problem. Setting 'replicate_on_write: false' fixes the performance issue in our tests. How dangerous is it? What exactly could go wrong?
Re: counters + replication = awful performance?
Hi, thanks for your suggestions. Regarding replicate=2 vs replicate=1 performance: I expected the configurations below to have similar performance:
- single node, replicate = 1
- two nodes, replicate = 2 (okay, this probably should be a bit slower due to additional overhead).
However, what I'm seeing is that the second option (replicate=2) is about THREE times slower than a single node. Regarding replicate_on_write -- it is, in fact, a dangerous option. As the JIRA discusses, if you make changes to your ring (moving tokens and such) you will *silently* lose data. That is on top of whatever data you might end up losing if you run replicate_on_write=false and the only node that got the data fails. But what is much worse -- with replicate_on_write being false, the data will NOT be replicated (in my tests) ever, unless you explicitly request the cell. Then it will return the wrong result, and only on subsequent reads will it return adequate results. I haven't tested it, but documentation states that a range query will NOT do 'read repair' and thus will not force replication. The test I did went like this:
- replicate_on_write = false
- write something to node A (which should in theory replicate to node B)
- wait for a long time (longest was on the order of 5 hours)
- read from node B (and here I was getting null / wrong result)
- read from node B again (here you get what you'd expect after read repair)
In essence, using replicate_on_write=false with rarely read data will practically defeat the purpose of having replication in the first place (failover, data redundancy). Or, in other words, this option doesn't look to be applicable to my situation. It looks like I will get much better performance by simply writing to two separate clusters rather than using a single cluster with replicate=2. Which is kind of stupid :) I think something's fishy with counters and replication.
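The test sequence Sergey describes can be captured in a toy model. To be clear, this is not Cassandra code and does not claim to model its internals; it only reproduces the observed semantics (no eager replication, repair only on direct read) so the sequence of results is easy to follow:

```python
# Toy two-replica model of the behavior described in the thread:
# with replicate_on_write=False, a write lands on one replica only,
# and the other replica catches up only via read repair, which is
# triggered by directly reading the cell.

class ToyCluster:
    def __init__(self, replicate_on_write):
        self.replicate_on_write = replicate_on_write
        self.node_a = {}
        self.node_b = {}

    def write(self, key, value):
        self.node_a[key] = value          # write goes to node A
        if self.replicate_on_write:
            self.node_b[key] = value      # eager replication to node B

    def read_from_b(self, key):
        result = self.node_b.get(key)     # may be missing/stale
        # Read repair: reconcile replicas after serving the read.
        if key in self.node_a:
            self.node_b[key] = self.node_a[key]
        return result

cluster = ToyCluster(replicate_on_write=False)
cluster.write("counter", 42)
first = cluster.read_from_b("counter")    # None: never replicated
second = cluster.read_from_b("counter")   # 42: fixed by read repair
```

The first read returns nothing and the second returns the value, matching the null-then-correct sequence reported in the test.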
Re: counters + replication = awful performance?
Cassandra's counters read on increment. Additionally, they are distributed, so there can be multiple reads per increment. If they are not fast enough and you have exhausted all tuning options, add more servers to handle the load. In many cases incrementing the same counter n times can be avoided. Twitter's Rainbird did just that: it avoided multiple counter increments by batching them. I have done a similar thing using Cassandra and Kafka. https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java
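The batching idea Edward mentions (aggregate increments in memory, then flush one combined delta per counter, as Rainbird does) can be sketched like this. It's a minimal sketch, not the IronCount code; `flush_fn` is a hypothetical stand-in for the real counter-write path:

```python
from collections import defaultdict

class CounterBatcher:
    """Aggregate increments in memory; flush one combined delta per key."""

    def __init__(self, flush_fn, max_pending=1000):
        self.flush_fn = flush_fn        # called as flush_fn(key, delta)
        self.max_pending = max_pending
        self.pending = defaultdict(int)
        self.count = 0

    def increment(self, key, delta=1):
        self.pending[key] += delta
        self.count += 1
        if self.count >= self.max_pending:
            self.flush()

    def flush(self):
        for key, delta in self.pending.items():
            self.flush_fn(key, delta)   # one write per key, not per event
        self.pending.clear()
        self.count = 0

# Collect flushed writes in a list instead of hitting a real cluster.
writes = []
batcher = CounterBatcher(lambda k, d: writes.append((k, d)), max_pending=5)
for _ in range(5):
    batcher.increment("page:home")
print(writes)  # a single aggregated write of +5, instead of five writes
```

In production the flush would also fire on a timer, trading a small window of potential loss for far fewer counter writes hitting the cluster.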
Re: counters + replication = awful performance?
By the way, the other issues you are seeing with replicate_on_write at false could be because you did not run repair. You should do that when changing RF.
-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7584011.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: counters + replication = awful performance?
I already do a lot of in-memory aggregation before writing to Cassandra. The question here is what is wrong with Cassandra (or its configuration) that causes a huge performance drop when moving from 1-replication to 2-replication for counters -- and, more importantly, how to resolve the problem. A 2x-3x drop when moving from 1-replication to 2-replication on two nodes would be reasonable. 6x is not. Like I said, with this kind of performance degradation it makes more sense to run two clusters with replication=1 in parallel rather than rely on Cassandra replication. And yes, Rainbird was the inspiration for what we are trying to do here :)
Re: selective replication of keyspaces
Thank you. This is a good start; I was beginning to think it couldn't be done. When I run the command I get the error "syntax error at position 21: missing EOF at 'placement_strategy'" -- that is probably because I still need to set the correct properties in the conf files. On November 27, 2012 at 5:41 PM Edward Capriolo edlinuxg...@gmail.com wrote: You can do something like this: Divide your nodes up into 4 datacenters: art1, art2, art3, core.
[default@unknown] create keyspace art1 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art1:2,core:2}];
[default@unknown] create keyspace art2 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art2:2,core:2}];
[default@unknown] create keyspace art3 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art3:2,core:2}];
[default@unknown] create keyspace core placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{core:2}];
On Tue, Nov 27, 2012 at 5:02 PM, Artist jer...@simpleartmarketing.com wrote: I have 3 art-servers, each with a Cassandra cluster. Each of the art-servers has config/state information stored in a keyspace, respectively called art-server-1-current-state, art-server-2-current-state, art-server-3-current-state. In my core server I have a separate Cassandra cluster. I would like to use Cassandra to replicate the current state of each art-server to the core Cassandra server without sharing that information with any of the art-servers. Is there a way to replicate the keyspaces to a single Cassandra cluster (my core) without having any peer sharing between the 3 art-servers?
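For comparison, the same selective-replication setup in the newer CQL 3 syntax (which applies to Cassandra 1.2+, not the cassandra-cli syntax shown above) would look roughly like this; the datacenter names are assumptions that must match what your snitch reports:

```cql
-- Each art keyspace replicates to its own DC and to core.
CREATE KEYSPACE art1 WITH replication =
  {'class': 'NetworkTopologyStrategy', 'art1': 2, 'core': 2};
-- The core-only keyspace never replicates out to the art DCs.
CREATE KEYSPACE core WITH replication =
  {'class': 'NetworkTopologyStrategy', 'core': 2};
```

Either way, the mechanism is the same: per-keyspace datacenter replica counts, with 0 (omitted) meaning a DC holds no copies of that keyspace.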
- Artist -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/selective-replication-of-keyspaces-tp7584007.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: counters + replication = awful performance?
Say you are doing 100 inserts rf1 on two nodes. That is 50 inserts a node. If you go to rf2 that is 100 inserts a node. If you were at 75 % capacity on each mode your now at 150% which is not possible so things bog down. To figure out what is going on we would need to see tpstat, iostat , and top information. I think your looking at the performance the wrong way. Starting off at rf 1 is not the way to understand cassandra performance. You do not get the benefits of scala out don't happen until you fix your rf and increment your nodecount. Ie 5 nodes at rf 3 is fast 10 nodes at rf 3 even better. On Tuesday, November 27, 2012, Sergey Olefir solf.li...@gmail.com wrote: I already do a lot of in-memory aggregation before writing to Cassandra. The question here is what is wrong with Cassandra (or its configuration) that causes huge performance drop when moving from 1-replication to 2-replication for counters -- and more importantly how to resolve the problem. 2x-3x drop when moving from 1-replication to 2-replication on two nodes is reasonable. 6x is not. Like I said, with this kind of performance degradation it makes more sense to run two clusters with replication=1 in parallel rather than rely on Cassandra replication. And yes, Rainbird was the inspiration for what we are trying to do here :) Edward Capriolo wrote Cassandra's counters read on increment. Additionally they are distributed so that can be multiple reads on increment. If they are not fast enough and you have avoided all tuning options add more servers to handle the load. In many cases incrementing the same counter n times can be avoided. Twitter's rainbird did just that. It avoided multiple counter increments by batching them. I have done a similar think using cassandra and Kafka. 
https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java On Tuesday, November 27, 2012, Sergey Olefir lt; solf.lists@ gt; wrote: Hi, thanks for your suggestions. Regarding replicate=2 vs replicate=1 performance: I expected that below configurations will have similar performance: - single node, replicate = 1 - two nodes, replicate = 2 (okay, this probably should be a bit slower due to additional overhead). However what I'm seeing is that second option (replicate=2) is about THREE times slower than single node. Regarding replicate_on_write -- it is, in fact, a dangerous option. As JIRA discusses, if you make changes to your ring (moving tokens and such) you will *silently* lose data. That is on top of whatever data you might end up losing if you run replicate_on_write=false and the only node that got the data fails. But what is much worse -- with replicate_on_write being false the data will NOT be replicated (in my tests) ever unless you explicitly request the cell. Then it will return the wrong result. And only on subsequent reads it will return adequate results. I haven't tested it, but documentation states that range query will NOT do 'read repair' and thus will not force replication. The test I did went like this: - replicate_on_write = false - write something to node A (which should in theory replicate to node B) - wait for a long time (longest was on the order of 5 hours) - read from node B (and here I was getting null / wrong result) - read from node B again (here you get what you'd expect after read repair) In essence, using replicate_on_write=false with rarely read data will practically defeat the purpose of having replication in the first place (failover, data redundancy). Or, in other words, this option doesn't look to be applicable to my situation. 
It looks like I will get much better performance by simply writing to two separate clusters rather than using a single cluster with replicate=2. Which is kind of stupid :) I think something's fishy with counters and replication. Edward Capriolo wrote: I misspoke really. It is not dangerous, you just have to understand what it means. This JIRA discusses it: https://issues.apache.org/jira/browse/CASSANDRA-3868 On Tue, Nov 27, 2012 at 6:13 PM, Scott McKay scottm@... wrote: We're having a similar performance problem. Setting 'replicate_on_write: false' fixes the performance issue in our tests. How dangerous is it? What exactly could go wrong? On 12-11-27 01:44 PM, Edward Capriolo wrote: The difference between replication factor = 1 and replication factor > 1 is significant. Also, it sounds like your cluster is 2 nodes, so going from RF=1 to RF=2 means double the load on both nodes. You may want to experiment with the very dangerous column family attribute: - replicate_on_write: Replicate every counter update from the leader to the follower replicas. Accepts the values true and false. Edward On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman mkjellman@ wrote: Are you writing with QUORUM consistency or
Re: selective replication of keyspaces
My mistake, that is the older CLI syntax; I was just showing the concept: set up 4 datacenters and selectively replicate keyspaces between them. On Tuesday, November 27, 2012, jer...@simpleartmarketing.com jer...@simpleartmarketing.com wrote: Thank you. This is a good start; I was beginning to think it couldn't be done. When I run the command I get the error: syntax error at position 21: missing EOF at 'placement_strategy' -- that is probably because I still need to set the correct properties in the conf files. On November 27, 2012 at 5:41 PM Edward Capriolo edlinuxg...@gmail.com wrote: You can do something like this: Divide your nodes up into 4 datacenters: art1, art2, art3, core. [default@unknown] create keyspace art1 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art1:2,core:2}]; [default@unknown] create keyspace art2 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art2:2,core:2}]; [default@unknown] create keyspace art3 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art3:2,core:2}]; [default@unknown] create keyspace core placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{core:2}]; On Tue, Nov 27, 2012 at 5:02 PM, Artist jer...@simpleartmarketing.com wrote: I have 3 art-servers, each with a Cassandra cluster. Each of the art-servers has config/state information stored in a keyspace, respectively called art-server-1-current-state, art-server-2-current-state, art-server-3-current-state. On my core server I have a separate Cassandra cluster. I would like to use Cassandra to replicate the current-state of each art-server on the core Cassandra server without sharing that information with any of the other art-servers. Is there a way to replicate the keyspaces to a single Cassandra cluster on my core without having any peer sharing between the 3 art-servers. 
- Artist -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/selective-replication-of-keyspaces-tp7584007.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Generic questions over Cassandra 1.1/1.2
@Bill Are you saying that now Cassandra is less schemaless? :) Compact storage is the schemaless of old. On Tuesday, November 27, 2012, Bill de hÓra b...@dehora.net wrote: I'm not sure I always understand what people mean by 'schema less' exactly, and I'm curious. For 'schema less', given this - {{{ cqlsh use example; cqlsh:example CREATE TABLE users ( ... user_name varchar, ... password varchar, ... gender varchar, ... session_token varchar, ... state varchar, ... birth_year bigint, ... PRIMARY KEY (user_name) ... ); }}} I expect this would not cause an unknown identifier error - {{{ INSERT INTO users (user_name, password, extra, moar) VALUES ('bob', 'secret', 'a', 'b'); }}} but definitions vary. Bill On 26/11/12 09:18, Sylvain Lebresne wrote: On Mon, Nov 26, 2012 at 8:41 AM, aaron morton aa...@thelastpickle.com wrote: Is there any noticeable performance difference between Thrift and CQL3? Off the top of my head it's within 5% (maybe 10%) under stress tests. See Eric's talk at the Cassandra SF conference for the exact numbers. Eric's benchmark results were that normal queries were slightly slower but prepared ones (and in real life, I see no good reason not to prepare statements) were actually slightly faster. CQL 3 requires a schema; however, altering the schema is easier, and 1.2 will support concurrent schema modifications. The Thrift API is still schema less. Sorry to hijack this thread, but I'd be curious (like seriously, I'm not trolling) to understand what you mean by CQL 3 requires a schema but the Thrift API is still schema less. Basically I'm not sure I always understand what people mean by schema less exactly and I'm curious. -- Sylvain
Re: Upgrade
Do you have the error stack ? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 28/11/2012, at 12:28 AM, Everton Lima peitin.inu...@gmail.com wrote: Hello people. I was using Cassandra 1.1.6 and used the object CassandraServer() to create keyspaces from my code. But when I updated to version 1.2.0-beta2, my code started to throw a NullPointerException in this method:

in version 1.1.6, CassandraServer.state():

    SocketAddress remoteSocket = SocketSessionManagementService.remoteSocket.get();
    if (remoteSocket == null) return clientState.get();
    ClientState cState = SocketSessionManagementService.instance.get(remoteSocket);
    if (cState == null) {
        cState = new ClientState();
        SocketSessionManagementService.instance.put(remoteSocket, cState);
    }
    return cState;

in version 1.2.0, CassandraServer.state():

    return ThriftSessionManager.instance.currentSession();

currentSession():

    SocketAddress socket = remoteSocket.get();
    assert socket != null;
    ThriftClientState cState = activeSocketSessions.get(socket);
    if (cState == null) {
        cState = new ThriftClientState();
        activeSocketSessions.put(socket, cState);
    }
    return cState;

So, in version 1.1.6 it checks whether there is a remote connection and, if not, falls back to the local one. In version 1.2.0 it assumes a remote connection and uses it to look up a ThriftClientState; if there is no remote connection (as could happen in 1.1.6) it will throw a NullPointerException at the line: ThriftClientState cState = activeSocketSessions.get(socket); Is there any way to use CassandraServer in the new version?? Thanks! -- Everton Lima Aleixo Bacharel em Ciencia da Computação Universidade Federal de Goiás
Re: How to determine compaction bottlenecks
I've been playing around with trying to figure out what is making compactions run so slow. Is this regular compaction or table upgrades ? I *think* upgrade tables is single threaded. Do you have some compaction log lines that say Compacted to…? It's handy to see the throughput and the number of keys compacted. snapshot_before_compaction: false in_memory_compaction_limit_in_mb: 256 multithreaded_compaction: true compaction_throughput_mb_per_sec: 128 compaction_preheat_key_cache: true What is your setting for concurrent_compactors ? I would also check the logs for GC issues. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 28/11/2012, at 4:23 AM, Derek Bromenshenkel derek.bromenshen...@gmail.com wrote: Setup: C* 1.1.6, 6 node (Linux, 64GB RAM, 16 Core CPU, 2x512 SSD), RF=3, 1.65TB total used Background: Client app is off - no reads/writes happening. Doing some cluster maintenance requiring node repairs and upgradesstables. I've been playing around with trying to figure out what is making compactions run so slow. Watching syslogs, it seems to average 3-4MB/s. That just seems so slow for this setup, given that there is zero external load on the cluster. As far as I can tell: 1. Not I/O bound according to iostat data 2. CPU seems to be idling also 3. From my understanding, I am using all the correct compaction settings for this setup. Here they are: snapshot_before_compaction: false in_memory_compaction_limit_in_mb: 256 multithreaded_compaction: true compaction_throughput_mb_per_sec: 128 compaction_preheat_key_cache: true Some other thoughts: - I have turned on DEBUG logging for the Throttle class and played with the live compaction_throughput_mb_per_sec setting. I can see it performing the throttling if I set the value low (say 4), but anything over 8 it is apparently running wide open. [Side note: Although the math for the Throttle class adds up, overall the throttling seems to be very, very conservative.] 
- I accidentally turned on DEBUG for the entire ...compaction.* package and that unintentionally created A LOT of I/O from the ParallelCompactionIterable class, and the disk/OS handled that just fine. Perhaps I just don't fully grasp what is going on or have the correct expectations. I am OK with things being slow if the hardware is working hard, but that does not seem to be the case. Anyone have some insight? Thanks
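To get the throughput numbers Aaron asks for, one can grep the logs for the Compacted to… lines and pull out the reported rate. A rough sketch, with the caveat that the exact log format varies across Cassandra versions, so the regex and the sample line below are assumptions modeled on 1.1-style output:

```python
import re

# Assumed 1.1-style compaction completion line (format is an approximation):
#   Compacted to [file].  <in> to <out> (~N% of original) bytes for <keys> keys
#   at <rate>MB/s.  Time: <ms>ms.
LINE_RE = re.compile(
    r"Compacted to .*?([\d,]+) to ([\d,]+).*?for ([\d,]+) keys "
    r"at ([\d.]+)MB/s\.\s+Time: ([\d,]+)ms")

def parse_compaction_line(line):
    """Extract sizes, key count, rate, and duration from one log line."""
    m = LINE_RE.search(line)
    if not m:
        return None
    to_int = lambda s: int(s.replace(",", ""))
    in_bytes, out_bytes, keys, mbps, ms = m.groups()
    return {"in_bytes": to_int(in_bytes), "out_bytes": to_int(out_bytes),
            "keys": to_int(keys), "mb_per_s": float(mbps),
            "time_ms": to_int(ms)}

# Synthetic example line (not from the thread):
sample = ("Compacted to [/data/ks/cf-he-42-Data.db].  10,485,760 to 5,242,880 "
          "(~50% of original) bytes for 1,000 keys at 3.500000MB/s.  Time: 2,856ms.")
print(parse_compaction_line(sample))
```

Averaging mb_per_s over many such lines gives a better picture than watching syslog scroll by, and makes it easy to compare against the configured compaction_throughput_mb_per_sec.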
Re: Frame size exceptions occurring with ColumnFamilyInputFormat for very large rows
Hello, I was wondering if anyone had an answer to my previous message below. Seems another is having the same problem, but unfortunately with no response as well. http://mail-archives.apache.org/mod_mbox/cassandra-user/201211.mbox/%3c509a4a1f.8070...@semantico.com%3E Any help would be much appreciated. Thank you, Marko. http://markorodriguez.com On Nov 9, 2012, at 3:02 PM, Marko Rodriguez wrote: Hello, I am trying to run a Hadoop job that pulls data out of Cassandra via ColumnFamilyInputFormat. I am getting a frame size exception. To remedy that, I have set both the thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb to an infinite amount at 10mb on all nodes. Moreover, I have restarted the cluster and the cassandra.yaml files have been reloaded. However, I am still getting: 12/11/09 21:39:52 INFO mapred.JobClient: map 62% reduce 0% 12/11/09 21:40:09 INFO mapred.JobClient: Task Id : attempt_201211082011_0015_m_000479_2, Status : FAILED java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (30046945) larger than max length (16384000)! at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:400) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:406) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:324) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:189) Question: Why is 16384000 bytes (I assume) != 10mb? Next, I made this parameter true as a last hail mary attempt: cassandra.input.widerows=true ...still with no luck. Does someone know what I might be missing? Thank you very much for your time, Marko. 
http://markorodriguez.com
Re: Frame size exceptions occurring with ColumnFamilyInputFormat for very large rows
Hi, Even when setting it to 32m in cassandra.yaml (and restarting Cassandra), the same problem emerges -- it's as if Cassandra doesn't register the update (it's always locked at 16mb). And I know that Cassandra is reading the property from cassandra.yaml, because if I set it to -1 it complains that it must be a positive value. Apologies for the back and forth --- though I have no obvious way forward for myself. Thank you, Marko. http://markorodriguez.com P.S. Is a brontobyte an order of magnitude less than a tyranobyte? On Nov 27, 2012, at 8:26 PM, Edward Capriolo wrote: Thrift has to buffer the packet into memory, so setting it to 1,000,000 brontobytes is a bad idea. On Tue, Nov 27, 2012 at 9:17 PM, Marko Rodriguez okramma...@gmail.com wrote: Hello, I was wondering if anyone had an answer to my previous message below. Seems another is having the same problem, but unfortunately with no response as well. http://mail-archives.apache.org/mod_mbox/cassandra-user/201211.mbox/%3c509a4a1f.8070...@semantico.com%3E Any help would be much appreciated. Thank you, Marko. http://markorodriguez.com On Nov 9, 2012, at 3:02 PM, Marko Rodriguez wrote: Hello, I am trying to run a Hadoop job that pulls data out of Cassandra via ColumnFamilyInputFormat. I am getting a frame size exception. To remedy that, I have set both the thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb to an infinite amount at 10mb on all nodes. Moreover, I have restarted the cluster and the cassandra.yaml files have been reloaded. However, I am still getting: 12/11/09 21:39:52 INFO mapred.JobClient: map 62% reduce 0% 12/11/09 21:40:09 INFO mapred.JobClient: Task Id : attempt_201211082011_0015_m_000479_2, Status : FAILED java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (30046945) larger than max length (16384000)! 
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:400) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:406) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:324) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:189) Question: Why is 16384000 bytes (I assume) != 10mb? Next, I made this parameter true as a last hail mary attempt: cassandra.input.widerows=true ...still with no luck. Does someone know what I might be missing? Thank you very much for your time, Marko. http://markorodriguez.com
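As to Marko's question of why 16384000 != 10mb: 16384000 factors cleanly as 16 × 1000 × 1024, which is consistent with a 16 MB default being applied by the Hadoop-side Thrift client rather than the server's cassandra.yaml value being honored. This is a hypothesis; the exact conversion factor and where the default lives are assumptions:

```python
# The max length reported in the stack trace:
limit = 16384000  # "Frame size (30046945) larger than max length (16384000)!"

# 16384000 corresponds to a 16 "MB" value under a mixed 1000*1024 conversion,
# which points at a 16 MB default applied somewhere on the client side:
assert limit == 16 * 1000 * 1024

# The 10mb configured in cassandra.yaml, under the same convention, would be:
ten_mb = 10 * 1000 * 1024  # 10240000 -- not what the error shows, suggesting
# the Hadoop client never reads the server's yaml setting at all.

# The failing frame itself is a ~30 MB row, well over either limit:
frame = 30046945
print(limit, ten_mb, frame > limit)
```

If that hypothesis holds, the fix would be raising the limit in the Hadoop job's own configuration rather than in the server yaml.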
Re: need some help with row cache
On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote: Hello, but it is not clear to me where this setting belongs to, because even in the v1.1.6 conf/cassandra.yaml, there is no such property, and apparently adding this property to the yaml causes a fatal configuration error upon server startup, It's a per column family setting that can be applied using the CLI or CQL. With CQL3 it would be ALTER TABLE cf WITH caching = 'rows_only'; to enable the row cache but no key cache for that CF. -Bryan
Re: Hive on Cassandra : issues in setup
Can someone please help me with this or share your experiences if you have tried this before? From: Naveen Reddy naveen_2...@yahoo.co.in To: user@cassandra.apache.org user@cassandra.apache.org Sent: Monday, 26 November 2012 4:24 PM Subject: Hive on Cassandra : issues in setup Hi, I am trying to set up Hive on Cassandra using https://github.com/riptano/hive/wiki/Cassandra-Handler-usage-in-Hive-0.7-with-Cassandra-0.7 . I have not found any better documentation; if you have any other pointers please provide them. I am trying to set this up without using Brisk from DataStax. Versions – hive-0.8.1, hadoop-1.0.4, apache-cassandra-1.0.7 (as mentioned in ivy/libraries.properties), hive-cassandra-handler-0.8.1.jar: from hive-hive-0.8.1-merge ( https://github.com/riptano/hive ) – downloaded this source and built it. And I am hitting the below error while running a query from Hive. Help please - I tried multiple versions of Cassandra/Hive/Hadoop, but keep getting stuck with one exception or another. Can you please tell me the right combination of versions. 
hive CREATE EXTERNAL TABLE invites2 (foo INT, bar STRING) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( cassandra.host = localhost, cassandra.port = 9160, cassandra.ks.name = examples, cassandra.cf.name = invites2 ); java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(Unknown Source) at java.security.SecureClassLoader.defineClass(Unknown Source) at java.net.URLClassLoader.defineClass(Unknown Source) at java.net.URLClassLoader.access$000(Unknown Source) at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClassInternal(Unknown Source) at org.apache.hadoop.hive.cassandra.CassandraClientHolder.initClient(CassandraClientHolder.java:54) at org.apache.hadoop.hive.cassandra.CassandraClientHolder.init(CassandraClientHolder.java:31) at org.apache.hadoop.hive.cassandra.CassandraClientHolder.init(CassandraClientHolder.java:23) at org.apache.hadoop.hive.cassandra.CassandraProxyClient.createConnection(CassandraProxyClient.java:136) at org.apache.hadoop.hive.cassandra.CassandraProxyClient.initializeConnection(CassandraProxyClient.java:152) at org.apache.hadoop.hive.cassandra.CassandraProxyClient.init(CassandraProxyClient.java:101) at org.apache.hadoop.hive.cassandra.CassandraManager.openConnection(CassandraManager.java:88) at org.apache.hadoop.hive.cassandra.CassandraStorageHandler.preCreateTable(CassandraStorageHandler.java:237) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:396) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:540) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3479) at 
org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:225) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.DDLTask Thank you Naveen
Re: need some help with row cache
Hi Bryan, Thank you very much for this information. So in other words, the settings such as row_cache_size_in_mb in YAML alone are not enough, and I must also specify the caching attribute on a per column family basis? -- Y. On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.comwrote: On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote: Hello, but it is not clear to me where this setting belongs to, because even in the v1.1.6 conf/cassandra.yaml, there is no such property, and apparently adding this property to the yaml causes a fatal configuration error upon server startup, It's a per column family setting that can be applied using the CLI or CQL. With CQL3 it would be ALTER TABLE cf WITH caching = 'rows_only'; to enable the row cache but no key cache for that CF. -Bryan
Re: need some help with row cache
Also, what command can I use to see the caching setting? DESC TABLE cf doesn't list caching at all. Thanks. -- Y. On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun yiming@gmail.com wrote: Hi Bryan, Thank you very much for this information. So in other words, the settings such as row_cache_size_in_mb in YAML alone are not enough, and I must also specify the caching attribute on a per column family basis? -- Y. On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.comwrote: On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote: Hello, but it is not clear to me where this setting belongs to, because even in the v1.1.6 conf/cassandra.yaml, there is no such property, and apparently adding this property to the yaml causes a fatal configuration error upon server startup, It's a per column family setting that can be applied using the CLI or CQL. With CQL3 it would be ALTER TABLE cf WITH caching = 'rows_only'; to enable the row cache but no key cache for that CF. -Bryan
Re: counters + replication = awful performance?
I think there might be a misunderstanding as to the nature of the problem. Say I have test set T, and two identical servers A and B. - I tested that server A (singly) is able to handle the load of T. - I tested that server B (singly) is able to handle the load of T. - I then join A and B in a cluster and set replication=2 -- this means that each server in effect has to handle the full test load individually (because there are two servers and replication=2, each server effectively has to handle all the data written to the cluster). Under these circumstances it is reasonable to assume that cluster A+B shall be able to handle load T, because each server is able to do so individually. HOWEVER, this is not the case. In fact, A+B together are only able to handle less than 1/3 of T, DESPITE the fact that A and B individually are able to handle T just fine. I think there's something wrong with Cassandra replication (possibly as simple as me misconfiguring something) -- it shouldn't be three times faster to write to two separate nodes in parallel as compared to writing to a 2-node Cassandra cluster with replication=2. Edward Capriolo wrote: Say you are doing 100 inserts at RF=1 on two nodes. That is 50 inserts a node. If you go to RF=2 that is 100 inserts a node. If you were at 75% capacity on each node, you're now at 150%, which is not possible, so things bog down. To figure out what is going on we would need to see tpstats, iostat, and top information. I think you're looking at the performance the wrong way. Starting off at RF=1 is not the way to understand Cassandra performance. The benefits of scale-out don't appear until you fix your RF and increase your node count; i.e. 5 nodes at RF 3 is fast, 10 nodes at RF 3 even better. On Tuesday, November 27, 2012, Sergey Olefir solf.lists@... wrote: I already do a lot of in-memory aggregation before writing to Cassandra. 
The question here is what is wrong with Cassandra (or its configuration) that causes a huge performance drop when moving from 1-replication to 2-replication for counters -- and more importantly how to resolve the problem. A 2x-3x drop when moving from 1-replication to 2-replication on two nodes is reasonable. 6x is not. Like I said, with this kind of performance degradation it makes more sense to run two clusters with replication=1 in parallel rather than rely on Cassandra replication. And yes, Rainbird was the inspiration for what we are trying to do here :) Edward Capriolo wrote: Cassandra's counters read on increment. Additionally they are distributed, so there can be multiple reads per increment. If they are not fast enough and you have exhausted all tuning options, add more servers to handle the load. In many cases incrementing the same counter n times can be avoided. Twitter's Rainbird did just that. It avoided multiple counter increments by batching them. I have done a similar thing using Cassandra and Kafka. https://github.com/edwardcapriolo/IronCount/blob/master/src/test/java/com/jointhegrid/ironcount/mockingbird/MockingBirdMessageHandler.java On Tuesday, November 27, 2012, Sergey Olefir solf.lists@... wrote: Hi, thanks for your suggestions. Regarding replicate=2 vs replicate=1 performance: I expected that the configurations below would have similar performance: - single node, replicate = 1 - two nodes, replicate = 2 (okay, this probably should be a bit slower due to additional overhead). However, what I'm seeing is that the second option (replicate=2) is about THREE times slower than the single node. Regarding replicate_on_write -- it is, in fact, a dangerous option. As the JIRA discusses, if you make changes to your ring (moving tokens and such) you will *silently* lose data. That is on top of whatever data you might end up losing if you run replicate_on_write=false and the only node that got the data fails. 
But what is much worse -- with replicate_on_write set to false, the data will NOT be replicated (in my tests) ever, unless you explicitly request the cell. Then it will return the wrong result, and only on subsequent reads will it return correct results. I haven't tested it, but the documentation states that a range query will NOT do 'read repair' and thus will not force replication. The test I did went like this: - replicate_on_write = false - write something to node A (which should in theory replicate to node B) - wait for a long time (the longest was on the order of 5 hours) - read from node B (and here I was getting a null / wrong result) - read from node B again (here you get what you'd expect after read repair) In essence, using replicate_on_write=false with rarely read data will practically defeat the purpose of having replication in the first place (failover, data redundancy). Or, in other words, this option doesn't look to be applicable to my situation. It looks like I will get much better performance by simply writing to two
Re: need some help with row cache
Use cassandra-cli. Thanks. -Wei Sent from my Samsung smartphone on ATT Original message Subject: Re: need some help with row cache From: Yiming Sun yiming@gmail.com To: user@cassandra.apache.org CC: Also, what command can I use to see the caching setting? DESC TABLE cf doesn't list caching at all. Thanks. -- Y. On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun yiming@gmail.com wrote: Hi Bryan, Thank you very much for this information. So in other words, the settings such as row_cache_size_in_mb in YAML alone are not enough, and I must also specify the caching attribute on a per column family basis? -- Y. On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.com wrote: On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote: Hello, but it is not clear to me where this setting belongs to, because even in the v1.1.6 conf/cassandra.yaml, there is no such property, and apparently adding this property to the yaml causes a fatal configuration error upon server startup, It's a per column family setting that can be applied using the CLI or CQL. With CQL3 it would be ALTER TABLE cf WITH caching = 'rows_only'; to enable the row cache but no key cache for that CF. -Bryan
Always see strange error in logs on cassandra 1.1.6
Hi. After updating Cassandra from 1.1.5 to 1.1.6, every schema update ends with a strange exception in system.log, and I must restart the nodes in the cluster that `describe cluster` reports as unreachable. Neither `nodetool repair` nor `nodetool upgradesstables` helps. Only restarting the service twice helps: the first restart throws the same error, the second doesn't and starts the node normally. Error log: ERROR [FlushWriter:8] 2012-11-28 10:57:50,503 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[FlushWriter:8,5,main] java.lang.AssertionError: Keys must not be empty at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:133) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:176) at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:295) at org.apache.cassandra.db.Memtable.access$600(Memtable.java:48) at org.apache.cassandra.db.Memtable$5.runMayThrow(Memtable.java:316) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Best regards -Michael