Problems with node rejoining cluster
We need to do a rolling upgrade of our Cassandra cluster in production, since we are moving Cassandra on Solaris to Cassandra on CentOS. (We went with Solaris initially since most of our other hosts in production are Solaris, but we were running into some lockup issues during perf tests and decided to switch to Linux.)

Here are the steps we are following to take a node out of service and get it back. Can someone comment if we are missing anything (e.g., is it recommended to specify tokens in cassandra.yaml, or to do something different with the seed hosts than mentioned below)?

1. nodetool decommission - wait for the data to be streamed out.
2. Re-image the host (everything is wiped off the disks) to CentOS, with the same Cassandra version.
3. Bring Cassandra back up.

Other details:
- Using Cassandra 1.1.5.
- We do not specify any tokens in cassandra.yaml, relying on bootstrap to assign the tokens automatically.
- We are testing with a 4-node cluster, with only one seed host. The seed host is specified in the cassandra.yaml of each node and is not changed at any point.

While testing the Solaris-to-Linux upgrade path, things seem to work smoothly. The data streams out fine, and streams back in when the node comes back up. However, testing the Linux-to-Solaris path (in case we need to roll back), we are facing some issues with the nodes rejoining the ring. nodetool indicates that the node has joined the ring, but no data streams in, and the node doesn't know about the keyspaces/column families, etc. We see some errors in the logs of the newly added nodes, pasted below.

[17/06/2013:14:10:17 PDT] MutationStage:1: ERROR RowMutationVerbHandler.java (line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1020
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
        at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

Thanks,
Arindam
Re: How to do a CAS UPDATE on single column CF?
On 06/24/2013 08:35 PM, Arthur Zubarev wrote:

"CAS UPDATE -- since when does C* have IF NOT EXISTS in the DML part of CQL?"

It's new in 2.0.
https://issues.apache.org/jira/browse/CASSANDRA-5062
https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py#L3044

Blair
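For reference, a minimal sketch of the 2.0 conditional-write syntax that ticket introduces (the table and column names here are hypothetical, not from the thread):

    -- CAS insert: applied only if no row with this key exists yet.
    INSERT INTO users (user_id, name) VALUES (1, 'blair') IF NOT EXISTS;

    -- CAS update: applied only if the current value matches the expected one.
    UPDATE users SET name = 'arthur' WHERE user_id = 1 IF name = 'blair';

Conditional statements return an [applied] column indicating whether the condition held.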
Cassandra as storage for cache data
Hello,

we are using Cassandra as the data storage for our caching system. Our application generates about 20 put and get requests per second. The average size of one cache item is about 500 KB. Cache items are placed into one column family with a TTL of 20-60 minutes. Keys and values are bytes (not UTF-8 strings). The compaction strategy is SizeTieredCompactionStrategy.

We set up a Cassandra 1.2.6 cluster of 4 nodes. The replication factor is 2. Each node has 10 GB of RAM and enough space on HDD.

Now when we put this cluster under load, it quickly fills with our runtime data (about 5 GB on every node) and we start observing performance degradation with frequent timeouts on the client side. We see that on each node compaction starts very frequently and takes several minutes to complete. It seems that each node is usually busy with the compaction process.

Here are the questions: What is the recommended configuration for our use case? Does it make sense to somehow tell Cassandra to keep all data in memory (memtables) to eliminate flushing it to disk (sstables), thus decreasing the number of compactions? How can we achieve this behavior?

Cassandra is started with the default shell script, which gives the following command line:

jsvc.exec -user cassandra -home /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ -pidfile /var/run/cassandra.pid -errfile 1 -outfile /var/log/cassandra/output.log -cp CLASSPATH_SKIPPED -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof -XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false org.apache.cassandra.service.CassandraDaemon

--
Best regards,
Dmitry Olshansky
Re: Cassandra as storage for cache data
If you have rapidly expiring data, then tombstones are probably filling your disk and your heap (depending on how you order the data on disk). To check whether your queries are affected by tombstones, you might try the query tracing that's built into 1.2. See:

http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets -- has an example of tracing where you can see tombstones affecting the query
http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

You'll want to consider reducing the gc_grace period from the default of 10 days for those column families - with an understanding of why gc_grace exists in the first place; see http://wiki.apache.org/cassandra/DistributedDeletes . Even once the gc_grace period has passed, the tombstones will stay around until they are compacted away. So there are two options currently to compact them away more quickly:

1) Use leveled compaction - see http://www.datastax.com/dev/blog/when-to-use-leveled-compaction . Leveled compaction only requires 10% headroom (as opposed to 50% for size-tiered compaction) for the amount of disk that needs to be kept free.

2) If 1 doesn't work and you're still seeing performance degrade and the tombstones aren't getting cleared out fast enough, you might consider using size-tiered compaction but performing regular major compactions to get rid of expired data.

Keep in mind though that if you use a gc_grace of 0 and do any kind of manual deletes outside of TTLs, you probably want to do the deletes at ConsistencyLevel.ALL, or else if a node goes down and then comes back up, there's a chance that deleted data may be resurrected. That only applies to non-TTL data that you manually delete. See the explanation of distributed deletes for more information.

On 25 Jun 2013, at 13:31, Dmitry Olshansky dmitry.olshan...@gridnine.com wrote: [...]
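As a hedged sketch of the knobs Jeremy mentions, assuming a 1.2 CQL3 table named cache_items (the name is hypothetical):

    -- Shorten how long tombstones must be kept before compaction can purge them.
    ALTER TABLE cache_items WITH gc_grace_seconds = 3600;

    -- Switch to leveled compaction, which needs roughly 10% free-disk headroom
    -- rather than the ~50% size-tiered compaction wants.
    ALTER TABLE cache_items
      WITH compaction = {'class': 'LeveledCompactionStrategy'};

    -- The 1.2 tracing mentioned above, run from cqlsh, shows how many
    -- tombstones each query touches.
    TRACING ON;
    SELECT * FROM cache_items WHERE key = 0xcafe;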
Re: NREL has released open source Databus on github for time series data
When you say aggregates, do you mean converting 1-minute data to 15-minute data, or do you mean summing different streams such that you have the total energy from energy streams A, B, C, etc.?

P.S. We are working on supporting both… there is a clusterable cron-job thing in place right now that does some aggregation already, but another is in the works for moving higher-rate data to lower rates.

Dean

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Monday, June 24, 2013 9:51 PM
To: user@cassandra.apache.org
Subject: Re: NREL has released open source Databus on github for time series data

Hi Dean,

Does this handle rollup aggregates along with the time series data? I had a quick look at the links and could not see anything.

Cheers
Aaron

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 22/06/2013, at 2:51 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

NREL has released their open source databus. They spin it as energy data (and a system for campus/building energy) but it is very general right now and will probably stay pretty general. More information can be found here: http://www.nrel.gov/analysis/databus/ The source code can be found here: https://github.com/deanhiller/databus

Star the project if you like the idea. NREL just did a big press release and is developing a community around the project. It is in its early stages, but there are users using it and I am helping HP set an instance up this month. If you want to become a committer on the project, let me know as well.

Later,
Dean
Are nested selects supported by Cassandra JDBC??
Hi All,

Is a nested select supported by the Cassandra JDBC driver? For a simple example, to get a list of user details from a users column family:

Select * from user_details where user_id in (Select user_id from users)

Thanks!
-Tony
Re: Are nested selects supported by Cassandra JDBC??
No. CQL3 doesn't support nested selects.

--
Sylvain

On Tue, Jun 25, 2013 at 5:02 PM, Tony Anecito adanec...@yahoo.com wrote: [...]
cassandra-unit 1.2.0.1 is released : CQL3 and Spring
Hi all,

Just to let you know that a new release of cassandra-unit is available, with CQL3 dataset support and Spring integration. More here: http://www.unchticafe.fr/2013/06/cassandra-unit-1201-is-out-cql3-script.html

Regards,
--
Jérémy
Re: Are nested selects supported by Cassandra JDBC??
OK. So if I have a composite key table, instead of a nested select I will have to run 2 queries or else denormalize? Unless there is something provided by CQL3 to do the same thing?

Thanks,
-Tony

From: Sylvain Lebresne sylv...@datastax.com
To: user@cassandra.apache.org; Tony Anecito adanec...@yahoo.com
Sent: Tuesday, June 25, 2013 9:06 AM
Subject: Re: Are nested selects supported by Cassandra JDBC??

No. CQL3 doesn't support nested selects. -- Sylvain

On Tue, Jun 25, 2013 at 5:02 PM, Tony Anecito adanec...@yahoo.com wrote: [...]
Re: Are nested selects supported by Cassandra JDBC??
Yes, denormalization is usually the answer to the absence of sub-queries (and joins, for that matter) in Cassandra (though sometimes simply doing 2 queries is fine; it depends on your use case and performance requirements).

On Tue, Jun 25, 2013 at 6:46 PM, Tony Anecito adanec...@yahoo.com wrote: [...]
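A hedged sketch of both options Sylvain describes, using the hypothetical tables from the question: either issue two queries from the client, or denormalize the detail columns so one table answers the read.

    -- Option 1: two round trips from the client.
    SELECT user_id FROM users;
    SELECT * FROM user_details WHERE user_id IN (1, 2, 3);  -- ids from query 1

    -- Option 2: denormalize at write time, duplicating the detail columns
    -- into the table you query by, so a single query serves the read.
    CREATE TABLE users_with_details (
      user_id bigint PRIMARY KEY,
      name text,
      email text
    );
    SELECT * FROM users_with_details;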
Re: [Cassandra] Replacing a cassandra node with one of the same IP
On Mon, Jun 24, 2013 at 8:53 PM, aaron morton aa...@thelastpickle.com wrote:

"so I am just wondering if this means the hinted handoffs are also updated to reflect the new Cassandra node uuid."

Without checking the code I would guess not. Because it would involve a potentially large read / write / delete to create a new row with the same data. And Hinted Handoff is an optimisation.

So are hints to a given UUID discarded after some period of time with that UUID not present in the cluster? Or might they need to be manually purged?

=Rob
Re: Problems with node rejoining cluster
On Mon, Jun 24, 2013 at 11:19 PM, Arindam Barua aba...@247-inc.com wrote:

"- We do not specify any tokens in cassandra.yaml, relying on bootstrap assigning the tokens automatically."

As the cassandra.yaml comments state, you should never, ever do this in a real cluster. I don't know what is causing your underlying issue, but not specifying tokens is a strong contender.

=Rob
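For illustration only (this calculation is not from the thread): with the 4-node test cluster described above and the default RandomPartitioner, evenly spaced tokens are i * 2**127 / 4, set per node in cassandra.yaml before first start:

    # node 1's cassandra.yaml
    initial_token: 0
    # node 2
    initial_token: 42535295865117307932921825928971026432
    # node 3
    initial_token: 85070591730234615865843651857942052864
    # node 4
    initial_token: 127605887595351923798765477786913079296

With explicit tokens, a re-imaged node can come back owning exactly the range it gave up.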
Re: Counter value becomes incorrect after several dozen reads/writes
On Mon, Jun 24, 2013 at 6:42 PM, Josh Dzielak j...@keen.io wrote:

"There is only 1 thread running this sequence, and consistency levels are set to ALL. The behavior is fairly repeatable - the unexpected mutation will happen at least 10% of the time I run this program, but at different points. When it does not go awry, I can run this loop many thousands of times and keep the counter exact. But if it starts happening to a specific counter, the counter will never recover and will continue to maintain its incorrect value even after successful subsequent writes."

Sounds like a corrupt counter shard. Hard to understand how it can happen at ALL. If I were you I would file a JIRA including your repro path...

=Rob
Re: copy data between clusters
On Mon, Jun 24, 2013 at 8:35 PM, S C as...@outlook.com wrote:

"I have a scenario here. I have a cluster A and cluster B running on cassandra 1.1. I need to copy data from Cluster A to Cluster B. Cluster A has a few keyspaces that I need to copy over to Cluster B. What are my options?"

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

=Rob
Re: Cassandra terminates with OutOfMemory (OOM) error
Your young gen is 1/4 of 1.8G, which is 450MB. Also, in slice queries the coordinator will get the results from replicas as per the consistency level used and merge them before returning to the client. What is the replication factor in your keyspace, and what consistency are you reading with? Also, 55MB on disk will not mean 55MB in memory: the data is compressed on disk, and there are other overheads.

On Mon, Jun 24, 2013 at 8:38 PM, Mohammed Guller moham...@glassbeam.com wrote:

No deletes. In my test, I am just writing and reading data. There is a lot of GC, but only on the young generation. Cassandra terminates before the GC for the old generation kicks in. I know that our queries are reading an unusual amount of data. However, I expected it to throw a timeout exception instead of crashing. Also, I don't understand why a 1.8 GB heap is getting full when the total data stored in the entire Cassandra cluster is less than 55 MB.

Mohammed

On Jun 21, 2013, at 7:30 PM, sankalp kohli kohlisank...@gmail.com wrote:

Looks like you are putting a lot of pressure on the heap by doing a slice query on a large row. Do you have a lot of deletes/tombstones on the rows? That might be causing a problem. Also, why are you returning so many columns at once? You can use the auto-paginate feature in Astyanax. Also, do you see a lot of GC happening?

On Fri, Jun 21, 2013 at 1:13 PM, Jabbar Azam aja...@gmail.com wrote:

Hello Mohammed, you should increase the heap space. You should also tune the garbage collection so young-generation objects are collected faster, relieving pressure on the heap. We have been using JDK 7 and it uses G1 as the default collector; it does a better job than me trying to optimise the JDK 6 GC collectors. Bear in mind though that the OS will need memory, as will the row cache and the file system, although memory usage will depend on the workload of your system. I'm sure you'll also get good advice from other members of the mailing list.

Thanks, Jabbar Azam

On 21 June 2013 18:49, Mohammed Guller moham...@glassbeam.com wrote:

We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings, so each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data using Astyanax. If our application tries to read 80,000 columns each from 10 or more rows at the same time, some of the nodes run out of heap space and terminate with an OOM error.
Here is the error message:

java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
        at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
        at org.apache.cassandra.db.Table.getRow(Table.java:355)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main]
java.lang.OutOfMemoryError: Java heap space
        at java.lang.Long.toString(Long.java:269)
        at java.lang.Long.toString(Long.java:764)
        at
Re: Cassandra as storage for cache data
Apart from what Jeremy said, you can try these:

1) Use replication = 1. It is cache data and you don't need persistence.
2) Try playing with the memtable size.
3) Use the Netflix client library (Astyanax), as it will reduce one hop: it will choose the node with the data as the coordinator.
4) Work on your schema. You might want to have fewer columns in each row. With fatter rows, the bloom filter will mark more sstables as eligible for a read.

-Sankalp

On Tue, Jun 25, 2013 at 9:04 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: [...]
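A minimal sketch of suggestion 1, assuming a 1.2 CQL3 keyspace named cache (the name is hypothetical). Since cache entries can be regenerated, durability is arguably not worth the doubled write and compaction load:

    ALTER KEYSPACE cache
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    -- afterwards, nodetool cleanup drops the now-unowned replicas on each node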
Re: Counter value becomes incorrect after several dozen reads/writes
If you can reproduce the invalid behavior 10+% of the time, with repro steps that take 5-10s per iteration, that sounds extremely interesting for getting to the bottom of the invalid-shard issue (if that's what the root cause ends up being). I would be very interested in the setup to see if the behavior can be duplicated.

Andrew

On Tue, Jun 25, 2013 at 2:18 PM, Robert Coli rc...@eventbrite.com wrote: [...]
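A minimal CQL3 sketch of the increment-then-verify loop being described, using a hypothetical counter table (the original repro used a client driver; CONSISTENCY ALL mirrors the reported settings):

    CREATE TABLE counters (id text PRIMARY KEY, c counter);

    CONSISTENCY ALL;                                -- cqlsh session setting
    UPDATE counters SET c = c + 1 WHERE id = 'k1';  -- increment
    SELECT c FROM counters WHERE id = 'k1';         -- read back and compare
    -- Repeat; a corrupt shard would show up as a value that stays wrong
    -- even after later, individually successful increments.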
Re: Custom 1.2 Authentication plugin will not work unless user is in system_auth.users column family
Sorry for not following up on this one in time. I filed a JIRA (5651), and it seems user lookup is here to stay: https://issues.apache.org/jira/browse/CASSANDRA-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

On a related note, that column family is by default set up with key caching only. It might be a good idea to turn its row cache on if the row cache is enabled.

Bao
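Since the lookup is staying, users driven by a custom authenticator presumably need a matching row in system_auth.users. A hedged 1.2 sketch (superuser cqlsh session assumed; the user name is hypothetical):

    -- CREATE USER populates system_auth.users, so the lookup succeeds.
    CREATE USER app_user WITH PASSWORD 'secret' NOSUPERUSER;

    -- Bao's caching idea: row-cache the lookup table. This only helps if the
    -- global row cache is given a non-zero size in cassandra.yaml.
    ALTER TABLE system_auth.users WITH caching = 'rows_only';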
RE: copy data between clusters
Bob and Arthur - thanks for your inputs. I tried sstableloader but ran into the issue below. Anything to do with the configuration needed to run sstableloader?

sstableloader -d 10.225.64.2,10.225.64.3 service/context

 INFO 14:43:49,937 Opening service/context/service-context-hf-50 (164863 bytes)
DEBUG 14:43:50,063 INDEX LOAD TIME for service/context/service-context-hf-50: 128 ms.
 INFO 14:43:50,063 Opening service/context/service-context-hf-49 (7688939 bytes)
DEBUG 14:43:50,076 INDEX LOAD TIME for service/context/service-context-hf-49: 13 ms.
 INFO 14:43:50,076 Opening service/context/service-context-hf-51 (6703 bytes)
DEBUG 14:43:50,078 INDEX LOAD TIME for service/context/service-context-hf-51: 2 ms.
Streaming revelant part of service/context/service-context-hf-50-Data.db service/context/service-context-hf-49-Data.db service/context/service-context-hf-51-Data.db to [/10.225.64.2, /10.225.64.3]
 INFO 14:43:50,124 Stream context metadata [service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%], 3 sstables.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-51-Data.db to be streamed.
 INFO 14:43:50,136 Streaming to /10.225.64.2
DEBUG 14:43:50,144 Files are service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%
 INFO 14:43:50,159 Stream context metadata [service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%], 3 sstables.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,160 Adding file service/context/service-context-hf-51-Data.db to be streamed.
 INFO 14:43:50,160 Streaming to /10.225.64.3
DEBUG 14:43:50,160 Files are service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
 WARN 14:43:50,225 Failed attempt 1 to connect to /10.225.64.3 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
 WARN 14:43:50,241 Failed attempt 1 to connect to /10.225.64.2 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
 WARN 14:43:54,227 Failed attempt 2 to connect to /10.225.64.3 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 8000 ms.
(java.net.SocketException: Invalid argument or cannot assign requested address)
 WARN 14:43:54,244 Failed attempt 2 to connect to /10.225.64.2 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
 WARN 14:44:02,229 Failed attempt 3 to connect to /10.225.64.3 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
 WARN 14:44:02,309 Failed attempt 3 to connect to /10.225.64.2 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
DEBUG 14:44:18,231 closing with status false
Streaming session to /10.225.64.3 failed
ERROR 14:44:18,236 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.net.SocketException: Invalid argument or cannot assign requested address
        at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:636)
        at
what happens if coordinator node fails during write
Hi there,

I am writing data to Cassandra via a thrift client (not Hector) and wonder what happens if the coordinator node fails. The same question applies to the bulk loader, which uses the gossip protocol instead of the thrift protocol. In my understanding, HintedHandoff only takes care of replica node failures.

Thanks.

--
Regards,
Jiaan
Re: copy data between clusters
Hello SC,

whilst most of the sstableloader errors stem from incorrect setups, I suspect this time you merely have a connectivity issue, e.g. a firewall blocking traffic.

From: S C
Sent: Tuesday, June 25, 2013 5:28 PM
To: user@cassandra.apache.org
Subject: RE: copy data between clusters
[...]
Re: what happens if coordinator node fails during write
It depends on the Cassandra version. As far as I know, in 1.2 the coordinator logs the request before it updates the replicas; if it fails, it will replay the log on startup. In 1.1 you may end up with an inconsistent state, because only part of your request may have been propagated to the replicas.

Thank you,
Andrey

On Tue, Jun 25, 2013 at 5:11 PM, Jiaan Zeng ji...@bloomreach.com wrote: [...]
Re: Date range queries
You could just separate the history data from the current data. Then when the user's result is updated, just write into two tables.

CREATE TABLE all_answers (
  user_id uuid,
  created timeuuid,
  result text,
  question_id varint,
  PRIMARY KEY (user_id, created)
)

CREATE TABLE current_answers (
  user_id uuid,
  question_id varint,
  created timeuuid,
  result text,
  PRIMARY KEY (user_id, question_id)
)

select * FROM current_answers;

 user_id                              | question_id | result | created
--------------------------------------+-------------+--------+--------------------------------------
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 |           1 | no     | f9893ee0-ddfa-11e2-b74c-35d7be46b354
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 |           2 | blah   | f7af75d0-ddfa-11e2-b74c-35d7be46b354

select * FROM all_answers;

 user_id                              | created                              | question_id | result
--------------------------------------+--------------------------------------+-------------+--------
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f0141234-ddfa-11e2-b74c-35d7be46b354 |           1 | yes
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f7af75d0-ddfa-11e2-b74c-35d7be46b354 |           2 | blah
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f9893ee0-ddfa-11e2-b74c-35d7be46b354 |           1 | no

This way you can get the history of answers if you want, and there is a simple way to get the most current answers. Just a thought.

-Colin B.

On 06/24/2013 03:28 PM, Christopher J. Bottaro wrote:

Yes, that makes sense and that article helped a lot, but I still have a few questions...

The created_at in our answers table is basically used as a version id. When a user updates his answer, we don't overwrite the old answer, but rather insert a new answer with a more recent timestamp (the version).

answers
-------------------------------------------
user_id | created_at | question_id | result
-------------------------------------------
1       | 2013-01-01 | 1           | yes
1       | 2013-01-01 | 2           | blah
1       | 2013-01-02 | 1           | no

So the queries we really want to run are "find me all the answers for a given user at a given time". Given the date 2013-01-02 and user_id 1, we would want rows 2 and 3 returned (since row 3 obsoletes row 1). Is it possible to do this with CQL given the current schema? As an aside, we can do this in Postgresql using window functions; not standard SQL, but pretty neat.

We can alter our schema like so...

answers
----------------------------------------------------
user_id | start_at | end_at | question_id | result

Where start_at and end_at denote when an answer is active. So the example above would become:

answers
----------------------------------------------------
user_id | start_at   | end_at     | question_id | result
1       | 2013-01-01 | 2013-01-02 | 1           | yes
1       | 2013-01-01 | null       | 2           | blah
1       | 2013-01-02 | null       | 1           | no

Now we can query: SELECT * FROM answers WHERE user_id = 1 AND start_at <= '2013-01-02' AND (end_at > '2013-01-02' OR end_at IS NULL). How would one define the partitioning key and cluster columns in CQL to accomplish this? Is it as simple as PRIMARY KEY (user_id, start_at, end_at, question_id) (remembering that we sometimes want to limit by question_id)?

Also, we are a bit worried about race conditions. Consider two separate processes updating an answer for a given user_id / question_id. There will be a race condition between the two to update the correct row's end_at field. Does that make sense? I can draw it out with ASCII tables, but I feel like this email is already too long... :P

Thanks for the help.

On Wed, Jun 19, 2013 at 2:28 PM, David McNelis dmcne...@gmail.com wrote:

So, if you want to grab by created_at and occasionally limit by question_id, that is why you'd use created_at. The way the primary keys work is that the first part of the primary key is the partitioner key; that field is essentially the single Cassandra row. The second key is the order-preserving key, so you can sort by that key.
If you have a third piece, then that is the secondary order-preserving key. The reason you'd want to do (user_id, created_at, question_id) is that when you do a query on the keys, you MUST use the preceding pieces of the primary key. So in your case, you could not do a query with just user_id and question_id with the user-created-question key. Alternatively, if you went with (user_id, question_id, created_at), you would not be able to include a range of created_at unless you were also filtering on question_id. Does that make sense?

As for the large rows, 10k is unlikely to cause you too many issues (unless the answer is potentially a big blob of text). Newer versions of cassandra deal with a lot of things in
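To make the preceding-pieces rule concrete, a hedged sketch against the (user_id, created_at, question_id) key discussed above (column types are assumptions):

    CREATE TABLE answers (
      user_id     bigint,
      created_at  timestamp,
      question_id varint,
      result      text,
      PRIMARY KEY (user_id, created_at, question_id)
    );

    -- OK: partition key restricted, then a range on the first clustering column.
    SELECT * FROM answers
     WHERE user_id = 1 AND created_at >= '2013-01-01' AND created_at < '2013-01-03';

    -- Not allowed: question_id can only be restricted after the columns that
    -- precede it (here created_at) are restricted by equality.
    -- SELECT * FROM answers WHERE user_id = 1 AND question_id = 2;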
Re: what happens if coordinator node fails during write
Read this: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2

On Tue, Jun 25, 2013 at 8:45 PM, Andrey Ilinykh ailin...@gmail.com wrote: [...]
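Per that post, a 1.2 coordinator writes a batch to its batchlog (replicated to other nodes) before applying it, and the batch is replayed if the coordinator dies mid-write. A minimal sketch with hypothetical tables:

    BEGIN BATCH                 -- logged, i.e. atomic, by default in 1.2
      INSERT INTO users (id, name) VALUES (1, 'jiaan');
      INSERT INTO users_by_name (name, id) VALUES ('jiaan', 1);
    APPLY BATCH;
    -- BEGIN UNLOGGED BATCH skips the batchlog, giving pre-1.2 behavior.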
RE: copy data between clusters
Is there any configuration reference that can help me?

Thanks,
SC

From: arthur.zuba...@aol.com
To: user@cassandra.apache.org
Subject: Re: copy data between clusters
Date: Tue, 25 Jun 2013 20:30:23 -0400

Hello SC, whilst most of the sstableloader errors stem from incorrect setups, I suspect this time you merely have a connectivity issue, e.g. a firewall blocking traffic. [...]
Re: copy data between clusters
This is the best reference I have seen so far: http://www.datastax.com/dev/blog/bulk-loading

But I must tell you it is not updated to match the most recent changes in C*. I suggest you read through the comments, too.

From: S C
Sent: Tuesday, June 25, 2013 10:23 PM
To: user@cassandra.apache.org
Subject: RE: copy data between clusters

Is there any configuration reference that can help me? [...]
Re: Cassandra terminates with OutOfMemory (OOM) error
Replication is 3 and the read consistency level is ONE. One of the non-coordinator nodes is crashing, so the OOM is happening before aggregation of the data to be returned. Thanks for the info about the space allocated to the young-generation heap; that is helpful.

Mohammed

On Jun 25, 2013, at 1:28 PM, sankalp kohli kohlisank...@gmail.com wrote: [...]
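The auto-paginate suggestion from earlier in this thread is Astyanax-specific; the same idea expressed as manual CQL3 slice paging (hypothetical schema) fetches a wide row in bounded chunks instead of 80,000 columns at once:

    -- One wide row per key; 'col' is the clustering column with ~260k entries.
    SELECT col, value FROM wide_rows
     WHERE key = 'row1' LIMIT 1000;

    -- Next chunk: resume after the last 'col' returned by the previous query.
    SELECT col, value FROM wide_rows
     WHERE key = 'row1' AND col > 'last-col-seen' LIMIT 1000;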
Re: Heap is not released and streaming hangs at 0%
bloom_filter_fp_chance value that was changed from default to 0.1, looked at the filters and they are about 2.5G on disk and I have around 8G of heap.

You need to re-write the sstables on disk using nodetool upgradesstables. Otherwise only the new sstables will have the 0.1 setting.

I will try increasing the value to 0.7 and report my results.

No need to; it will probably be something like "Oh no, really, what, how, please make it stop" :) 0.7 will mean reads hit most / all of the SSTables for the CF. I covered a high row-count situation in one of my talks at the summit this month; the slide deck is here http://www.slideshare.net/aaronmorton/cassandra-sf-2013-in-case-of-emergency-break-glass and the videos will soon be up at Planet Cassandra.

Rebuild the sstables, then reduce the index_interval if you still need to reduce memory pressure.

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/06/2013, at 1:17 PM, sankalp kohli kohlisank...@gmail.com wrote:

I will take a heap dump and see what's in there rather than guessing.

On Fri, Jun 21, 2013 at 4:12 PM, Bryan Talbot btal...@aeriagames.com wrote:

bloom_filter_fp_chance = 0.7 is probably way too large to be effective, and you'll probably have issues compacting deleted rows and get poor read performance with a value that high. I'd guess that anything larger than 0.1 might as well be 1.0.

-Bryan

On Fri, Jun 21, 2013 at 5:58 AM, srmore comom...@gmail.com wrote:

On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.com wrote:

nodetool -h localhost flush didn't do much good.

Do you have 100's of millions of rows? If so, see recent discussions about reducing the bloom_filter_fp_chance and index_sampling.

Yes, I have 100's of millions of rows.

If this is an old schema you may be using the very old setting of 0.000744, which creates a lot of bloom filters.

The bloom_filter_fp_chance value was changed from the default to 0.1; I looked at the filters and they are about 2.5G on disk, and I have around 8G of heap. I will try increasing the value to 0.7 and report my results. It also appears to be a case of hard GC failure (as Rob mentioned), as the heap is never released; even after 24+ hours of idle time the JVM needs to be restarted to reclaim the heap.

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote:

If you want, you can try to force a GC through JConsole: Memory -> Perform GC. In theory it triggers a full GC; when it actually happens depends on the JVM.

-Wei

From: Robert Coli rc...@eventbrite.com
To: user@cassandra.apache.org
Sent: Tuesday, June 18, 2013 10:43:13 AM
Subject: Re: Heap is not released and streaming hangs at 0%

On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote:

But then shouldn't the JVM GC it eventually? I can still see Cassandra alive and kicking, but it looks like the heap is locked up even after the traffic is long stopped.

No, when the GC system fails this hard it is often a permanent failure which requires a restart of the JVM.

nodetool -h localhost flush didn't do much good.

This adds support to the idea that your heap is too full, and not full of memtables. You could try nodetool -h localhost invalidatekeycache, but that probably will not free enough memory to help you.

=Rob
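For reference, the change-and-rebuild sequence Aaron describes looks roughly like the sketch below. The keyspace/table names are placeholders, it assumes a CQL3-capable version (older schemas can make the same change through cassandra-cli), and note that depending on the nodetool version you may need the -a / --include-all-sstables flag so that sstables already at the current format version are rewritten too:

    -- from cqlsh; my_ks / my_cf are placeholder names
    ALTER TABLE my_ks.my_cf WITH bloom_filter_fp_chance = 0.1;

    # then rewrite the existing sstables so old ones pick up the new setting
    nodetool -h localhost upgradesstables my_ks my_cf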
Re: Cassandra 1.0.9 Performance
serving a load of approximately 600GB

Is that 600GB in the cluster or 600GB per node? In pre-1.2 days we recommended around 300GB to 500GB per node with spinning disks and 1GbE networking. It's a soft rule of thumb, not a hard rule. Above that size, repair and replacing a failed node can take a long time.

Does anyone have CPU/memory/network graphs (e.g. Cacti) over the last 1-2 months they are willing to share of their Cassandra database nodes?

If you can share yours and any specific concerns you may have, we may be able to help.

Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/06/2013, at 1:14 PM, G man gmanli...@gmail.com wrote:

Hi All,

We are running a 1.0.9 cluster with 3 nodes (RF=3) serving a load of approximately 600GB. Since I am fairly new to Cassandra, I'd like to compare notes with other people running a cluster of similar size (perhaps not in the amount of data, but the number of nodes). Does anyone have CPU/memory/network graphs (e.g. Cacti) over the last 1-2 months they are willing to share of their Cassandra database nodes? Just trying to compare our patterns with others to see if they are normal.

Thanks in advance.

G
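A quick way to answer the per-node versus cluster-wide question is the Load column in nodetool's ring output, which reports each node's on-disk data size (host name below is a placeholder):

    # one row per node; sum the Load column for the cluster-wide total
    nodetool -h my-cassandra-host ring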
Re: about FlushWriter All time blocked
FlushWriter                      0         0            191         0                12

This means there were 12 times the code wanted to put a memtable in the queue to be flushed to disk, but the queue was full. The length of this queue is controlled by memtable_flush_queue_size (https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L299) and memtable_flush_writers.

When this happens, an internal lock around the commit log is held, which prevents writes from being processed. In general it means the IO system cannot keep up. It can sometimes happen when snapshot is used, as all the CFs are flushed to disk at once. I also suspect it happens sometimes when a commit log segment is flushed and there are a lot of dirty CFs, but I've never proved it.

Increase memtable_flush_queue_size following the help in the yaml file. If you do not use secondary indexes, are you using snapshot?

Hope that helps.

A
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/06/2013, at 3:41 PM, yue.zhang yue.zh...@chinacache.com wrote:

3 nodes, CentOS, 8-core CPU, 32GB memory, Cassandra 1.2.5.

My scenario: many counter increments; every node has one client program; performance is 400 wps per client (it's very slow).

My question: nodetool tpstats shows:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0           8453         0                 0
RequestResponseStage              0         0      138303982         0                 0
MutationStage                     0         0      172002988         0                 0
ReadRepairStage                   0         0              0         0                 0
ReplicateOnWriteStage             0         0       82246354         0                 0
GossipStage                       0         0        1052389         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlusher               0         0            670         0                 0
FlushWriter                       0         0            191         0                12
MiscStage                         0         0              0         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0             56         0                 0

FlushWriter "All time blocked" = 12. I restarted the node, but it didn't help. Is this normal?

thx

-heipark
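For reference, both knobs live in cassandra.yaml. The values below are only illustrative starting points, not recommendations; the yaml's own help text explains how to size the queue for your workload:

    # cassandra.yaml -- illustrative values, not recommendations
    # memtables allowed to wait for a flush writer before writes block;
    # size it to cover the largest number of CFs flushed at once
    # (e.g. a snapshot flushes every CF simultaneously)
    memtable_flush_queue_size: 8
    # concurrent flusher threads (default is tied to the number of data directories)
    memtable_flush_writers: 2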