Re: Is there any way to fetch all data efficiently from a column family?
Thanks Michael. I will run a benchmark using the Hadoop map/reduce examples on our cluster, and I'll share any valuable findings. :) Best,

On Wed, Jan 30, 2013 at 2:39 PM, Michael Kjellman mkjell...@barracuda.com wrote:

And finally, to make wide rows with C* and Hadoop even better, these problems have already been solved by tickets such as (not inclusive): https://issues.apache.org/jira/browse/CASSANDRA-3264 https://issues.apache.org/jira/browse/CASSANDRA-2878 And a nicer, more up-to-date doc from the 1.1 branch from DataStax: http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration

From: Michael Kjellman mkjell...@barracuda.com Reply-To: user@cassandra.apache.org Date: Tuesday, January 29, 2013 10:36 PM To: user@cassandra.apache.org Subject: Re: Is there any way to fetch all data efficiently from a column family?

Yes, wide rows, but that doesn't seem horrible by any means. People have gotten by with Thrift for many, many years in the community. If you are running this once a day, latency shouldn't be a major concern, and I doubt the protocol is going to be your primary bottleneck. To answer your question about Pig: http://pig.apache.org -- "Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets." Pretty much, Pig lets you write Pig Latin to create map/reduce programs without writing an actual Java map/reduce program. Here is a really old wiki article (which really needs to be updated) about the various Hadoop support built into C*: http://wiki.apache.org/cassandra/HadoopSupport On your last point: compaction does deal with tombstones, yes, but generally you only run minor compactions.
A major compaction says: take every sstable for this cf and make one MASSIVE sstable from all the little sstables. This is different from standard C* operations. Map/reduce doesn't purge anything and has nothing to do with compactions. It is just a somewhat sane idea I thought of to let you iterate over a large amount of data stored in C*, and conveniently C* provides input and output formats for Hadoop so you can do fun things like iterate over 500w rows with 1k columns each. Honestly, the best thing you can do is benchmark Hadoop and see how it will work for your workload and specific project requirements. Best, Michael

From: dong.yajun dongt...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, January 29, 2013 10:11 PM To: user@cassandra.apache.org Subject: Re: Is there any way to fetch all data efficiently from a column family?

Thanks Michael.

"How many rows in your column families?" About 500w (5 million) rows; each row has about 1k of data.

"How often do you need to do this?" Once a day.

"example Hadoop map/reduce jobs in the examples folder" Thanks, I have seen the source code; it uses the Thrift API as the RecordReader to iterate the rows, and I don't think that's a high-performance method.

"you could look into Pig" Could you please describe Pig in more detail?

"So avoid that unless you really know what you're doing which is what ..." The step is to purge the tombstones; another option is using a map/reduce job to do the purging without major compactions.

Best, Rick.

On Wed, Jan 30, 2013 at 1:15 PM, Michael Kjellman mkjell...@barracuda.com wrote: How often do you need to do this? How many rows in your column families? If it's not a frequent operation, you can just page the data n rows at a time using nothing special but C* and a driver. Or, if you only need one cf as your input, another option is to write a map/reduce job that takes the entire cf as input.
There are example Hadoop map/reduce jobs in the examples folder included with Cassandra. Or, if you don't want to write a M/R job, you could look into Pig. Your method sounds a bit crazy IMHO and I'd definitely recommend against it. Better to let the database (C*) do its thing. If you're super worried about having more than 1 sstable you can do major compactions, but that's not recommended, as it will take a while to get a new sstable big enough to merge with the other big sstable. So avoid that unless you really know what you're doing, which is what it sounds like you're proposing in point 3 ;)

From: dong.yajun dongt...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, January 29, 2013 9:02 PM To: user@cassandra.apache.org Subject: Is there any way to fetch all data efficiently from a column family?

hey List, I consider a way that can read
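The "page the data n rows at a time" / Hadoop input-split idea discussed above boils down to carving the partitioner's token range into contiguous sub-ranges and scanning each one independently. Here is a rough sketch of that arithmetic only (assuming the RandomPartitioner's 0 .. 2^127 - 1 range), not the actual ColumnFamilyInputFormat code:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Illustration only: split the RandomPartitioner token range (0 .. 2^127 - 1)
// into n contiguous sub-ranges, the way Hadoop input splits carve up a
// column family so each mapper scans one slice of the ring.
public class TokenRangeSplitter {

    static final BigInteger RP_MAX =
            BigInteger.valueOf(2).pow(127).subtract(BigInteger.ONE);

    // Returns n [start, end] pairs that together cover 0 .. RP_MAX.
    public static List<BigInteger[]> split(int n) {
        List<BigInteger[]> splits = new ArrayList<BigInteger[]>();
        BigInteger range = RP_MAX.add(BigInteger.ONE); // 2^127 tokens total
        BigInteger step = range.divide(BigInteger.valueOf(n));
        BigInteger start = BigInteger.ZERO;
        for (int i = 0; i < n; i++) {
            BigInteger end = (i == n - 1)
                    ? RP_MAX
                    : start.add(step).subtract(BigInteger.ONE);
            splits.add(new BigInteger[] { start, end });
            start = end.add(BigInteger.ONE);
        }
        return splits;
    }

    public static void main(String[] args) {
        for (BigInteger[] s : split(4)) {
            System.out.println(s[0] + " .. " + s[1]);
        }
    }
}
```

Each sub-range then becomes one independent range query (or one map task), which is why this scales out where a single full-CF scan does not.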
how RandomPartitioner calculate tokens
Hi, As per the DataStax Cassandra Documentation 1.2: for single data center deployments, tokens are calculated by dividing the hash range by the number of nodes in the cluster. Does it mean we have to recalculate the tokens of keys when nodes come and go? For multiple data center deployments, tokens are calculated per data center so that the hash range is evenly divided among the nodes in each data center. This is understandable, but when I go to the getToken method of RandomPartitioner, I can't find any datacenter-aware token calculation code. By the way, the documentation doesn't mention how Murmur3Partitioner calculates tokens for multiple data centers. Assuming it doesn't calculate tokens per data center, what difference between Murmur3Partitioner and RandomPartitioner has made that unnecessary? Thanks. Manu Zhang
Re: how RandomPartitioner calculate tokens
I'll admit that this part of the DataStax documentation is a bit confusing (and I'll reach out to the doc writers to make sure this is improved). The partitioner (be it RandomPartitioner, Murmur3Partitioner or OrderPreservingPartitioner) is pretty much only a hash function that defines how to compute the token (its hash) of a key. In particular, the partitioner has no notion whatsoever of data centers and, more generally, does not depend in any way on how many nodes you have. However, to actually distribute data, each node is assigned a token (or multiple ones with vnodes). Getting an even distribution of data depends on the exact tokens picked for your nodes. Now, the sentences of the doc you cite actually refer to how to calculate the tokens you assign to nodes. In particular, what it describes is pretty much what the small token-generator tool that comes with Cassandra (http://goo.gl/rwea9) does, but it is not something Cassandra itself actually does. Also, that procedure to compute tokens is pretty much the same for RandomPartitioner and Murmur3Partitioner, except that the token ranges of the two partitioners are not exactly the same. And as a side note, if you use vnodes, you don't really have to bother about manually assigning tokens to nodes. -- Sylvain
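The "divide the hash range evenly" procedure Sylvain describes is simple arithmetic, and it also answers the Murmur3 question: only the range differs between the two partitioners. A minimal sketch of what an operator (or the token-generator tool) computes; this mirrors the formula, not Cassandra's or the tool's actual code:

```java
import java.math.BigInteger;

// Sketch of the "divide the hash range evenly" initial-token assignment the
// documentation describes. This is done by the operator (or the
// token-generator tool), not by Cassandra itself.
public class InitialTokens {

    // RandomPartitioner range: 0 .. 2^127 - 1  =>  token_i = i * 2^127 / n
    public static BigInteger randomPartitionerToken(int i, int n) {
        return BigInteger.valueOf(2).pow(127)
                .multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(n));
    }

    // Murmur3Partitioner range: -2^63 .. 2^63 - 1
    //   =>  token_i = -2^63 + i * 2^64 / n
    public static BigInteger murmur3Token(int i, int n) {
        return BigInteger.valueOf(2).pow(64)
                .multiply(BigInteger.valueOf(i))
                .divide(BigInteger.valueOf(n))
                .subtract(BigInteger.valueOf(2).pow(63));
    }

    public static void main(String[] args) {
        int n = 4; // nodes per data center
        for (int i = 0; i < n; i++) {
            System.out.println(randomPartitionerToken(i, n)
                    + "  " + murmur3Token(i, n));
        }
    }
}
```

For multiple data centers, the doc's "per data center" wording just means you run this per DC (typically with a small per-DC offset to keep tokens unique cluster-wide); the partitioner itself never sees any of it.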
Re: JDBC, Select * Cql2 vs Cql3 problem ?
Well this is getting stranger: for me, with this simple table definition, select key,gender from users is also failing with a null pointer exception. Andy

On 29 Jan 2013, at 13:50, Andy Cobley acob...@computing.dundee.ac.uk wrote:

When connecting to Cassandra 1.2.0 from CQLSH, the table was created with:

cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};
cqlsh> use test;
cqlsh:test> create columnfamily users (KEY varchar Primary key, password varchar, gender varchar) ;
cqlsh:test> INSERT INTO users (KEY, password) VALUES ('jsmith', 'ch@ngem3a');
cqlsh:test> INSERT INTO users (KEY, gender) VALUES ('jbrown', 'male');

The stack trace (generated by et.printStackTrace()) is:

Can not execute statement java.lang.NullPointerException
at org.apache.cassandra.cql.jdbc.TypedColumn.<init>(TypedColumn.java:45)
at org.apache.cassandra.cql.jdbc.CassandraResultSet.createColumn(CassandraResultSet.java:972)
at org.apache.cassandra.cql.jdbc.CassandraResultSet.populateColumns(CassandraResultSet.java:156)
at org.apache.cassandra.cql.jdbc.CassandraResultSet.<init>(CassandraResultSet.java:130)
at org.apache.cassandra.cql.jdbc.CassandraStatement.doExecute(CassandraStatement.java:167)
at org.apache.cassandra.cql.jdbc.CassandraStatement.executeQuery(CassandraStatement.java:227)
at uk.ac.dundee.computing.aec.test.test.doGet(test.java:51)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)

Hope that helps! Andy

On 29 Jan 2013, at 07:17, aaron morton aa...@thelastpickle.com wrote:

What is your table spec? Do you have the full stack trace from the exception? Cheers - Aaron Morton, Freelance Cassandra Developer, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 29/01/2013, at 8:15 AM, Andy Cobley acob...@computing.dundee.ac.uk wrote:

I have the following code in my app using the JDBC (cassandra-jdbc-1.1.2.jar) driver for CQL:

try {
    rs = stmt.executeQuery("SELECT * FROM users");
} catch (Exception et) {
    System.out.println("Can not execute statement " + et);
}

When connecting to a CQL2 server (Cassandra 1.1.5) the code works as expected, returning a result set. When connecting to CQL3 (Cassandra 1.2) I catch the following exception: Can not execute statement java.lang.NullPointerException. The select statement (SELECT * FROM users) does work from CQLSH as expected. Is there a problem with my code or something else? Andy C, School of Computing, University of Dundee. The University of Dundee is a Scottish Registered Charity, No. SC015096.
Re: how RandomPartitioner calculate tokens
Thanks Sylvain, it's all clear now.
Multiple Data Center Clusters on Cassandra
Hi, I am running a 3-node Cassandra cluster with replication factor 2 in one DC. Now I need to run a multiple data center cluster with Cassandra, and I have the following questions: 1. I want to replicate the whole data set to another DC, so that afterwards the nodes in both DCs hold the complete data. With which topology is that possible? 2. If I need a backup, what's the command to take a snapshot of the cluster? 3. I will use an internet connection with a VPN for inter-DC traffic; what will happen in case of a disconnection? Regards, Adeel
RE: cryptic exception in Hadoop/Cassandra job
Hi Brian, Which version of Cassandra are you using? And are you using the BOF (BulkOutputFormat) to write to Cassandra? Kind regards, Pieter

-----Original Message-----
From: Brian Jeltema [mailto:brian.jelt...@digitalenvoy.net] Sent: woensdag 30 januari 2013 13:20 To: user@cassandra.apache.org Subject: cryptic exception in Hadoop/Cassandra job

I have a Hadoop/Cassandra map/reduce job that performs a simple transformation on a table with very roughly 1 billion columns spread across roughly 4 million rows. During reduction, I see a relative handful of the following:

Exception in thread "Streaming to /10.4.0.3:1" java.lang.RuntimeException: java.io.EOFException
at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:104)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more

which ultimately leads to job failure. I can't tell if this is a bug in my code or in the underlying framework. Does anyone have suggestions on how to debug this? TIA Brian
Re: Multiple Data Center Clusters on Cassandra
1. I want to replicate whole data on another DC and after that both DC's nodes should have complete Data. In which topology is it possible?

I think NetworkTopologyStrategy is best suited for such a configuration. You may want to use nodetool to generate tokens accordingly.

2. If I need backup, what's the command of cluster screen shot?

You can always create snapshot/backup files and later use them for restoration. {http://www.datastax.com/docs/1.0/operations/backup_restore}

3. I will use internet connection with VPN facility for traffic and in case disconnection what will happen?

Based on configuration (e.g. hinted handoff, consistency level, read repair), it should be fine.
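For question 1, a keyspace definition along these lines keeps a full replica set in each DC. The data center names DC1/DC2 below are placeholders; use the names your snitch actually reports (e.g. as shown by nodetool ring), and adjust the per-DC replica counts to taste:

```
CREATE KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};
```

With this in place, each DC independently holds the number of replicas you asked for, which is what lets a DC keep serving reads locally during a VPN outage.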
Re: cryptic exception in Hadoop/Cassandra job
Cassandra 1.1.5, using BulkOutputFormat. Brian
Re: Start token sorts after end token
This was unexpected fallout from the change to the Murmur3 partitioner. A JIRA is open, but if you need map/reduce, Murmur3 is currently out of the question.

On Wednesday, January 30, 2013, Tejas Patil tejas.patil...@gmail.com wrote:

While reading data from Cassandra in map/reduce, I am getting InvalidRequestException(why:Start token sorts after end token). Below is the code snippet that I used and the entire stack trace. (I am using Cassandra 1.2.0 and Hadoop 0.20.2.) Can you point out the issue here?

Code snippet:

SlicePredicate predicate = new SlicePredicate();
SliceRange sliceRange = new SliceRange();
sliceRange.start = ByteBuffer.wrap("1".getBytes());
sliceRange.finish = ByteBuffer.wrap("100".getBytes());
sliceRange.reversed = false;
// predicate.slice_range = sliceRange;
List<ByteBuffer> colNames = new ArrayList<ByteBuffer>();
colNames.add(ByteBuffer.wrap("url".getBytes()));
colNames.add(ByteBuffer.wrap("Parent".getBytes()));
predicate.column_names = colNames;
ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

Full stack trace:

java.lang.RuntimeException: InvalidRequestException(why:Start token sorts after end token)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:184)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at
org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:
Re: Start token sorts after end token
Fix is simply to switch to RandomPartitioner.

On Wednesday, January 30, 2013, Edward Capriolo edlinuxg...@gmail.com wrote: This was unexpected fallout from the change to the Murmur3 partitioner. A JIRA is open, but if you need map/reduce, Murmur3 is currently out of the question.
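Concretely, the switch is the partitioner setting in cassandra.yaml, applied identically on every node. Note that the partitioner cannot be changed once a cluster holds data, so this workaround is really only an option for new clusters:

```
# cassandra.yaml -- revert from the 1.2 default (Murmur3Partitioner)
# back to RandomPartitioner for Hadoop map/reduce compatibility
partitioner: org.apache.cassandra.dht.RandomPartitioner
```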
Re: JDBC, Select * Cql2 vs Cql3 problem ?
You really can't mix CQL2 and CQL3. CQL2 does not understand CQL3's sparse tables. Technically, it barfs all over the place. CQL2 is only good for contact tables.

On Wednesday, January 30, 2013, Andy Cobley acob...@computing.dundee.ac.uk wrote: Well this is getting stranger; for me, with this simple table definition, select key,gender from users is also failing with a null pointer exception. Andy
RE: cryptic exception in Hadoop/Cassandra job
I have the same issue (but with sstableloader). It should be fixed in the 1.2 release (https://issues.apache.org/jira/browse/CASSANDRA-4813). Kind regards, Pieter
Re: JDBC, Select * Cql2 vs Cql3 problem ?
Darn auto-correct: CQL2 is only good for compact tables. Make sure you are setting your CQL version. Or frankly, just switch to Hector/Thrift and use things that have been known to work for years now.

On Wednesday, January 30, 2013, Edward Capriolo edlinuxg...@gmail.com wrote: You really can't mix CQL2 and CQL3. CQL2 does not understand CQL3's sparse tables. Technically, it barfs all over the place. CQL2 is only good for contact tables.
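On "setting your CQL version": if memory serves, the cassandra-jdbc driver selects the CQL dialect through a version parameter on the connection URL. Treat the parameter name and value below as assumptions to verify against your driver version; this is a sketch, not the driver's documented API:

```java
public class Cql3Connect {
    // Hypothetical URL: the "version=3.0.0" parameter (assumed name) asks
    // cassandra-jdbc to speak CQL3 instead of its CQL2 default. Verify the
    // exact parameter against your cassandra-jdbc release.
    static final String URL = "jdbc:cassandra://localhost:9160/test?version=3.0.0";

    public static void main(String[] args) throws Exception {
        // Connecting requires a live Cassandra node, so it is left commented:
        // Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        // java.sql.Connection conn = java.sql.DriverManager.getConnection(URL);
        System.out.println(URL);
    }
}
```

If the version parameter is absent, the driver falls back to CQL2, which is exactly the mismatch that produces the NullPointerException against CQL3 tables.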
Re: Node selection when both partition key and secondary index field constrained?
Any such query is going to fail with QUORUM + RF=3 + 2 nodes down. One thing about secondary indexes (both user-defined and built-in) is that finding an answer using them requires more nodes to be up than a single get or slice does.

On Monday, January 28, 2013, Mike Sample mike.sam...@gmail.com wrote:

Thanks Aaron. So basically it's merging the results of 2 separate queries: indexed scan (token-range) intersect foo.flag_index=true, where the latter query hits the entire cluster as per the secondary index FAQ entry. Thus the overall query would fail if LOCAL_QUORUM was requested, RF=3 and 2 nodes in a given replication group were down. Darn. Is there any way of efficiently getting around this (i.e. scope the query to just the nodes in the token range)?

On Mon, Jan 28, 2013 at 11:44 AM, aaron morton aa...@thelastpickle.com wrote:

It uses the index...

cqlsh:dev> tracing on;
Now tracing requests.
cqlsh:dev> SELECT id, flag from foo WHERE TOKEN(id) > '-9939393' AND TOKEN(id) <= '0' AND flag=true;

Tracing session: 128cab90-6982-11e2-8cd1-51eaa232562e

activity | timestamp | source | source_elapsed
execute_cql3_query | 08:36:55,244 | 127.0.0.1 | 0
Parsing statement | 08:36:55,244 | 127.0.0.1 | 600
Peparing statement | 08:36:55,245 | 127.0.0.1 | 1408
Determining replicas to query | 08:36:55,246 | 127.0.0.1 | 1924
Executing indexed scan for (max(-9939393), max(0)] | 08:36:55,247 | 127.0.0.1 | 2956
Executing single-partition query on foo.flag_index | 08:36:55,247 | 127.0.0.1 | 3192
Acquiring sstable references | 08:36:55,247 | 127.0.0.1 | 3220
Merging memtable contents | 08:36:55,247 | 127.0.0.1 | 3265
Scanned 0 rows and matched 0 | 08:36:55,247 | 127.0.0.1 | 3396
Request complete | 08:36:55,247 | 127.0.0.1 | 3644

It reads from the secondary index and discards keys that are outside of the token range.
Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 28/01/2013, at 4:24 PM, Mike Sample mike.sam...@gmail.com wrote:

Does the following FAQ entry hold even when the partition key is also constrained in the query (by token())? http://wiki.apache.org/cassandra/SecondaryIndexes:

==
Q: How does choice of Consistency Level affect cluster availability when using secondary indexes?
A: Because secondary indexes are distributed, you must have CL nodes available for all token ranges in the cluster in order to complete a query. For example, with RF = 3, when two out of three consecutive nodes in the ring are unavailable, all secondary index queries at CL = QUORUM will fail; however, secondary index queries at CL = ONE will succeed. This is true regardless of cluster size.
==

For example:

CREATE TABLE foo (
  id uuid,
  seq_num bigint,
  flag boolean,
  some_other_data blob,
  PRIMARY KEY (id, seq_num)
);

CREATE INDEX flag_index ON foo (flag);

SELECT id, flag FROM foo WHERE TOKEN(id) > '-9939393' AND TOKEN(id) <= '0' AND flag=true;

Would the above query with LOCAL_QUORUM succeed given the following? I.e. is the token range used first to trim node selection?

* the cluster has 18 nodes
* foo is in a keyspace with a replication factor of 3 for that data center
* 2 nodes in one of the replication groups are down
* the token range in the query is not in the range of the down nodes

Thanks in advance!
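The FAQ rule quoted above can be sanity-checked with a toy model: a plain partition read needs CL live replicas only for its own token range, while a secondary index query needs CL live replicas for every range in the cluster. A minimal sketch follows, using simplified SimpleStrategy-style placement (replicas are the next RF nodes on the ring); node names and layout are illustrative, not from this thread:

```python
# Toy model of the FAQ rule: a secondary index query needs CL live
# replicas for *every* token range in the cluster, not just the range
# being read. Placement is a simplified "next RF nodes on the ring"
# scheme; node names and layout are illustrative.

RF = 3

def replicas(ring, i, rf=RF):
    """Nodes replicating the range owned by ring[i] (SimpleStrategy-style)."""
    return [ring[(i + k) % len(ring)] for k in range(rf)]

def index_query_ok(ring, down, cl):
    # Every range must have at least cl live replicas.
    return all(
        sum(node not in down for node in replicas(ring, i)) >= cl
        for i in range(len(ring))
    )

ring = ["n%d" % i for i in range(18)]

# Two consecutive nodes down: some range keeps only 1 of 3 replicas,
# so a QUORUM (cl=2) index query fails even in an 18 node cluster...
assert not index_query_ok(ring, down={"n3", "n4"}, cl=2)
# ...while cl=1 still succeeds, matching the FAQ.
assert index_query_ok(ring, down={"n3", "n4"}, cl=1)
```

This matches Mike's scenario: even though the queried token range avoids the down nodes, the index scan consults every range, and the range that lost 2 of its 3 replicas sinks the whole query at LOCAL_QUORUM.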
Re: Node selection when both partition key and secondary index field constrained?
I recall someone doing some work in Astyanax (I don't know if it made it back in) where Astyanax would retry at a lower CL when 2 nodes were down so things could continue to work, which was a VERY VERY cool feature. You may want to look into that… I know at some point, I plan to.

Later,
Dean
Re: cryptic exception in Hadoop/Cassandra job
I'm not sure this is the same problem. I'm getting these even when using a single reducer for the entire job.

Brian

On Jan 30, 2013, at 9:26 AM, Pieter Callewaert wrote:

I have the same issue (but with sstableloaders). Should be fixed in the 1.2 release (https://issues.apache.org/jira/browse/CASSANDRA-4813)

Kind regards,
Pieter

-----Original Message-----
From: Brian Jeltema [mailto:brian.jelt...@digitalenvoy.net]
Sent: woensdag 30 januari 2013 13:58
To: user@cassandra.apache.org
Subject: Re: cryptic exception in Hadoop/Cassandra job

Cassandra 1.1.5, using BulkOutputFormat

Brian

On Jan 30, 2013, at 7:39 AM, Pieter Callewaert wrote:

Hi Brian,

Which version of Cassandra are you using? And are you using the BOF to write to Cassandra?

Kind regards,
Pieter

-----Original Message-----
From: Brian Jeltema [mailto:brian.jelt...@digitalenvoy.net]
Sent: woensdag 30 januari 2013 13:20
To: user@cassandra.apache.org
Subject: cryptic exception in Hadoop/Cassandra job

I have a Hadoop/Cassandra map/reduce job that performs a simple transformation on a table with very roughly 1 billion columns spread across roughly 4 million rows. During reduction, I see a relative handful of the following:

Exception in thread Streaming to /10.4.0.3:1 java.lang.RuntimeException: java.io.EOFException
    at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
    at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:104)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ...
3 more which ultimately leads to job failure. I can't tell if this is a bug in my code or in the underlying framework. Does anyone have suggestions on how to debug this? TIA Brian
Re: Poor key cache hit rate
You should not use the row cache and the key cache on the same CF. If that is what you are doing, it explains your numbers. Some docs suggest you can use them together, but in practice I have seen that when this is done the key cache hit rate drops to near 0.

On Tuesday, January 29, 2013, Keith kwri...@nanigans.com wrote:

Hi all, I am running 1.1.9 with 2 data centers and 3 nodes each. Recently I have been seeing a terrible key cache hit rate (around 1-3%) with a 98% row cache hit rate. The seed node appears to take higher traffic than the other nodes (approximately twice as much), but I believe I have astyanax configured properly with ring describe and token aware. Any ideas or steps on how to debug? I also see high GC load. Perhaps I need more nodes?
Re: Node selection when both partition key and secondary index field constrained?
Hector has this feature because Hector is awesome sauce, but Astyanax is new, sexy, and blogged about by Netflix. So the new Cassandra trend of forcing everyone to use less functional new stuff is at work here, making you wish for something that already exists elsewhere.
Re: Node selection when both partition key and secondary index field constrained?
I'd also point out that Hector has better support for CQL3 features than Astyanax. I contributed some stuff to Hector back in December, but I don't have time to apply those changes to Astyanax. I have other contributions in mind for Hector, which I hope to work on later this year.
Nodetool can not get to 7199 after migrating to 1.2.1
I migrated my test environment from 1.2.0 to 1.2.1 (DataStax Community), and nodetool cannot connect to port 7199 even though it is listening. On one node I get "Failed to connect to 'cassandra4:7199': Connection refused"; on another node I get a timeout. Did I do anything wrong when upgrading?

Thanks in advance,
Shahryar

--
Life is what happens while you are making other plans. ~ John Lennon
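Not from the original thread, but the two failure modes above point in different directions: "connection refused" means the TCP handshake reached the host and was rejected (nothing bound on that interface), while a timeout usually means a firewall or routing problem is dropping packets. A minimal probe sketch — the hostname is the one from the report and is only illustrative:

```python
import socket

# Minimal reachability probe for the JMX port nodetool uses (7199 by
# default). Distinguishing "refused" from "timed out" is left to the
# caller's logs; this just reports whether a TCP connection succeeds.

def port_open(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timed out, and unresolvable hosts
        return False

# e.g. port_open("cassandra4", 7199)  # hostname from the report, illustrative
```

Note that even when 7199 is reachable, JMX redirects the client to a second, randomly chosen RMI port, which is a common cause of nodetool failing through firewalls even though the initial port probe succeeds.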
Suggestion: Move some threads to the client-dev mailing list
A good portion of people and traffic on this list is questions about:

1) Astyanax
2) cassandra-jdbc
3) Cassandra native client
4) pythondra / whatever

With the exception of the native transport, which is only halfway part of Cassandra, none of these other client issues have much to do with core Cassandra at all. If someone authors a client library/driver/etc., they should be supporting it outside of the user@cassandra mailing list.

My suggestion: at minimum we should re-route these questions to client-dev, or simply say, "If it is not part of core Cassandra, you are looking in the wrong place for support."

Edward
Re: Understanding Virtual Nodes on Cassandra 1.2
On Wed 30 Jan 2013 02:29:27 AM CST, Zhong Li wrote:

One more question: can I add a virtual node manually, without rebooting and rebuilding a host's data? I checked the nodetool commands; there is no option to add a node.

Thanks.
Zhong

On Jan 29, 2013, at 11:09 AM, Zhong Li wrote:

I had misunderstood this: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 , especially "If you want to get started with vnodes on a fresh cluster, however, that is fairly straightforward. Just don't set the initial_token parameter in your conf/cassandra.yaml and instead enable the num_tokens parameter. A good default value for this is 256."

Also, I couldn't find documentation about setting multiple tokens for cassandra.initial_token. Anyway, I just tested it, and it does work to set a comma separated list of tokens.

Thanks,
Zhong

On Jan 29, 2013, at 3:06 AM, aaron morton wrote:

"After I searched some documents on the Datastax website and some old tickets, it seems that it works for the random partitioner only, and leaves the order preserving partitioner out of luck."

Links?

"or allow adding Virtual Nodes manually?"

I've not looked into it, but there is a cassandra.initial_token startup param that takes a comma separated list of tokens for the node. There also appears to be support for the ordered partitioners to generate random tokens. But you would still have the problem of having to balance your row keys around the token space.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 10:31 AM, Zhong Li z...@voxeo.com wrote:

Hi All,

Virtual Nodes is a great feature. After I searched some documents on the Datastax website and some old tickets, it seems that it works for the random partitioner only, and leaves the order preserving partitioner out of luck. I may misunderstand; please correct me. If it doesn't support the order preserving partitioner, would it be possible to add support for multiple initial_token(s) for the order preserving partitioner, or to allow adding Virtual Nodes manually?

Thanks,
Zhong

You add a physical node and that in turn adds num_tokens tokens to the ring.
Re: Suggestion: Move some threads to the client-dev mailing list
I totally agree.

-Vivek
Re: Inserting via thrift interface to column family created with Compound Key via cql3
Are you using execute_cql3_query()?

On Jan 30, 2013, at 7:31 AM, Oleksandr Petrov oleksandr.pet...@gmail.com wrote:

Hi,

I'm creating a table via a cql3 query like:

CREATE TABLE posts (
  userid text,
  blog_name text,
  entry_title text,
  posted_at text,
  PRIMARY KEY (userid, blog_name)
)

After that I'm trying to insert into the same column family via the thrift interface, and I'm getting the following exception:

Not enough bytes to read value of component 0
Cassandra.java:20833 org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read
TServiceClient.java:78 org.apache.thrift.TServiceClient.receiveBase
Cassandra.java:964 org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate
Cassandra.java:950 org.apache.cassandra.thrift.Cassandra$Client.batch_mutate

The thrift client doesn't even display that column family when running describe_keyspace. I may be missing something, and I realize that CQL3 is the way to go, but I'm still unsure whether it's even possible to combine cql3 and thrift things.

--
alex p
Re: Inserting via thrift interface to column family created with Compound Key via cql3
Yes, execute_cql3_query, exactly.

--
alex p
Re: Uneven CPU load on a 4 node cluster
The high CPU node got replaced and now I'm not getting abnormally high CPU from one node. They are all evenly balanced now.

On 29 January 2013 16:29, Jabbar aja...@gmail.com wrote:

Hello,

I've been testing a four identical node Cassandra 1.2 cluster for a number of days. I have written a C# client using cassandra-sharp which inserts data into a table. The keyspace definition is:

CREATE KEYSPACE data WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};

The table definition is:

CREATE TABLE datapoints (
  siteid bigint,
  time timestamp,
  channel int,
  data float,
  PRIMARY KEY ((siteid, channel), time)
)

I am finding that the CPU load on one of the servers stays at ~90% whilst the load on the other servers stays below 40%. All the servers are supposed to be identical. The client library I am using does load balancing between all nodes. I have also used the cassandra-stress tool as follows:

cassandra-stress -d 192.168.21.7,192.168.21.9,192.168.21.12,192.168.21.14 --replication-factor 3 -n 1000 -t 100

and have found that it behaves similarly. Can somebody explain why this happens?

--
Thanks
A Jabbar Azam
Re: Inserting via thrift interface to column family created with Compound Key via cql3
Did you pack the composite correctly? This exception normally shows up when the composite bytes are malformed.
Re: Inserting via thrift interface to column family created with Compound Key via cql3
From src/java/org/apache/cassandra/db/marshal/CompositeType.java:

/*
 * The encoding of a CompositeType column name should be:
 *   <component><component><component> ...
 * where <component> is:
 *   <length of value><value><'end-of-component' byte>
 * where <length of value> is a 2 bytes unsigned short and the
 * 'end-of-component' byte should always be 0 for an actual column name.
 * However, it can be set to 1 for query bounds. This allows to query for the
 * equivalent of 'give me the full super-column'. That is, if during a slice
 * query one uses:
 *   start = <3><"foo".getBytes()><0>
 *   end   = <3><"foo".getBytes()><1>
 * then he will be sure to get *all* the columns whose first component is "foo".
 * If for a component the 'end-of-component' is != 0, there should not be any
 * following component. The end-of-component can also be -1 to allow a
 * non-inclusive query. For instance:
 *   start = <3><"foo".getBytes()><-1>
 * allows to query everything that is greater than <3><"foo".getBytes()>, but
 * not <3><"foo".getBytes()> itself.
 */

Or am I missing the fact that you are inserting with cql3 as well?
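For reference, the byte layout described in that comment can be packed by hand. This is a sketch of the encoding only (2-byte big-endian length, raw value, end-of-component byte), not the API of any particular thrift client:

```python
import struct

# Hand-packing a CompositeType column name per the encoding comment
# above: each component is a 2-byte big-endian unsigned length, the raw
# value bytes, then one end-of-component byte (0 for an actual column
# name, 1 / -1 for query bounds).

def pack_composite(*values, eoc=0):
    out = b""
    for v in values:
        out += struct.pack(">H", len(v)) + v + struct.pack("b", eoc)
    return out

# <3><"foo".getBytes()><0>  ->  00 03 66 6f 6f 00
assert pack_composite(b"foo") == b"\x00\x03foo\x00"
```

The "Not enough bytes to read value of component 0" error from earlier in the thread is what you see when a raw string is sent where this framed encoding was expected: the server reads the first two bytes as a length and runs off the end of the buffer.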
Chronos - Timeseries with Hector
Hello, I recently open sourced a WIP java library for handling timestamped data. I am looking for feedback/criticism and also interest. It was made primarily to process lots of small numeric values, without having to load the entire set into memory. Anyways, thoughts and feedback appreciated. --Dan
Re: Chronos - Timeseries with Hector
I'm sure it helps if I link the thing: https://github.com/dansimpson/chronos On Wed, Jan 30, 2013 at 8:39 AM, Dan Simpson dan.simp...@gmail.com wrote: Hello, I recently open sourced a WIP java library for handling timestamped data. I am looking for feedback/criticism and also interest. It was made primarily to process lots of small numeric values, without having to load the entire set into memory. Anyways, thoughts and feedback appreciated. --Dan
Re: Understanding Virtual Nodes on Cassandra 1.2
You add a physical node and that in turn adds num_token tokens to the ring. No, I am talking about Virtual Nodes with order preserving partitioner. For an existing host with multiple tokens setting list on cassandra.inital_token. After initial bootstrapping, the host will not aware changes of cassandra.inital_token. If I want add a new token( virtual node), I have to rebuild the host with new token list. My question is if there is way to add a virtual nodes without rebuild it? Thanks, On Jan 30, 2013, at 10:21 AM, Manu Zhang wrote: On Wed 30 Jan 2013 02:29:27 AM CST, Zhong Li wrote: One more question, can I add a virtual node manually without reboot and rebuild a host data? I checked nodetool command, there is no option to add a node. Thanks. Zhong On Jan 29, 2013, at 11:09 AM, Zhong Li wrote: I was misunderstood this http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 , especially If you want to get started with vnodes on a fresh cluster, however, that is fairly straightforward. Just don’t set the |initial_token| parameter in your|conf/cassandra.yaml| and instead enable the |num_tokens| parameter. A good default value for this is 256 Also I couldn't find document about set multiple tokens for cassandra.inital_token Anyway, I just tested, it does work to set comma separated list of tokens. Thanks, Zhong On Jan 29, 2013, at 3:06 AM, aaron morton wrote: After I searched some document on Datastax website and some old ticket, seems that it works for random partitioner only, and leaves order preserved partitioner out of the luck. Links ? or allow add Virtual Nodes manually? If not looked into it but there is a cassandra.inital_token startup param that takes a comma separated list of tokens for the node. There also appears to be support for the ordered partitions to generate random tokens. But you would still have the problem of having to balance your row keys around the token space. 
Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 29/01/2013, at 10:31 AM, Zhong Li z...@voxeo.com wrote: Hi All, Virtual nodes are a great feature. After I searched some documents on the Datastax website and some old tickets, it seems that it works for the random partitioner only and leaves the order-preserving partitioner out of luck. I may have misunderstood; please correct me. If it doesn't support the order-preserving partitioner, would it be possible to support multiple initial_token(s) for the order-preserving partitioner, or to allow adding virtual nodes manually? Thanks, Zhong You add a physical node and that in turn adds num_tokens tokens to the ring.
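Balancing tokens by hand, as Aaron notes, means choosing evenly spaced values yourself before passing them as a comma-separated list. A minimal sketch of how such a list could be derived for the RandomPartitioner's 0..2^127 token space (a hypothetical helper for illustration, not a Cassandra tool):

```python
# Sketch: evenly spaced RandomPartitioner tokens (space 0 .. 2**127)
# for `nodes` hosts with `vnodes` tokens each. Hypothetical helper:
# shows how a cassandra.initial_token list could be computed by hand.
def initial_token_list(node_index, nodes, vnodes, token_space=2**127):
    total = nodes * vnodes
    # Node i takes every `nodes`-th token, starting at offset i,
    # so each host's vnodes are spread around the whole ring.
    tokens = [(token_space * t) // total
              for t in range(node_index, total, nodes)]
    return ",".join(str(t) for t in tokens)

print(initial_token_list(0, nodes=3, vnodes=4))
```

With the order-preserving partitioner the same interleaving idea applies, but the token values must be drawn from the actual key distribution, which is exactly the balancing problem Aaron mentions.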
Re: Upcoming conferences
At what level will the NY talks be? I had been planning on attending Datastax's big summer conference, and I might not be able to get approval for both, so I'd like to hear more about this one. On Wed, Jan 30, 2013 at 12:40 PM, Jonathan Ellis jbel...@gmail.com wrote: ApacheCon North America (Portland, Feb 26-28) has a Cassandra track on the 28th: http://na.apachecon.com/schedule/ NY C* Tech Day (NY, March 20) is a 2-track, one-day conference devoted to Cassandra: http://datastax.com/nycassandra2013/ See you there! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Suggestion: Move some threads to the client-dev mailing list
On Wed, Jan 30, 2013 at 7:21 AM, Edward Capriolo edlinuxg...@gmail.com wrote: My suggestion: At minimum we should re-route these questions to client-dev or simply say, If it is not part of core Cassandra, you are looking in the wrong place for support +1, I find myself scanning past all those questions in order to find questions I am able to answer based solely on my operational knowledge of the Cassandra daemon. =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
RE: cluster issues
I am using DseDelegateSnitch. Thanks, SC From: aa...@thelastpickle.com Subject: Re: cluster issues Date: Tue, 29 Jan 2013 20:15:45 +1300 To: user@cassandra.apache.org We can always be proactive in keeping the time in sync. But is there any way to recover from a time drift (in a reactive manner)? Since it was a lab environment, I dropped the KS (deleted the data directory). There is a way to remove future-dated columns, but it is not for the faint-hearted. Basically: 1) Drop gc_grace_seconds to 0. 2) Delete the column with a timestamp way in the future, so it is guaranteed to be higher than the timestamp of the value you want to delete. 3) Flush the CF. 4) Compact all the SSTables that contain the row. The easiest way to do that is a major compaction, but we normally advise against that because it creates one big file. You can also do a user-defined compaction. Are there any other scenarios that would make a cluster look like the below? Note: the actual topology of the cluster is ONE Cassandra node and TWO Analytics nodes. What snitch are you using? If you have the property file snitch, do all nodes have the same configuration? There is a lot of sickness there. If possible I would scrub and start again. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 29/01/2013, at 6:29 AM, S C as...@outlook.com wrote: One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I figured this out pretty quickly, I have a few questions that I am looking for answers to. We can always be proactive in keeping the time in sync. But is there any way to recover from a time drift (in a reactive manner)? Since it was a lab environment, I dropped the KS (deleted the data directory). Are there any other scenarios that would make a cluster look like the below? Note: the actual topology of the cluster is ONE Cassandra node and TWO Analytics nodes.
On 192.168.2.100:
Address        DC         Rack   Status  State   Load       Owns    Token
                                                                    113427455640312821154458202477256070485
192.168.2.100  Cassandra  rack1  Up      Normal  601.34 MB  33.33%  0
192.168.2.101  Analytics  rack1  Down    Normal  149.75 MB  33.33%  56713727820156410577229101238628035242
192.168.2.102  Analytics  rack1  Down    Normal  ?          33.33%  113427455640312821154458202477256070485
On 192.168.2.101:
Address        DC         Rack   Status  State   Load       Owns    Token
                                                                    113427455640312821154458202477256070485
192.168.2.100  Analytics  rack1  Down    Normal  ?          33.33%  0
192.168.2.101  Analytics  rack1  Up      Normal  158.59 MB  33.33%  56713727820156410577229101238628035242
192.168.2.102  Analytics  rack1  Down    Normal  ?          33.33%  113427455640312821154458202477256070485
On 192.168.2.102:
Address        DC         Rack   Status  State   Load       Owns    Token
                                                                    113427455640312821154458202477256070485
192.168.2.100  Analytics  rack1  Down    Normal  ?          33.33%  0
192.168.2.101  Analytics  rack1  Down    Normal  ?          33.33%  56713727820156410577229101238628035242
192.168.2.102  Analytics  rack1  Up      Normal  117.02 MB  33.33%  113427455640312821154458202477256070485
Appreciate your valuable inputs. Thanks, SC
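The delete-with-a-future-timestamp step in Aaron's procedure hinges on Cassandra resolving conflicting writes by client-supplied timestamp: the tombstone's timestamp must be strictly higher than that of the future-dated column. A small sketch of picking such a timestamp (plain Python; `drift_guess_seconds` is a hypothetical upper bound on how far ahead the bad clock was; Cassandra timestamps are conventionally microseconds since the epoch):

```python
import time

# To shadow a column written with a future-dated timestamp, the
# tombstone's timestamp must be strictly higher than the column's.
def tombstone_timestamp(drift_guess_seconds):
    now_us = int(time.time() * 1_000_000)
    # Add a generous margin (an hour here) beyond the suspected drift.
    return now_us + (drift_guess_seconds + 3600) * 1_000_000

ts = tombstone_timestamp(25)  # the node drifted ~20-25 seconds
```

The computed value would then be used as the explicit timestamp on the delete before flushing and compacting as described.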
Re: Understanding Virtual Nodes on Cassandra 1.2
Are there tickets/documents that explain how data is replicated with virtual nodes? If there are multiple tokens on one physical host, is there a chance that two or more of the tokens chosen by the replication strategy end up on the same host? If a token is moved/removed/added manually, does Cassandra validate that case? Thanks. On Jan 30, 2013, at 12:46 PM, Zhong Li wrote: [quoted thread trimmed; see the messages above]
Re: too many warnings of Heap is full
My guess is that those one or two nodes with the gc pressure also have more rows in your big CF. More rows could be due to an imbalanced distribution if you're not using a random partitioner, or from those nodes not yet having removed deleted rows which other nodes may have done. JVM heap space is used for a few things which scale with key count, including:
- bloom filter (for C* 1.2)
- index samples
Other space is used but can be more easily controlled by tuning:
- memtable
- compaction
- key cache
- row cache
So, if those nodes have more rows than the others (check using nodetool ring or nodetool cfstats), you can try to:
- reduce the number of rows by adding nodes, running manual / tuned compactions to remove rows with expired tombstones, etc.
- increase the bloom filter fp chance
- increase the jvm heap size (don't go too big)
- disable the key or row cache
- increase the index sample interval
Not all of those things are generally good, especially taken to the extreme, so don't go setting a 20 GB jvm heap without understanding the consequences, for example. -Bryan On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero guillermo.barb...@spotbros.com wrote: Hi, I'm seeing some weird behaviour in my cassandra cluster. Most of the warning messages are of the form "Heap is % full". According to this link ( http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html ) there are two ways to reduce pressure: 1. Decrease the cache sizes 2. Increase the index interval size Most of the flushes are in two column families (users and messages); I guess that's because most of the mutations are there. I still have not applied those changes to the production environment. Do you recommend any other measure? Should I set specific tuning for these two CFs? Should I check another metric? Additionally, the distribution of warning messages is not uniform across the cluster. Why could cassandra be doing this? What should I do to find out how to fix this?
cassandra runs on a 6-node cluster of m1.xlarge machines (Amazon EC2). The java version is the following: java version 1.6.0_37, Java(TM) SE Runtime Environment (build 1.6.0_37-b06), Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode). The cassandra system.log is summarized here (number of messages, cassandra node, class that reports the message, first word of the message):
2013-01-26
  5 cassNode0: GCInspector.java Heap
  5 cassNode0: StorageService.java Flushing
  232 cassNode2: GCInspector.java Heap
  232 cassNode2: StorageService.java Flushing
  104 cassNode3: GCInspector.java Heap
  104 cassNode3: StorageService.java Flushing
  3 cassNode4: GCInspector.java Heap
  3 cassNode4: StorageService.java Flushing
  3 cassNode5: GCInspector.java Heap
  3 cassNode5: StorageService.java Flushing
2013-01-27
  2 cassNode0: GCInspector.java Heap
  2 cassNode0: StorageService.java Flushing
  3 cassNode1: GCInspector.java Heap
  3 cassNode1: StorageService.java Flushing
  189 cassNode2: GCInspector.java Heap
  189 cassNode2: StorageService.java Flushing
  104 cassNode3: GCInspector.java Heap
  104 cassNode3: StorageService.java Flushing
  1 cassNode4: GCInspector.java Heap
  1 cassNode4: StorageService.java Flushing
  1 cassNode5: GCInspector.java Heap
  1 cassNode5: StorageService.java Flushing
2013-01-28
  2 cassNode0: GCInspector.java Heap
  2 cassNode0: StorageService.java Flushing
  1 cassNode1: GCInspector.java Heap
  1 cassNode1: StorageService.java Flushing
  1 cassNode2: AutoSavingCache.java Reducing
  343 cassNode2: GCInspector.java Heap
  342 cassNode2: StorageService.java Flushing
  181 cassNode3: GCInspector.java Heap
  181 cassNode3: StorageService.java Flushing
  4 cassNode4: GCInspector.java Heap
  4 cassNode4: StorageService.java Flushing
  3 cassNode5: GCInspector.java Heap
  3 cassNode5: StorageService.java Flushing
2013-01-29
  2 cassNode0: GCInspector.java Heap
  2 cassNode0: StorageService.java Flushing
  3 cassNode1: GCInspector.java Heap
  3 cassNode1: StorageService.java Flushing
  156 cassNode2: GCInspector.java Heap
  156 cassNode2: StorageService.java Flushing
  71 cassNode3: GCInspector.java Heap
  71 cassNode3: StorageService.java Flushing
  2 cassNode4: GCInspector.java Heap
  2 cassNode4: StorageService.java Flushing
  2 cassNode5: GCInspector.java Heap
  1 cassNode5: Memtable.java setting
  2 cassNode5: StorageService.java Flushing
-- Guillermo Barbero - Backend Team Spotbros Technologies
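One of the options listed above, increasing the bloom filter fp chance, helps because a standard Bloom filter needs about -ln(p)/ln(2)^2 bits per key, so its heap footprint shrinks as the allowed false-positive chance p grows. A back-of-the-envelope sketch using the textbook formula (an approximation, not Cassandra's exact on-heap accounting):

```python
import math

# Approximate Bloom filter size for n keys at false-positive chance p,
# using the textbook optimum of -ln(p) / ln(2)^2 bits per key.
# Illustrative estimate only, not Cassandra's exact sizing.
def bloom_filter_bytes(n_keys, fp_chance):
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return int(n_keys * bits_per_key / 8)

# ~3.26M keys (the larger CF in this thread) at a 1% fp chance:
mb = bloom_filter_bytes(3_263_232, 0.01) / 1e6  # a few MB per node
```

Going from p = 0.01 to p = 0.1 roughly halves the bits per key, which is why it is one of the cheaper ways to relieve heap pressure when key counts are large.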
Re: too many warnings of Heap is full
What's the output of nodetool cfstats for those 2 column families on cassNode2 and cassNode3? And what is the replication factor for this cluster? Per the previous reply, nodetool ring should show each of your nodes with ~16.7% of the data if well balanced. Also, the auto-detection of memory sizes in the startup script is a little off w/r/t m1.xlarge because of its 'slightly less than 16gb' of ram. It usually ends up allocating 4g/400m (max heap and young gen), whereas 8g/800m will give you some more breathing room. On Wed, Jan 30, 2013 at 12:07 PM, Bryan Talbot btal...@aeriagames.com wrote: [quoted reply and original question trimmed; see the messages above]
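The 4g/400m figure mentioned in this thread comes from the auto-sizing heuristic in cassandra-env.sh. A sketch of that heuristic as I understand it for the 1.1 era (an approximation of the script's arithmetic, not the script itself): max heap = max(min(ram/2, 1024 MB), min(ram/4, 8192 MB)), young gen = min(100 MB per core, heap/4).

```python
# Sketch of the cassandra-env.sh heap auto-sizing heuristic
# (1.1 era, as I understand it; approximate, not the script itself).
def auto_heap_mb(system_ram_mb, cores):
    max_heap = max(min(system_ram_mb // 2, 1024),
                   min(system_ram_mb // 4, 8192))
    young_gen = min(100 * cores, max_heap // 4)
    return max_heap, young_gen

# m1.xlarge: ~15 GB RAM, 4 cores -> roughly the 4g/400m quoted above
heap, young = auto_heap_mb(15 * 1024, 4)
```

Because the ram/4 branch wins on a 15 GB box, the auto-detected heap lands near 4 GB; overriding MAX_HEAP_SIZE/HEAP_NEWSIZE by hand is how you would get the suggested 8g/800m.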
CASSANDRA-5152
I had the same problem with 1.2.0. The problem went away after readline was easy-installed. Regards, Yen-Fen Hsu
Re: SStable Writer and composite key
This is what a row of your table will look like internally:
RowKey: id-value
=> (column=date-value:request-value:, value=, timestamp=1359586739456000)
=> (column=date-value:request-value:data1, value=64617461312d76616c7565, timestamp=1359586739456000)
=> (column=date-value:request-value:data2, value=64617461322d76616c7565, timestamp=1359586739456000)
where id-value is the value of the id column, date-value is the value of the date column, and so on. So you need to construct a three-component composite column name for each column value. The first column name is (date-value, request-value, empty); the second column name is (date-value, request-value, data1) and has the value of data1. Hope that helps. - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 30/01/2013, at 3:11 AM, POUGET Laurent laurent.pou...@carboatmedia.fr wrote: Hi, I have some trouble querying my data. I use SSTableSimpleUnsortedWriter to write SSTables. Writing and importing work fine. I think I'm misusing CompositeType.Builder with SSTableSimpleUnsortedWriter. Do you have any idea? Thanks. Here is my case:
/**
 * CREATE STATEMENT
 */
CREATE TABLE raw_data (
  id text,
  date text,
  request text,
  data1 text,
  data2 text,
  PRIMARY KEY (id, date, request)
) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.00 AND gc_grace_seconds=864000 AND read_repair_chance=0.10 AND replicate_on_write='true' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'};
/**
 * JAVA CODE
 */
List<AbstractType<?>> compositeList = new ArrayList<AbstractType<?>>();
compositeList.add(UTF8Type.instance);
compositeList.add(UTF8Type.instance);
IPartitioner<?> partitioner = StorageService.getPartitioner();
dir = Directories.create(keyspace.getKeyspaceName(), columnFamily.getName()).getDirectoryForNewSSTables(0);
simpleUnsortedWriter = new SSTableSimpleUnsortedWriter(dir, partitioner, keyspace.getKeyspaceName(), columnFamily.getName(), UTF8Type.instance, null, 32);
CompositeType.Builder builderRequestDate = new CompositeType.Builder(CompositeType.getInstance(compositeList));
CompositeType.Builder builderUrl = new CompositeType.Builder(CompositeType.getInstance(compositeList));
simpleUnsortedWriter.newRow(bytes(id));
builderRequestDate.add(bytes(date));
builderRequestDate.add(bytes(request));
long timestamp = System.currentTimeMillis() * 1000;
simpleUnsortedWriter.addColumn(builderRequestDate.build(), bytes(date), timestamp);
simpleUnsortedWriter.addColumn(builderUrl.build(), bytes(request), timestamp);
simpleUnsortedWriter.addColumn(bytes(data1), bytes(data1), timestamp);
simpleUnsortedWriter.addColumn(bytes(data2), bytes(data2), timestamp);
simpleUnsortedWriter.close();
Laurent Pouget Design and Development Engineer Tel : 01.84.95.11.20 Car Boat Media 22 Rue Joubert 75009 Paris
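On disk, a composite column name such as date-value:request-value:data1 is a single byte string: to my understanding of the CompositeType layout, each component is written as a two-byte big-endian length, the component bytes, and a single end-of-component byte. A minimal plain-Python sketch of that packing (an illustration of the format, not the Cassandra implementation):

```python
import struct

# Pack a composite column name: for each component, a 2-byte
# big-endian length + the component bytes + one end-of-component
# byte. Illustrates the CompositeType layout; not Cassandra code.
def composite_name(*components):
    out = b""
    for c in components:
        raw = c.encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw + b"\x00"
    return out

# The three names Aaron describes for one logical row:
marker = composite_name("2013-01-30", "GET /index", "")
col1 = composite_name("2013-01-30", "GET /index", "data1")
col2 = composite_name("2013-01-30", "GET /index", "data2")
```

Seen this way, each addColumn call must supply a full three-component name; adding the builders' two-component names as standalone columns (as the code above does) produces cells CQL 3 cannot map back to the table's columns.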
Re: Problem on node join the ring
erg, that error means it's not really part of the ring. I would try to restart the joining: shut down the node and delete everything in /var/lib/data/system. You can leave the data that's already there if you want, or delete it. Then try joining again. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 30/01/2013, at 5:40 AM, Daning Wang dan...@netseer.com wrote: Thanks very much Aaron. * Other nodes still report it is in Joining. * Here is the bootstrap information in the log:
[ca...@dsat305e.prod:/usr/local/cassy log]$ grep -i boot system.log
INFO [main] 2013-01-28 20:16:07,488 StorageService.java (line 774) JOINING: schema complete, ready to bootstrap
INFO [main] 2013-01-28 20:16:07,489 StorageService.java (line 774) JOINING: getting bootstrap token
INFO [main] 2013-01-28 20:16:37,518 StorageService.java (line 774) JOINING: Starting to bootstrap...
* I tried to run repair -pr, but it throws an exception:
[ca...@dsat305e.prod:/usr/local/cassy log]$ nodetool -h localhost repair -pr
Exception in thread main java.lang.AssertionError
    at org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
    at org.apache.cassandra.service.StorageService.getPrimaryRangeForEndpoint(StorageService.java:2080)
    at org.apache.cassandra.service.StorageService.getLocalPrimaryRange(StorageService.java:211)
    at org.apache.cassandra.service.StorageService.forceTableRepairPrimaryRange(StorageService.java:1993)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
On Mon, Jan 28, 2013 at 11:55 PM, aaron morton aa...@thelastpickle.com wrote: there is no streaming anymore Nodes only bootstrap once, when they are first started. I have turned on the debug; this is what it is doing now (cpu is pretty much idle), with no error messages. It looks like it is receiving writes and reads, so it looks like it's part of the ring. Is this ring output from the joining node or from one of the others? Do the other nodes see this node as up or joining? When starting the node, was there a log line with "Bootstrap variables"? Anyways, I would try running nodetool repair -pr on the joining node. If you are not using QUORUM / QUORUM you may be getting inconsistent results now.
Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 29/01/2013, at 9:51 AM, Daning Wang dan...@netseer.com wrote: I added a new node to the ring (version 1.1.6); after more than 30 hours it is still in the 'Joining' state:
Address         DC           Rack   Status  State    Load      Effective-Ownership  Token
                                                                                    141784319550391026443072753096570088105
10.28.78.123    datacenter1  rack1  Up      Normal   18.73 GB  50.00%               0
10.4.17.138     datacenter1  rack1  Up      Normal   15 GB     39.29%               24305883351495604533098186245126300818
10.93.95.51     datacenter1  rack1  Up      Normal   17.96 GB  41.67%               42535295865117307932921825928971026432
10.170.1.26     datacenter1  rack1  Up      Joining  6.89 GB   0.00%                56713727820156410577229101238628035242
10.6.115.239    datacenter1  rack1  Up      Normal   20.3 GB   50.00%               85070591730234615865843651857942052864
10.28.20.200    datacenter1  rack1  Up      Normal   22.68 GB  60.71%               127605887595351923798765477786913079296
10.240.113.171  datacenter1  rack1  Up      Normal   18.4 GB   58.33%               141784319550391026443072753096570088105
Since after a while the cpu usage goes down to 0, it looks like it is stuck. I have restarted the server several times in the last 30 hours. When the server has just started, you can see streaming in 'nodetool netstats', but after a few
Re: too many warnings of Heap is full
Your latencies and distribution look fine. How big / what types of queries are you issuing? Are you issuing a lot of large multigets? Also, do either of these column families have secondary indexes? On Wed, Jan 30, 2013 at 2:59 PM, Guillermo Barbero guillermo.barb...@spotbros.com wrote: Iep, I missed the attachment... Also, these are the cfstats of 152:
Column Family: CF_SBMessages
  SSTable count: 3
  Space used (live): 967238560
  Space used (total): 967238560
  Number of Keys (estimate): 3263232
  Memtable Columns Count: 2112
  Memtable Data Size: 577135
  Memtable Switch Count: 5
  Read Count: 20702
  Read Latency: 0.181 ms.
  Write Count: 19888
  Write Latency: 0.059 ms.
  Pending Tasks: 0
  Bloom Filter False Positives: 14
  Bloom Filter False Ratio: 0.09375
  Bloom Filter Space Used: 6171168
  Compacted row minimum size: 536
  Compacted row maximum size: 24601
  Compacted row mean size: 850
Column Family: CF_users
  SSTable count: 3
  Space used (live): 152205376
  Space used (total): 152205376
  Number of Keys (estimate): 343040
  Memtable Columns Count: 154398
  Memtable Data Size: 21159410
  Memtable Switch Count: 9
  Read Count: 11816
  Read Latency: 1.348 ms.
  Write Count: 128751
  Write Latency: 0.090 ms.
  Pending Tasks: 0
  Bloom Filter False Positives: 0
  Bloom Filter False Ratio: 0.0
  Bloom Filter Space Used: 688464
  Compacted row minimum size: 61
  Compacted row maximum size: 3311
  Compacted row mean size: 1235
And the cfstats of 153:
Column Family: CF_SBMessages
  SSTable count: 5
  Space used (live): 965495648
  Space used (total): 965495648
  Number of Keys (estimate): 3257216
  Memtable Columns Count: 56541
  Memtable Data Size: 15699960
  Memtable Switch Count: 2
  Read Count: 22475
  Read Latency: 0.142 ms.
  Write Count: 21719
  Write Latency: 0.073 ms.
  Pending Tasks: 0
  Bloom Filter False Positives: 43
  Bloom Filter False Ratio: 0.19545
  Bloom Filter Space Used: 6161032
  Compacted row minimum size: 536
  Compacted row maximum size: 11864
  Compacted row mean size: 872
Column Family: CF_users
  SSTable count: 2
  Space used (live): 148762893
  Space used (total): 148762893
  Number of Keys (estimate): 12
  Memtable Columns Count: 129725
  Memtable Data Size: 17144125
  Memtable Switch Count: 4
  Read Count: 7440
  Read Latency: 1.329 ms.
  Write Count: 127465
  Write Latency: 0.093 ms.
  Pending Tasks: 0
  Bloom Filter False Positives: 0
  Bloom Filter False Ratio: 0.0
  Bloom Filter Space Used: 694112
  Compacted row minimum size: 61
  Compacted row maximum size: 3311
  Compacted row mean size: 1298
The messages are in a keyspace with a replication factor of 2, and the users are in a keyspace with a replication factor of 3. Ah, and we use the RandomPartitioner. Thanks again 2013/1/30 Guillermo Barbero guillermo.barb...@spotbros.com Guys, first, thanks a lot for your answers. Now I have a little more info; let's see if this helps: As I said before, we are using Cassandra 1.1.7 on 6 nodes (IPs 150-155) (Amazon EC2 XL instances, 17GB RAM), and the problem is that in the last weeks we have seen the performance of the cluster fall. PHP commands that usually lasted a few milliseconds take up to 15 seconds for just 2-3 minutes every now and then, with no specific pattern (see the command timings in the attached picture). Today we have found this: prior to a fall in performance, in the logs of all the Cassandra nodes I can see this:
Node 150:
INFO [GossipTasks:1] 2013-01-30 21:35:23,514 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
INFO [GossipStage:1] 2013-01-30 21:35:40,666 Gossiper.java (line 817) InetAddress /10.0.0.152 is now UP
INFO [HintedHandoff:1] 2013-01-30 21:35:40,667 HintedHandOffManager.java (line 296) Started hinted handoff for token: 56713727820156407428984779325531226112 with IP: /10.0.0.152
INFO [HintedHandoff:1] 2013-01-30 21:35:41,266 ColumnFamilyStore.java (line 659) Enqueuing flush of Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
INFO [FlushWriter:2879] 2013-01-30 21:35:41,267 Memtable.java (line 264) Writing Memtable-HintsColumnFamily@1264710747(84317/476272 serialized/live bytes, 298 ops)
INFO [FlushWriter:2879] 2013-01-30 21:35:41,282 Memtable.java (line 305) Completed flushing /raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db (84559 bytes) for commitlog position ReplayPosition(segmentId=1355151995930, position=32659991)
INFO [CompactionExecutor:9098] 2013-01-30 21:35:41,283 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-275-Data.db'), SSTableReader(path='/raid0/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hf-276-Data.db')]
Node 151:
INFO [GossipTasks:1] 2013-01-30 21:35:25,689 Gossiper.java (line 831) InetAddress /10.0.0.152 is now dead.
INFO [GossipStage:1] 2013-01-30 21:35:40,677 Gossiper.java (line 817) InetAddress /10.0.0.152
Re: Cass returns Incorrect column data on writes during flushing
That looks like a bug; can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA ? Please include the C* version, the table and insert statements, and, if you can, a repro using CQL 3. Thanks, Aaron - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 30/01/2013, at 8:10 AM, Elden Bishop ebis...@exacttarget.com wrote: Sure thing. Here is a console dump showing the error. Notice that column '9801' is NOT NULL in the first two queries but IS NULL in the last query. I get this behavior constantly on any writes that coincide with a flush. The column is always readable by itself but disappears depending on the other columns being queried.
$ bin/cqlsh -2
cqlsh> SELECT '9801' FROM BUGS.Test WHERE KEY='a';
 9801
---------------------
 0.02271159951509616
cqlsh> SELECT '9801','6814' FROM BUGS.Test WHERE KEY='a';
 9801                | 6814
---------------------+--------------------
 0.02271159951509616 | 0.6612351709326891
cqlsh> SELECT '9801','6814','' FROM BUGS.Test WHERE KEY='a';
 9801 | 6814               |
------+--------------------+--------------------
 null | 0.6612351709326891 | 0.8921380283891902
cqlsh> exit;
$
From: aaron morton aa...@thelastpickle.com Reply-To: user@cassandra.apache.org Date: Tuesday, January 29, 2013 12:21 AM To: user@cassandra.apache.org Subject: Re: Cass returns Incorrect column data on writes during flushing I.e., a query for a single column works, but the column does not appear in slice queries, depending on the other columns in the query: cfq.getKey(foo).getColumn(A) returns A; cfq.getKey(foo).withColumnSlice(A, B) returns B only; cfq.getKey(foo).withColumnSlice(A,B,C) returns A, B and C. Can you replicate this using cassandra-cli or CQL? That makes it clearer what's happening and removes any potential issues with the client or your code. If you cannot repro it, show your Astyanax code.
Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 29/01/2013, at 1:15 PM, Elden Bishop ebis...@exacttarget.com wrote: I'm trying to track down some really worrying behavior. It appears that writing multiple columns while a table flush is occurring can result in Cassandra recording its data in a way that makes columns visible only to some queries but not others. I.e., a query for a single column works, but the column does not appear in slice queries, depending on the other columns in the query: cfq.getKey(foo).getColumn(A) returns A; cfq.getKey(foo).withColumnSlice(A, B) returns B only; cfq.getKey(foo).withColumnSlice(A,B,C) returns A, B and C. This is a permanent condition, meaning that even hours later, with no reads or writes, the DB will return the same results. I can reproduce this 100% of the time by writing multiple columns and then reading a different set of multiple columns. Columns written during the flush may or may not appear. Details:
- There are no log errors.
- All single-column queries return correct data.
- Slice queries may or may not return the column, depending on which other columns are in the query.
- This is on a stock unzip-and-run installation of Cassandra using default options only; basically doing the Cassandra getting-started tutorial and using the demo table described in that tutorial.
- Cassandra 1.2.0 using Astyanax and Java 1.6.0_37.
- There are no errors, but there is always a "flushing high traffic column family" message that happens right before the incoherent state occurs.
- To reproduce, just update multiple columns at the same time, using random rows, and then verify the writes by reading multiple columns. I can generate the error on 100% of runs. Once the state is screwed up, the multi-column read will not contain the column but the single-column read will.
Log snippet:

INFO 15:47:49,066 GC for ParNew: 320 ms for 1 collections, 20712 used; max is 1052770304
INFO 15:47:58,076 GC for ParNew: 330 ms for 1 collections, 232839680 used; max is 1052770304
INFO 15:48:00,374 flushing high-traffic column family CFS(Keyspace='BUGS', ColumnFamily='Test') (estimated 50416978 bytes)
INFO 15:48:00,374 Enqueuing flush of Memtable-Test@1575891161(4529586/50416978 serialized/live bytes, 279197 ops)
INFO 15:48:00,378 Writing Memtable-Test@1575891161(4529586/50416978 serialized/live bytes, 279197 ops)
INFO 15:48:01,142 GC for ParNew: 654 ms for 1 collections, 239478568 used; max is 1052770304
INFO 15:48:01,474 Completed flushing /var/lib/cassandra/data/BUGS/Test/BUGS-Test-ia-45-Data.db (4580066 bytes) for commitlog position ReplayPosition(segmentId=1359415964165, position=7462737)

Any ideas on what could be going on? I could not find anything like this in the open bugs and the only workaround seems to be never doing multi-column
Re: why set replica placement strategy at keyspace level ?
I think a row mutation is isolated now, but is it across column families? Correct, they are isolated, but only for an individual CF. By the way, the wiki page really needs updating. You can update it if you would like to. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 30/01/2013, at 12:33 PM, Manu Zhang owenzhang1...@gmail.com wrote:

On Tue 29 Jan 2013 03:39:17 PM CST, aaron morton wrote: So if I write to CF Users with rowkey=dean and to CF Schedules with rowkey=dean, it is actually one row? In my mental model that's correct. A RowMutation is a row key and a collection of (internal) ColumnFamilies which contain the columns to write for a single CF. This is the thing that is committed to the log, and then the changes in the ColumnFamilies are applied to each CF in an isolated way. http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 29/01/2013, at 9:28 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

"If you write to 4 CF's with the same row key that is considered one mutation" Hmm, I never considered this, never knew it either (very un-intuitive from a user perspective, IMHO). So if I write to CF Users with rowkey=dean and to CF Schedules with rowkey=dean, it is actually one row? (It's so un-intuitive that I had to ask to make sure I am reading that correctly.) I guess I really don't have that case since most of my row keys are GUIDs anyway, but very interesting and unexpected (not sure I really mind, was just taken aback). Ps. Not sure I ever minded losing atomic commits to the same row across CF's, as I never expected it in the first place, having used Cassandra for more than a year (must have missed that several times in the documentation).
Thanks, Dean

On 1/28/13 12:41 PM, aaron morton aa...@thelastpickle.com wrote:

Another thing that's been confusing me is that when we talk about the data model, should the row key be inside or outside a column family? My mental model is: cluster == database, keyspace == table, row == a row in a table, CF == a family of columns in one row. (I think that's different to others, but it works for me.)

Is it important to store rows of different column families that share the same row key on the same node? It makes the failure model a little easier to understand; e.g. every key for user amorton is either available or not.

Meanwhile, what's the drawback of setting RPS and RF at column family level? Other than that it's baked in? We process all mutations for a row at the same time. If you write to 4 CF's with the same row key, that is considered one mutation for one row. That one RowMutation is directed to the replicas using the ReplicationStrategy and atomically applied to the commit log. If you have an RS per CF, that one mutation would be split into 4, which would then be sent to different replicas. Even if they went to the same replicas, they would be written to the commit log as different mutations. So if you have an RS per CF, you lose atomic commits for writes to the same row. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 28/01/2013, at 11:22 PM, Manu Zhang owenzhang1...@gmail.com wrote:

On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote: The row is the unit of replication; all values with the same storage engine row key in a KS are on the same nodes. If they were per CF this would not hold. Not that it would be the end of the world, but that is the first thing that comes to mind.
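Aaron's atomic-commit point can be pictured with a toy structure (purely illustrative Python, not Cassandra's internal classes; the names here are made up for the sketch):

```python
# A toy RowMutation: one row key, column writes grouped per column family.
row_mutation = {
    "key": "dean",
    "column_families": {
        "Users":     {"email": "dean@example.com"},
        "Schedules": {"monday": "standup"},
    },
}

# With a keyspace-level replication strategy, the whole mutation is routed to
# one replica set and appended to the commit log as a single unit:
def commit(mutation):
    return [("commitlog-entry", mutation["key"], list(mutation["column_families"]))]

# With a per-CF strategy it would have to be split into one mutation per CF,
# each routed (and logged) independently, so the single atomic append is lost:
def split_per_cf(mutation):
    return [
        ("commitlog-entry", mutation["key"], [cf])
        for cf in mutation["column_families"]
    ]

print(len(commit(row_mutation)))        # 1 commit-log entry
print(len(split_per_cf(row_mutation)))  # 2 independent entries
```
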
Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 27/01/2013, at 4:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:

Although I've known Cassandra for quite a while, this question has only occurred to me recently: why are the replica placement strategy and replication factor set at the keyspace level? Would setting them at the column family level offer more flexibility? Is this because it's easier for users to manage an application? Or related to internal implementation? Or is it just that I've overlooked something? Is it important to store rows of different column families that share the same row key on the same node? AFAIK, Cassandra doesn't support getting all of them in a single call. Meanwhile, what's the drawback of setting RPS and RF at the column family level? Another thing that's been confusing me is that when we talk about the data model, should the row key be inside or outside a column family? Thanks

From that wiki page, mutations against a single key are atomic but not isolated. I think a row
Re: why set replica placement strategy at keyspace level ?
On Thu 31 Jan 2013 08:55:40 AM CST, aaron morton wrote: I think a row mutation is isolated now, but is it across column families? Correct, they are isolated, but only for an individual CF. By the way, the wiki page really needs updating. You can update it if you would like to. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com
Re: why set replica placement strategy at keyspace level ?
That should not bother you. For example, if you're doing an HBase scan that crosses two column families, that can end up being two (disk) seeks. Having an API that hides the seeks from you does not give you better performance; it only helps when you're debating with people who do not understand the fundamentals.
Re: Cassandra pending compaction tasks keeps increasing
Some updates: since we still have not fully turned on the system, we did something crazy today. We tried to treat the node as a dead one (my boss wants us to practice replacing a dead node before going to full production) and bootstrap it. Here is what we did:

* drain the node
* check nodetool on other nodes; this node is marked down (the token for this node is 100)
* clear the data, commit log, saved caches
* change initial_token from 100 to 99 in the yaml file
* start the node
* check nodetool; the down node with token 100 disappeared by itself (!!) and a new node with token 99 showed up
* check the log; see the message saying bootstrap completed, but only a couple of MB streamed
* nodetool movetoken 98
* nodetool; see the node with token 98 come up
* check the log; see the message saying bootstrap completed, but still only a couple of MB streamed

The only reason I can think of is that the new node has the same IP as the dead node we tried to replace? Will that cause the symptom of no data streamed from other nodes? Do the other nodes still think the node has all the data? We had to run nodetool repair -pr to bring in the data. After 3 hours, 150G transferred. And no surprise, pending compaction tasks are now at 30K. There are about 30K SSTables transferred, and I guess all of them need to be compacted since we use LCS. My concern is that if we did nothing wrong, replacing a dead node causes such a huge backlog of pending compactions. It might take a week to clear that off. And with RF = 3, we still need to bring in the data for the other two replicas, since we use -pr for nodetool repair. It will take about 3 weeks to fully replace a 200G node using LCS? We tried everything we could to speed up the compaction, with no luck. The only thing I can think of is to increase the default size of SSTables, so fewer compactions will be needed. Can I just change it in the yaml and restart C*, and it will correct itself? Any side effects?
Since we are using SSD, a somewhat bigger SSTable won't slow down reads too much; I suppose read latency is the main concern with a bigger SSTable size? I think 1.2 comes with parallel leveled compaction, which should help the situation, but we are not going to upgrade for a little while. Did I miss anything? It might not be practical to use LCS for a 200G node? But if we use sized compaction, we need to have at least 400G for the HD... Although SSD is cheap now, it's still hard to convince the management: three replicas + double the disk for compaction? That is 6 times the real data size! Sorry for the long email. Any suggestions or advice? Thanks. -Wei

- Original Message - From: aaron morton aa...@thelastpickle.com To: Cassandra User user@cassandra.apache.org Sent: Tuesday, January 29, 2013 12:59:42 PM Subject: Re: Cassandra pending compaction tasks keeps increasing

* Will try it tomorrow. Do I need to restart the server to change the log level? You can set it via JMX, and supposedly log4j is configured to watch the config file. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 29/01/2013, at 9:36 PM, Wei Zhu wz1...@yahoo.com wrote:

Thanks for the reply. Here is some information:

Do you have wide rows? Are you seeing logging about "Compacting wide rows"? * I don't see any log about wide rows.
Are you seeing GC activity logged or seeing CPU steal on a VM? * There is some GC, but CPU is generally under 20%. We have a heap size of 8G; RAM is at 72G.
Have you tried disabling multithreaded_compaction? * By default, it's disabled. We enabled it, but didn't see much difference; it was even a little slower with it enabled. Is it bad to enable it? We have SSD, and according to the comment in the yaml, it should help when using SSD.
Are you using key caches? Have you tried disabling compaction_preheat_key_cache? * We have a fairly big key cache, set at 10% of heap, which is 800M. Yes, compaction_preheat_key_cache is disabled.
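The two sizing questions in the message above reduce to simple arithmetic. A back-of-envelope sketch (assumptions: 5 MB is LCS's historical default sstable_size_in_mb, and the 2x headroom is size-tiered compaction's worst case, where a major compaction can need as much free space as the data it rewrites):

```python
def sstable_count(data_size_mb, sstable_size_mb=5):
    """Approximate number of LCS SSTables holding a given amount of data."""
    return data_size_mb // sstable_size_mb

def required_raw_capacity(data_gb, replication_factor=3, compaction_headroom=2.0):
    """Back-of-envelope raw disk needed cluster-wide for size-tiered compaction."""
    return data_gb * replication_factor * compaction_headroom

# 150 GB streamed in by repair, at the 5 MB default vs a 100 MB setting:
print(sstable_count(150 * 1024))       # 30720 tables at 5 MB (the "30K" backlog)
print(sstable_count(150 * 1024, 100))  # 1536 tables at 100 MB

# 200 GB of unique data, RF=3, 2x headroom for size-tiered compaction:
print(required_raw_capacity(200))      # 1200.0 GB, i.e. 6x the real data
```

The first pair of numbers matches the 30K pending-compaction observation above; raising the SSTable size cuts the table count (and per-table overhead) proportionally.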
Can you enable DEBUG level logging and make the logs available? * Will try it tomorrow. Do I need to restart the server to change the log level? -Wei

- Original Message - From: aaron morton aa...@thelastpickle.com To: user@cassandra.apache.org Sent: Monday, January 28, 2013 11:31:42 PM Subject: Re: Cassandra pending compaction tasks keeps increasing

* Why does nodetool repair increase the data size that much? It's not likely that much data needs to be repaired. Will that happen for all subsequent repairs? Repair only detects differences in entire rows. If you have very wide rows, then small differences in those rows can result in a large amount of streaming. Streaming creates new SSTables on the receiving side, which then need to be compacted. So repair often results in compaction doing its thing for a while. * How to make LCS run faster? After almost a day,
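For the log-level question, a minimal sketch of the change in conf/log4j-server.properties (assuming the 1.x layout; the server is normally configured to watch this file, so a restart should not be needed, and the level can also be changed live via JMX on the o.a.c.db StorageService MBean):

```
# conf/log4j-server.properties: raise the root logger from INFO to DEBUG
log4j.rootLogger=DEBUG,stdout,R
```

Remember to drop it back to INFO afterwards; DEBUG output under load is voluminous.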
CPU hotspot at BloomFilterSerializer#deserialize
Hi all, We have a situation where the CPU load on some of the nodes in our cluster has spiked occasionally since last November, triggered by requests for rows that reside on two specific sstables. We confirmed the following (when spiked):

version: 1.0.7 (current) - 0.8.6 - 0.8.5 - 0.7.8
jdk: Oracle 1.6.0

1. A profile showed that BloomFilterSerializer#deserialize was the hotspot (70% of the total load by running threads). The stack trace looked like this (simplified):

90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
90.4% - org.apache.cassandra.db.SliceByNamesReadCommand.getRow
...
90.4% - org.apache.cassandra.db.CollationController.collectTimeOrderedData
...
89.5% - org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
...
79.9% - org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
68.9% - org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
66.7% - java.io.DataInputStream.readLong

2. Usually, step 1 should be so fast that sampling-based profiling cannot detect it.
3. There is no pressure on Cassandra's VM heap nor on the machine overall.
4. There is only a little I/O traffic across our 8 disks/node (up to 100 tps/disk by iostat 1 1000).
5. The problematic Data file contains only 5 to 10 keys' worth of data but is large (2.4G).
6. The problematic Filter file size is only 256B (which could be normal).

So now, I am trying to read the Filter file the same way BloomFilterSerializer#deserialize does, as closely as I can, in order to see if the file is somehow wrong. Could you give me some advice on: 1. what is happening? 2. the best way to simulate BloomFilterSerializer#deserialize 3. any more info required to proceed? Thanks, Takenori
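As a sanity check on point 6, the standard Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2 suggests a tiny filter is plausible for 5-10 keys (a rough estimate only; Cassandra's serialized format adds headers and rounds sizes, and the 1% false-positive rate here is an assumption, not Cassandra's actual target):

```python
import math

def bloom_filter_bits(n_keys, fp_rate=0.01):
    """Optimal Bloom filter size in bits: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-n_keys * math.log(fp_rate) / (math.log(2) ** 2))

# 10 keys at a 1% false-positive rate need on the order of 100 bits,
# i.e. roughly a dozen bytes, so a 256-byte Filter file for 5-10 keys
# is not obviously corrupt on size alone:
print(bloom_filter_bits(10))  # 96
```
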
Error when using CQL driver : No indexed columns present in by-columns clause with equals operator
Hi All, I have created a column family as follows (with secondary indexes):

create column family users with comparator=UTF8Type and key_validation_class = 'UTF8Type' and default_validation_class = 'UTF8Type' and column_metadata=[{column_name: full_name, validation_class: UTF8Type}, {column_name: birth_year, validation_class: LongType, index_type: KEYS}, {column_name: state, validation_class: UTF8Type, index_type: KEYS}];

And I am using CQL driver 1.1.1 with Cassandra server 1.1.1. Once I try to execute the following query, it gives an exception saying 'No indexed columns present in by-columns clause with equals operator'.

CQL: select * from users where birth_year > 1965

Caused by: java.sql.SQLSyntaxErrorException: No indexed columns present in by-columns clause with equals operator
at org.apache.cassandra.cql.jdbc.CassandraPreparedStatement.doExecute(CassandraPreparedStatement.java:155)
at org.apache.cassandra.cql.jdbc.CassandraPreparedStatement.executeQuery(CassandraPreparedStatement.java:199)

Appreciate any help to resolve this. Regards, Dinusha.
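The error message is Cassandra's secondary-index rule: a query that filters on indexed columns must include at least one equality (=) predicate on an indexed column; a bare range predicate such as `birth_year > 1965` is not enough. A query of this shape would be accepted (a sketch against the users schema above; 'TX' is a made-up value):

```sql
-- state is indexed and queried with =, so the range on birth_year is allowed
select * from users where state = 'TX' and birth_year > 1965;
```

To range-scan on birth_year alone, the data would need to be remodeled (for example, partitioning rows by a coarse bucket queried with = and ranging within it), since the index cannot serve a pure inequality.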