Re: pig + hadoop
Hi, everything works fine with Cassandra 0.7.5, but when I tried 0.7.3 other errors showed up; strangely, the tasks still finished successfully.

2011-04-20 11:45:40,674 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201104201139_0004_m_00_3: Error: java.lang.ClassNotFoundException: org.apache.thrift.TException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:105)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

2011-04-20 11:45:43,629 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201104201139_0004_m_01_3: org.apache.pig.backend.executionengine.ExecException: ERROR 2044: The type null cannot be collected as a Key type
    at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:105)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

2011-04-20 11:42:49,498 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201104201139_0001_m_00_1: Error: java.lang.ClassNotFoundException: org.apache.commons.lang.ArrayUtils
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at org.apache.cassandra.utils.ByteBufferUtil.<clinit>(ByteBufferUtil.java:75)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.<clinit>(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:426)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:456)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getLoadFunc(PigInputFormat.java:153)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:105)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

2011/4/20 Jeremy Hanna jeremy.hanna1...@gmail.com

Just as an example:

  <property>
    <name>cassandra.thrift.address</name>
    <value>10.12.34.56</value>
  </property>
  <property>
    <name>cassandra.thrift.port</name>
    <value>9160</value>
  </property>
  <property>
    <name>cassandra.partitioner.class</name>
    <value>org.apache.cassandra.dht.RandomPartitioner</value>
  </property>

On Apr 19, 2011, at 10:28 PM, Jeremy Hanna wrote:

oh yeah - that's what's going on. What I do is, on the machine that I run the pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf directory, and in the mapred-site.xml file found there I set the three variables. I don't use environment variables when I run against a cluster.

On Apr 19, 2011, at 9:54 PM, Jeffrey Wang wrote:

Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error for a while before I added that. -Jeffrey

From: pob [mailto:peterob...@gmail.com] Sent: Tuesday,
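For anyone hitting the same ClassNotFoundExceptions (org.apache.thrift.TException and org.apache.commons.lang.ArrayUtils): the task JVMs simply cannot see those jars. One common fix is to REGISTER the dependencies in the Pig script so they ship with the job. A minimal sketch; the jar paths and version numbers are assumptions for a typical 0.7-era install, adjust to yours:

  -- ship Cassandra's dependencies with the Pig job (paths are illustrative)
  REGISTER /opt/cassandra/lib/libthrift-0.5.jar;
  REGISTER /opt/cassandra/lib/commons-lang-2.4.jar;
  REGISTER /opt/cassandra/apache-cassandra-0.7.5.jar;

  rows = LOAD 'cassandra://Keyspace1/Standard1'
         USING org.apache.cassandra.hadoop.pig.CassandraStorage();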
Re: pig + hadoop
My mistake, please ignore my last post.

2011/4/20 pob peterob...@gmail.com [quotes the previous message, including the three stack traces and Jeremy Hanna's configuration example, in full]
Re: Tombstones and memtable_operations
Looks like a bug; I've added a patch here: https://issues.apache.org/jira/browse/CASSANDRA-2519

Aaron

On 20 Apr 2011, at 13:15, aaron morton wrote:

That's what I was looking for, thanks. At first glance the behaviour looks inconsistent: we count the number of columns in the delete mutation, but when deleting a row the column count is zero. I'll try to take a look later. In the meantime you can force a memtable flush via JConsole; navigate down to the CF and look for the forceFlush() operation.

Aaron

On 20 Apr 2011, at 09:39, Héctor Izquierdo Seliva wrote:

On Wed, 2011-04-20 at 09:08 +1200, aaron morton wrote:

Yes, I saw that. Wanted to know what "issue deletes through pelops" means so I can work out what command it's sending to Cassandra, and hopefully I don't waste my time looking in the wrong place. Aaron

Oh, sorry. Didn't get what you were asking. I use this code:

  RowDeletor deletor = Pelops.createRowDeletor(keySpace);
  deletor.deleteRow(cf, rowId, ConsistencyLevel.QUORUM);

which seems to be calling org.apache.cassandra.thrift.Cassandra.Client.remove. I hope this is useful
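For readers who prefer the command line over JConsole, nodetool exposes the same flush; a sketch with placeholder host, keyspace and column family names:

  # flush the memtables for one column family (names are placeholders)
  bin/nodetool -h 127.0.0.1 flush MyKeyspace MyColumnFamily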
Re: Tombstones and memtable_operations
On Wed, 2011-04-20 at 23:00 +1200, aaron morton wrote:

Looks like a bug; I've added a patch here: https://issues.apache.org/jira/browse/CASSANDRA-2519 Aaron

That was fast! Thanks Aaron
Question about AbstractType class
Cassandra version 0.7.4

Hi, I created my own Java class as an extension of the AbstractType class, but I'm not sure about the following items related to the compare function:

# The remaining bytes of the buffer are sometimes zero during a Thrift get_slice execution, although I never store a zero-length column name nor query for one. If this is normal, what would be the correct handling of zero remaining bytes? Would it be something like:

  public int compare(ByteBuffer o1, ByteBuffer o2) {
      int ar1Rem = o1.remaining();
      int ar2Rem = o2.remaining();
      if (ar1Rem == 0 || ar2Rem == 0) {
          if (ar1Rem != 0) {
              return 1;
          } else if (ar2Rem != 0) {
              return -1;
          } else {
              return 0;
          }
      }
      // Add the real compare here ...
  }

# Since in version 0.6.3 the same function was passed an array of bytes, I assumed that I could now call the ByteBuffer.array() function in order to get the array of bytes backing the ByteBuffer. Also, the length of the byte array in 0.6.3 always seemed to correspond to the bytes of the column name stored. But now in version 0.7.4 that ByteBuffer is not always backed by such an array. I can still get around this by making the needed buffer myself, like:

  int ar2Rem = o2.remaining();
  byte[] ar2 = new byte[ar2Rem];
  o2.get(ar2, 0, ar2Rem);

The question is: are the remaining bytes the actual bytes for this column name (e.g. 20 bytes), or could the ByteBuffer ever be some wrapper around some larger stream of data where the remaining byte count could be 10 MB? Then I would not be able to detect the end of the column to compare, and I could be allocating a large unneeded byte array.

# Using the ByteBuffer's 'get' function also updates the position of the ByteBuffer. Is the compare function expected to do that, or should it reset the position back to what it was, or ...? Or maybe there is some good documentation I should read?

Ignace
Re: Different result after restart
Checking the simple things first: are you using the o.a.c.service.EmbeddedCassandraService or the o.a.c.EmbeddedServer in the unit test directory? The latter deletes data, but it does not sound like you are using it.

When the server starts it should read any commit logs, roll them forward and then flush all the changes to SSTables, which will result in the log files being deleted from disk; you should see INFO level log messages that say "Discarding obsolete commit log:...". Do you get new SSTables written at start up?

If you want to confirm the data is there, take a look at bin/sstable2json.

Hope that helps.
Aaron

On 20 Apr 2011, at 23:00, Desimpel, Ignace wrote:

Cassandra version 0.7.4

Hi, in a small test I'm storing (no deletions) some records to an embedded Cassandra instance. Then I connect using Thrift and I can retrieve the data as expected. Then I restart the server with the embedded Cassandra and reconnect using Thrift, but now the same query gives me no results at all. After the restart the commitlog directory gets cleared, leaving only a small .log and a small .log.header file. The data directory for the keyspace is still present, together with the db files corresponding to the column families. Any idea what I am doing wrong here? Ignace
Re: Question about AbstractType class
On Wed, Apr 20, 2011 at 1:35 PM, Desimpel, Ignace ignace.desim...@nuance.com wrote:

Cassandra version 0.7.4. Hi, I created my own Java class as an extension of the AbstractType class, but I'm not sure about the following items related to the compare function: # The remaining bytes of the buffer are sometimes zero during a Thrift get_slice execution, although I never store a zero-length column name nor query for one. If normal, what would be the correct handling of the zero remaining bytes?

It is normal; the empty ByteBuffer is used in slice queries to indicate the beginning of the row (start=''). More generally, compare and validate should work for anything you store, but also for anything you provide as the 'start' and 'end' arguments of slices.

Would it be something like:

  public int compare(ByteBuffer o1, ByteBuffer o2) {
      int ar1Rem = o1.remaining();
      int ar2Rem = o2.remaining();
      if (ar1Rem == 0 || ar2Rem == 0) {
          if (ar1Rem != 0) {
              return 1;
          } else if (ar2Rem != 0) {
              return -1;
          } else {
              return 0;
          }
      }
      // Add the real compare here ...
  }

That looks reasonable (though not optimal in the number of comparisons :))

# Since in version 0.6.3 the same function was passed an array of bytes, I assumed that I could now call the ByteBuffer.array() function in order to get the array of bytes backing the ByteBuffer.

It's not that simple. First, even if you use ByteBuffer.array(), you'll have to be careful that the ByteBuffer has a position, a limit and an arrayOffset, and you should take those into account when accessing the backing array. But there is also no guarantee that the ByteBuffer will have a backing array, so you need to handle that case too (I refer you to the ByteBuffer documentation).

Also, the length of the byte array in 0.6.3 always seemed to correspond to the bytes of the column name stored. But now in version 0.7.4 that ByteBuffer is not always backed by such an array. I can still get around this by making the needed buffer myself, like:

  int ar2Rem = o2.remaining();
  byte[] ar2 = new byte[ar2Rem];
  o2.get(ar2, 0, ar2Rem);

The question is: are the remaining bytes the actual bytes for this column name (e.g. 20 bytes), or could that ByteBuffer ever be some wrapper around some larger stream of data where the remaining byte count could be 10 MB? Then I would not be able to detect the end of the column to compare, and I could be allocating a large unneeded byte array.

As said above, the remaining bytes won't (always) be the actual bytes.

# Using the ByteBuffer's 'get' function also updates the position of the ByteBuffer. Is the compare function expected to do that, or should it reset the position back to what it was, or ...?

Neither. You should *not* use any function that changes the ByteBuffer position. That is, changing it and resetting it afterwards is *not* ok. Instead you should use only the absolute get() methods, which do not change the position at all. Or, you start your compare function by calling BB.duplicate() on both buffers, and then you're free to change the position of the duplicates.

--
Sylvain
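Putting that advice together, a position-safe byte-wise compare could look like the sketch below. This is only an illustration of the absolute-get approach, not code from Cassandra, and the ordering (unsigned lexicographic, shorter prefix first) is an assumption you would replace with your type's real semantics:

  public int compare(ByteBuffer o1, ByteBuffer o2) {
      int rem1 = o1.remaining();
      int rem2 = o2.remaining();
      int n = Math.min(rem1, rem2);
      for (int i = 0; i < n; i++) {
          // absolute get(index) never moves the buffer's position;
          // position() is the base offset of this buffer's data
          int b1 = o1.get(o1.position() + i) & 0xff; // compare as unsigned
          int b2 = o2.get(o2.position() + i) & 0xff;
          if (b1 != b2)
              return b1 - b2;
      }
      // with n == 0 this also handles the empty start='' buffer correctly
      return rem1 - rem2;
  }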
RE: Different result after restart
I'm using the org.apache.cassandra.thrift.CassandraDaemon implementation. I have done the same with version 0.6.x, but now modified the code for version 0.7.4. I could restart without problems in 0.6.x.

I get the following messages (did not include them all; the keyspace is 'SearchSpace', with CF names like 'ReverseStringValues', 'ReverseLabelValues', 'Structure', ...):

2011-04-20 12:27:48 INFO AbstractCassandraDaemon - Logging initialized
2011-04-20 12:27:48 INFO AbstractCassandraDaemon - Heap size: 10719985664/10719985664
2011-04-20 12:27:48 WARN CLibrary - Obsolete version of JNA present; unable to register C library. Upgrade to JNA 3.2.7 or later
2011-04-20 12:27:48 INFO DatabaseDescriptor - Loading settings from file:C:/develop/configs/AnnotationServer7/properties/cassandra.yaml
2011-04-20 12:27:48 INFO DatabaseDescriptor - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
...
2011-04-20 12:27:49 INFO CommitLogSegment - Creating new commitlog segment ../cassandra/dbcommitlog\CommitLog-1303295269011.log
2011-04-20 12:27:49 INFO CommitLog - Replaying ..\cassandra\dbcommitlog\CommitLog-1303294884815.log
2011-04-20 12:27:54 INFO CommitLog - Finished reading ..\cassandra\dbcommitlog\CommitLog-1303294884815.log
2011-04-20 12:27:54 INFO ColumnFamilyStore - Enqueuing flush of Memtable-ReverseStringValues@249550264(8519832 bytes, 189992 operations)
2011-04-20 12:27:54 INFO Memtable - Writing Memtable-ReverseStringValues@249550264(8519832 bytes, 189992 operations)
2011-04-20 12:27:54 INFO ColumnFamilyStore - Enqueuing flush of Memtable-ReverseLabelValues@1617914474(1339548 bytes, 31894 operations)
2011-04-20 12:27:54 INFO Memtable - Writing Memtable-ReverseLabelValues@1617914474(1339548 bytes, 31894 operations)
2011-04-20 12:27:55 INFO ColumnFamilyStore - Enqueuing flush of Memtable-ForwardLabelValues@1924550782(1339548 bytes, 31894 operations)
2011-04-20 12:27:55 INFO Memtable - Writing Memtable-ForwardLabelValues@1924550782(1339548 bytes, 31894 operations)
...
2011-04-20 12:27:58 INFO CommitLog - Log replay complete
2011-04-20 12:27:57 INFO CompactionManager - Compacting [SSTableReader(path='..\cassandra\dbdatafile\SearchSpace\Structure-f-1-Data.db'),SSTableReader(path='..\cassandra\dbdatafile\SearchSpace\Structure-f-2-Data.db')]
...
2011-04-20 12:27:57 INFO ColumnFamilyStore - Enqueuing flush of Memtable-ReverseDoubleValues@1985313813(56946 bytes, 1265 operations)
2011-04-20 12:27:57 INFO Memtable - Writing Memtable-ReverseDoubleValues@1985313813(56946 bytes, 1265 operations)
2011-04-20 12:27:57 INFO ColumnFamilyStore - Enqueuing flush of Memtable-Documents@1715831652(1872209 bytes, 36 operations)
2011-04-20 12:27:57 INFO Memtable - Writing Memtable-Documents@1715831652(1872209 bytes, 36 operations)
2011-04-20 12:27:59 INFO CompactionManager - Compacting [SSTableReader(path='..\cassandra\dbdatafile\system\LocationInfo-f-1-Data.db'),SSTableReader(path='..\cassandra\dbdatafile\system\LocationInfo-f-2-Data.db'),SSTableReader(path='..\cassandra\dbdatafile\system\LocationInfo-f-3-Data.db'),SSTableReader(path='..\cassandra\dbdatafile\system\LocationInfo-f-4-Data.db')]
2011-04-20 12:27:59 INFO Mx4jTool - Will not load MX4J, mx4j-tools.jar is not in the classpath
2011-04-20 12:27:59 INFO CassandraDaemon - Binding thrift service to GH-DSK0178.nuance.com/10.184.56.115:9160
2011-04-20 12:27:59 INFO CassandraDaemon - Listening for thrift clients...
2011-04-20 12:27:59 INFO CompactionManager - Compacted to ..\cassandra\dbdatafile\system\LocationInfo-tmp-f-5-Data.db. 751 to 457 (~60% of original) bytes for 3 keys. Time: 370ms.
2011-04-20 12:28:02 INFO SearchServer - Annotation labels present. Count labels : 568

I do not get any message saying 'Discarding obsolete ...'. The replayed commit log file is deleted, however; the only one left is the new CommitLog-1303295269011, but I thought that is normal at restart. Some data is still there and can be queried, which is why there is my own message saying 'Annotation labels present. Count labels : 568'. The column family I want to access in my query is ForwardLabelValues (also present in the log extract here), and the size of the file on disk is 3.5 MB. Also, the query I issue is one that should get all the 'records'.

I did try sstable2json but must be doing something wrong. I got:

sstable2json C:\develop\configs\AnnotationServer7\cassandra\dbdatafile\SearchSpace\ForwardLabelValues-f-1-Data.db
no non-system tables are defined
Exception in thread "main" org.apache.cassandra.config.ConfigurationException: no non-system tables are defined
    at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:457)

Thanks,
Ignace

From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, April 20, 2011 1:40 PM To: user@cassandra.apache.org Subject: Re: Different result after restart

Checking the simple things first, are you using the
RE: Question about AbstractType class
-Original Message- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Wednesday, April 20, 2011 2:07 PM To: user@cassandra.apache.org Subject: Re: Question about AbstractType class

On Wed, Apr 20, 2011 at 1:35 PM, Desimpel, Ignace ignace.desim...@nuance.com wrote: Cassandra version 0.7.4. Hi, I created my own Java class as an extension of the AbstractType class, but I'm not sure about the following items related to the compare function: # The remaining bytes of the buffer are sometimes zero during a Thrift get_slice execution, although I never store a zero-length column name nor query for one. If normal, what would be the correct handling of the zero remaining bytes?

It is normal; the empty ByteBuffer is used in slice queries to indicate the beginning of the row (start=''). More generally, compare and validate should work for anything you store, but also for anything you provide as the 'start' and 'end' arguments of slices.

Would it be something like:

  public int compare(ByteBuffer o1, ByteBuffer o2) {
      int ar1Rem = o1.remaining();
      int ar2Rem = o2.remaining();
      if (ar1Rem == 0 || ar2Rem == 0) {
          if (ar1Rem != 0) {
              return 1;
          } else if (ar2Rem != 0) {
              return -1;
          } else {
              return 0;
          }
      }
      // Add the real compare here ...
  }

That looks reasonable (though not optimal in the number of comparisons :))

-OK

# Since in version 0.6.3 the same function was passed an array of bytes, I assumed that I could now call the ByteBuffer.array() function in order to get the array of bytes backing the ByteBuffer.

It's not that simple. First, even if you use ByteBuffer.array(), you'll have to be careful that the ByteBuffer has a position, a limit and an arrayOffset, and you should take those into account when accessing the backing array. But there is also no guarantee that the ByteBuffer will have a backing array, so you need to handle that case too (I refer you to the ByteBuffer documentation).

-OK

Also, the length of the byte array in 0.6.3 always seemed to correspond to the bytes of the column name stored. But now in version 0.7.4 that ByteBuffer is not always backed by such an array. I can still get around this by making the needed buffer myself, like:

  int ar2Rem = o2.remaining();
  byte[] ar2 = new byte[ar2Rem];
  o2.get(ar2, 0, ar2Rem);

The question is: are the remaining bytes the actual bytes for this column name (e.g. 20 bytes), or could that ByteBuffer ever be some wrapper around some larger stream of data where the remaining byte count could be 10 MB? Then I would not be able to detect the end of the column to compare, and I could be allocating a large unneeded byte array.

As said above, the remaining bytes won't (always) be the actual bytes.

-Then how do I know the end is near? E.g., if the stored value is a char string, it would be nice to know the end, unless I also store the length before the char string.
-Assuming that both ByteBuffers have the same data and the same position and limit, and thus the same remaining, one can imagine a loop comparing each byte until the remaining is used up. Then I cannot get any more data, and thus I should return 0?

# Using the ByteBuffer's 'get' function also updates the position of the ByteBuffer. Is the compare function expected to do that, or should it reset the position back to what it was, or ...?

Neither. You should *not* use any function that changes the ByteBuffer position. That is, changing it and resetting it afterwards is *not* ok.

-OK

Instead you should use only the absolute get() methods, which do not change the position at all. Or, you start your compare function by calling BB.duplicate() on both buffers, and then you're free to change the position of the duplicates.

-OK

--
Sylvain

Thanks Sylvain!
RE: Different result after restart
Aaron, I already found out what the problem was. I was using an AbstractType comparator for a column family. That code was changing the given ByteBuffer position and was not supposed to do that (hinted by Sylvain Lebresne!). Anyway, after correcting that problem I got back the results as before. I still don't grasp how this relates to the restart of the server, but I'm happy as is. Thanks very much Aaron!

Ignace

From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Wednesday, April 20, 2011 1:40 PM To: user@cassandra.apache.org Subject: Re: Different result after restart

Checking the simple things first: are you using the o.a.c.service.EmbeddedCassandraService or the o.a.c.EmbeddedServer in the unit test directory? The latter deletes data, but it does not sound like you are using it. When the server starts it should read any commit logs, roll them forward and then flush all the changes to SSTables, which will result in the log files being deleted from disk; you should see INFO level log messages that say "Discarding obsolete commit log:...". Do you get new SSTables written at start up? If you want to confirm the data is there, take a look at bin/sstable2json. Hope that helps. Aaron

On 20 Apr 2011, at 23:00, Desimpel, Ignace wrote: [quotes the original question in full]
Re: Question about AbstractType class
On Wed, Apr 20, 2011 at 3:06 PM, Desimpel, Ignace ignace.desim...@nuance.com wrote:

As said above, the remaining bytes won't (always) be the actual bytes.

Sorry, I answered a bit quickly. I meant to say that the actual bytes won't (always) be the full backing array. That is, we never guarantee that BB.arrayOffset() == 0, nor BB.position() == 0, nor BB.limit() == backingArray.length. But the remaining() bytes will be the actual bytes, my bad.

--
Sylvain
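In other words, when you do need the raw bytes, the safe pattern honours position(), arrayOffset() and the no-backing-array case. A generic Java sketch (not Cassandra code):

  // Copy the remaining() bytes out of a ByteBuffer without disturbing it.
  public static byte[] getBytes(ByteBuffer bb) {
      byte[] bytes = new byte[bb.remaining()];
      if (bb.hasArray()) {
          // honour both arrayOffset() and position(): the backing array
          // may be shared and may start before this buffer's data
          System.arraycopy(bb.array(), bb.arrayOffset() + bb.position(),
                           bytes, 0, bytes.length);
      } else {
          // duplicate() shares content but has its own position, so the
          // relative get() here leaves the original buffer untouched
          bb.duplicate().get(bytes);
      }
      return bytes;
  }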
RE: Question about AbstractType class
Thanks Sylvain. Your answer already helped me out a lot! I was using a ByteBuffer.get function that changes the ByteBuffer's position, and I got all kinds of strange effects and exceptions I didn't get in 0.6.x. I changed that code and all the problems are gone... Many thanks!!

Ignace

-Original Message- From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: Wednesday, April 20, 2011 4:04 PM To: user@cassandra.apache.org Subject: Re: Question about AbstractType class

On Wed, Apr 20, 2011 at 3:06 PM, Desimpel, Ignace ignace.desim...@nuance.com wrote: As said above, the remaining bytes won't (always) be the actual bytes.

Sorry, I answered a bit quickly. I meant to say that the actual bytes won't (always) be the full backing array. That is, we never guarantee that BB.arrayOffset() == 0, nor BB.position() == 0, nor BB.limit() == backingArray.length. But the remaining() bytes will be the actual bytes, my bad.

--
Sylvain
NotSerializableException of FutureTask
Using my own JMX Java code, and also when using NodeTool, I get the following exception when calling the forceFlush function. It seems that the flushing itself is started even though the exception occurred. Any idea? (running JDK 1.6, 64 bit)

Ignace

2011-04-20 16:23:45 INFO ColumnFamilyStore - Enqueuing flush of Memtable-ReverseIntegerValues@75939304(2274472 bytes, 48892 operations)
2011-04-20 16:23:45 INFO Memtable - Writing Memtable-ReverseIntegerValues@75939304(2274472 bytes, 48892 operations)
java.rmi.UnmarshalException: error unmarshalling return; nested exception is:
    java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: java.util.concurrent.FutureTask
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:173)
    at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:993)
    at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:288)
    at $Proxy7.forceFlush(Unknown Source)
    at be.landc.services.search.server.db.indexsearch.store.cassandra.CassandraStore$CassNodeProbe.doRepairAll(CassandraStore.java:160)
    at be.landc.services.search.server.db.indexsearch.store.cassandra.CassandraStore$CassNodeProbe.run(CassandraStore.java:141)
Caused by: java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: java.util.concurrent.FutureTask
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1332)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
    at sun.rmi.server.UnicastRef.unmarshalValue(UnicastRef.java:306)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:155)
    ... 7 more
Caused by: java.io.NotSerializableException: java.util.concurrent.FutureTask
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1164)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
    at sun.rmi.server.UnicastRef.marshalValue(UnicastRef.java:274)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:315)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2011-04-20 16:23:45 INFO SearchServer - == Starting flush for column family : SearchSpace / ForwardLongValues
2011-04-20 16:23:45 INFO ColumnFamilyStore - Enqueuing flush of Memtable-ForwardLongValues@710396564(26780468 bytes, 623958 operations)
Re: cluster IP question and Jconsole?
Maki, yes you are right, 8081 is the mx4j port; the JMX_PORT is 8001 in cassandra-env.sh. On the Cassandra Linux server itself I can run this successfully:

nodetool -host x -p 8001 ring

(x is the actual IP address.) However, when I run the same command on another Windows machine (which has the Cassandra Windows version extracted), I get the exception below. One thing that puzzled me is that the command tries to connect to IP x, but the exception claims: "Connection refused to host: 127.0.0.1". Is there anything else that I need to configure, or ...? I guess this is probably also the reason that jconsole can't connect to port 8001 remotely? Thanks for any advice!

D:\apache-cassandra-0.7.4\bin>nodetool -host x -p 8001 ring
Starting NodeTool
Error connection to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
    java.net.ConnectException: Connection refused: connect
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(Unknown Source)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(Unknown Source)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(Unknown Source)
    at sun.rmi.server.UnicastRef.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source)
    at javax.management.remote.rmi.RMIConnector.getConnection(Unknown Source)
    at javax.management.remote.rmi.RMIConnector.connect(Unknown Source)
    at javax.management.remote.JMXConnectorFactory.connect(Unknown Source)
    at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:137)
    at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:107)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:511)
Caused by: java.net.ConnectException: Connection refused: connect
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(Unknown Source)
    at java.net.PlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.<init>(Unknown Source)
    at java.net.Socket.<init>(Unknown Source)
    at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(Unknown Source)
    at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(Unknown Source)
    ... 11 more

-Original Message- From: Watanabe Maki Sent: Saturday, April 16, 2011 1:45 AM To: user@cassandra.apache.org Cc: user@cassandra.apache.org Subject: Re: cluster IP question and Jconsole?

8081 is your mx4j port, isn't it? You need to connect jconsole to the JMX_PORT specified in cassandra-env.sh.

maki
From iPhone

On 2011/04/16, at 13:56, tinhuty he tinh...@hotmail.com wrote:

Maki, thanks for your reply. For the second question, I wasn't using the loopback address; I was using the actual IP address for that server. I am able to telnet to that IP on port 8081, but using jconsole failed.

-Original Message- From: Maki Watanabe Sent: Friday, April 15, 2011 9:43 PM To: user@cassandra.apache.org Cc: tinhuty he Subject: Re: cluster IP question and Jconsole?

127.0.0.2 to 127.0.0.5 are valid IP addresses. They are just alias addresses for your loopback interface. Verify: % ifconfig -a

127.0.0.0/8 is for loopback, so you can't connect to these addresses from remote machines. You may be able to configure SSH port forwarding from your monitoring host to the cassandra node, though I haven't tried.

maki

2011/4/16 tinhuty he tinh...@hotmail.com:

I have followed the description here http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/lauching_5_node_cassandra_clusters to create 5 instances of cassandra on one CentOS 5.5 machine. Using nodetool shows the 5 nodes are all running fine. Note the 5 nodes are using IPs 127.0.0.1 to 127.0.0.5. I understand 127.0.0.1 points to the local server, but what about 127.0.0.2 to 127.0.0.5? They look to me like invalid IPs; how come all 5 nodes are working ok?

Another question: I have installed MX4J in instance 127.0.0.1 on port 8081. I am able to connect to http://server:8081/ from the browser. However, how do I connect using Jconsole installed on another Windows machine? (My CentOS 5.5 doesn't have X installed; only SSH is allowed.) Thanks.
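To make the alias point concrete: on Linux the extra loopback addresses can be added explicitly (on many distributions the whole 127.0.0.0/8 block already routes to lo, so binding 127.0.0.2 may work without any setup). A sketch using the iproute2 tools:

  # add alias addresses on the loopback interface (run as root)
  ip addr add 127.0.0.2/8 dev lo
  ip addr add 127.0.0.3/8 dev lo
  # older style: ifconfig lo:0 127.0.0.2 netmask 255.0.0.0 up
  # verify what is configured
  ifconfig -a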
Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?
Also, for new users like me: don't assume DC1 is a keyword like I did. A working example of a keyspace in EC2 is:

create keyspace test with replication_factor=3 and strategy_options = [{us-east:3}] and placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';

for a single-DC deployment in EC2. I felt silly afterwards, but I couldn't find official docs on the structure of strategy_options anywhere.

will

On Wed, Apr 13, 2011 at 5:14 PM, William Oberman ober...@civicscience.com wrote:

One last coda, for other noobs to cassandra like me. If you use NetworkTopologyStrategy with replication_factor > 1, make sure you have EC2 instances in multiple availability zones. I was doing baby steps, and tried doing a cluster in one AZ (before spreading to multiple AZs) and was getting the most baffling errors (cassandra_UnavailableException). I finally thought to check the cassandra server logs (after painstakingly debugging the client code, firewalls, etc. for connectivity problems), and it turns out my cassandra cluster was considering itself unavailable because it couldn't replicate as much as it wanted to. I kind of wish a different word than "unavailable" had been chosen for this error condition :-)

will

On Tue, Apr 12, 2011 at 10:37 PM, aaron morton aa...@thelastpickle.com wrote:

If you can use standard + encoded I would go with that. Aaron

On 13 Apr 2011, at 07:07, William Oberman wrote:

Excellent to know! (And yes, I figure I'll expand someday, so I'm glad I found this out before digging a hole.) The other issue I've been pondering is a normal column family of encoded objects (in my case JSON) vs. a super column. Based on my use case, things I've read, etc., right now I'm coming down on normal + encoded.

will

On Tue, Apr 12, 2011 at 2:57 PM, Jonathan Ellis jbel...@gmail.com wrote:

NTS is overkill in the sense that it doesn't really benefit you in a single DC, but if you think you may expand to another DC in the future it's much simpler if you were already using NTS than first migrating to NTS (changing strategy is painful). I can't think of any downsides to using NTS in a single-DC environment, so that's the safe option.

On Tue, Apr 12, 2011 at 1:15 PM, William Oberman ober...@civicscience.com wrote:

Hi, I'm getting closer to committing to cassandra, and now I'm into system/IT issues and questions. I'm in the Amazon EC2 cloud. I previously used this forum to discover the best practice for disk layouts (large instance + the two ephemeral disks in RAID0 for data + root volume for everything else). Now I'm hoping to confirm bits and pieces of things I've read about snitch/replication strategies. I was thinking of using:

endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch
placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy'

(for people hitting this from the mailing list or google, I feel obligated to note that the former setting is in cassandra.yaml, and the latter is an option on a keyspace). But I'm only in one region. Is using the Amazon snitch/NetworkTopologyStrategy overkill given that everything I have is in one DC (I believe region==DC and availability_zone==rack)? I'm using multiple availability zones for some level of redundancy; I'm just not yet at the point of using multiple regions. If someday I move to using multiple regions, would that change the answer? Thanks!

-- Will Oberman Civic Science, Inc.
3030 Penn Avenue, First Floor, Pittsburgh, PA 15201 (M) 412-480-7835 (E) ober...@civicscience.com

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com

-- Will Oberman Civic Science, Inc. 3030 Penn Avenue, First Floor, Pittsburgh, PA 15201 (M) 412-480-7835 (E) ober...@civicscience.com
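If the cluster does later expand to a second region, the same CLI syntax extends naturally. A hedged sketch mirroring Will's example (under Ec2Snitch the DC names are the region names; the replica counts here are illustrative, not a recommendation):

  create keyspace test
      with strategy_options = [{us-east:2, us-west:1}]
      and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy';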
Re: NotSerializableException of FutureTask
You must be using an old Cassandra and/or nodetool; current nodetool calls forceBlockingFlush, which does not try to return a Future over JMX.

On Wed, Apr 20, 2011 at 9:38 AM, Desimpel, Ignace ignace.desim...@nuance.com wrote: [quotes the NotSerializableException report above in full]

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
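For reference, custom JMX code can avoid the serialization problem by invoking a flush operation that returns void. A sketch; the MBean ObjectName, operation name, and port are taken from the 0.7-era codebase as I recall it and should be verified against your version:

  import javax.management.MBeanServerConnection;
  import javax.management.ObjectName;
  import javax.management.remote.JMXConnector;
  import javax.management.remote.JMXConnectorFactory;
  import javax.management.remote.JMXServiceURL;

  public class FlushViaJmx {
      public static void main(String[] args) throws Exception {
          // 8080 was the default JMX port for 0.7-era Cassandra
          JMXServiceURL url = new JMXServiceURL(
              "service:jmx:rmi:///jndi/rmi://127.0.0.1:8080/jmxrmi");
          JMXConnector jmxc = JMXConnectorFactory.connect(url);
          try {
              MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
              ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
              // forceTableFlush(keyspace, columnFamilies...) returns void, so no
              // non-serializable Future ever crosses the JMX boundary
              mbs.invoke(ss, "forceTableFlush",
                         new Object[] { "SearchSpace", new String[] { "ForwardLongValues" } },
                         new String[] { "java.lang.String", "[Ljava.lang.String;" });
          } finally {
              jmxc.close();
          }
      }
  }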
Re: cluster IP question and Jconsole?
See the first entry in http://wiki.apache.org/cassandra/JmxGotchas

On Wed, Apr 20, 2011 at 9:54 AM, tinhuty he tinh...@hotmail.com wrote: [quotes the "Connection refused to host: 127.0.0.1" report and the earlier replies above in full]

-- Tyler Hobbs Software Engineer, DataStax http://datastax.com/ Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra Python client library
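For readers without wiki access: the gotcha referenced there is, as best I recall, that the RMI stub advertises whatever hostname the JVM resolves for itself (often 127.0.0.1), so remote JMX clients get bounced to loopback. The usual workaround is to pin the advertised address in conf/cassandra-env.sh; a sketch, where x.x.x.x stands for the node's externally reachable IP:

  # make the RMI stub advertise the node's real address instead of 127.0.0.1
  JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=x.x.x.x"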
Ec2 Stress Results
Does anyone have any EC2 benchmarks/experiences they can share? I am trying to get a sense of what to expect from a production cluster on EC2 so that I can compare my application's performance against a sane baseline. What I have done so far is:

1. Launched a 4 node cluster of m1.xlarge instances in the same availability zone using PyStratus (https://github.com/digitalreasoning/PyStratus). Each node has the following specs (according to Amazon): 15 GB memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1,690 GB instance storage, 64-bit platform.

2. Changed the default PyStratus directories in order to have commit logs on the root partition and data files on ephemeral storage:

commitlog_directory: /var/cassandra-logs
data_file_directories: [/mnt/cassandra-data]

3. Gave each node 10 GB of MAX_HEAP and 1 GB HEAP_NEWSIZE in conf/cassandra-env.sh.

4. Ran `contrib/stress/bin/stress -d node1,..,node4 -n 10000000 -t 100` on a separate m1.large instance:

total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9832712,7120,7120,0.004948514851485148,842
9907616,7490,7490,0.0043189949802413755,852
9978357,7074,7074,0.004560353967289125,863
10000000,2164,2164,0.004065933558194335,867

5. Truncated Keyspace1.Standard1:

# /usr/local/apache-cassandra/bin/cassandra-cli -host localhost -port 9160
Connected to: Test Cluster on x.x.x.x/9160
Welcome to cassandra CLI.
Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] truncate Standard1;
null

6. Expanded the cluster to 8 nodes using PyStratus and sanity checked using nodetool:

# /usr/local/apache-cassandra/bin/nodetool -h localhost ring
Address  Status  State   Load     Owns    Token
x.x.x.x  Up      Normal  1.3 GB   12.50%  21267647932558653966460912964485513216
x.x.x.x  Up      Normal  3.06 GB  12.50%  42535295865117307932921825928971026432
x.x.x.x  Up      Normal  1.16 GB  12.50%  63802943797675961899382738893456539648
x.x.x.x  Up      Normal  2.43 GB  12.50%  85070591730234615865843651857942052864
x.x.x.x  Up      Normal  1.22 GB  12.50%  106338239662793269832304564822427566080
x.x.x.x  Up      Normal  2.74 GB  12.50%  127605887595351923798765477786913079296
x.x.x.x  Up      Normal  1.22 GB  12.50%  148873535527910577765226390751398592512
x.x.x.x  Up      Normal  2.57 GB  12.50%  170141183460469231731687303715884105728

7. Ran `contrib/stress/bin/stress -d node1,..,node8 -n 10000000 -t 100` on a separate m1.large instance again:

total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
...
9880360,9649,9649,0.003210443956226165,720
9942718,6235,6235,0.003206934154398794,731
9997035,5431,5431,0.0032615939761032457,741
10000000,296,296,0.002660033726812816,742

In a nutshell, 4 nodes inserted at 11,534 writes/sec (10,000,000 ops / 867 s) and 8 nodes at 13,477 writes/sec (10,000,000 ops / 742 s). Those numbers seem a little low to me, but I don't have anything to compare them to. I'd like to hear others' opinions before I spin my wheels with the number of nodes, threads, memtable, memory, and/or GC settings.

Cheers, Alex.
system_* consistency level?
Hi, my unit tests started failing once I upgraded from a single-node cassandra cluster to a full N node cluster (I'm starting with 4). I had a few assorted bugs, mostly due to forgetting to read/write at quorum level in places where I needed stronger consistency guarantees. But I kept getting random, intermittent failures (the worst kind). After some painful debugging I'm 99% sure I see why, but I don't know what to do about it. The basic flaw in my understanding of cassandra seems to boil down to: I thought system mutations of keyspaces/column families were of a stronger consistency than ONE, but that appears not to be true. Is there any way for me to update a cluster at something more like QUORUM?

The basic idea is that in my unit test.setup() I clone my real keyspace as keyspace_UUID (with all of the exact same CFs) to get a fresh space to play in. In a single-node environment, no issues. But in a cluster, it seems that it takes a while for the system_add_keyspace call to propagate. "No worries," I think, and I modify my setup() to call describe_keyspace(keyspace_UUID) in a while loop until the cluster is ready. My random failures drop considerably, but every once in a while I see a similar kind of failure. Then I find out that schema updates seem to propagate on a per-node basis. At least, that's what I have to assume, since I'm using phpcassa, which uses a connection pool, and I see in my logging that my setup() succeeds because one connection in the pool sees the new keyspace, but when my tests run I grab a connection from the pool that is missing it! Do I have a solution other than changing my setup yet again to loop over all cassandra servers doing a describe_keyspace()?

-- Will Oberman Civic Science, Inc. 3030 Penn Avenue, First Floor, Pittsburgh, PA 15201 (M) 412-480-7835 (E) ober...@civicscience.com
Re: Internal error processing get_range_slices
"Internal error" means an error on the server; check the server log for the stacktrace.

On Wed, Apr 20, 2011 at 11:54 AM, Renato Bacelar da Silveira renat...@indabamobile.co.za wrote:

Hi all, I am just adding information on the following error:

-- error --
org.apache.thrift.TApplicationException: Internal error processing get_range_slices
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
    at com.indaba.cassandra.thrift.ThriftManager.multigetSliceAcrossAllUsers(ThriftManager.java:180)
    at com.indaba.cassandra.thrift.ThriftManager.testMultigetSlice(ThriftManager.java:210)
    at com.indaba.cassandra.thrift.ThriftManager.main(ThriftManager.java:260)
-- error --

I was able to inspect the method that sends the Cassandra query call, and I do not see anything different from what I specified: no values changed, missing, or such. Basically I have a range slice query with an empty start_key and end_key in the KeyRange, and Integer.MAX_VALUE as the return count. I also have a SlicePredicate with 3 column names I would like to find within my column family. I specify a ColumnParent but NO super column name, so as to roll through the entire range of super columns. From the documentation I gathered one could leave that out of the ColumnParent so this multiget would work (I should be getting a book), so I think I have it covered.

Below is the code I am using, and just after it, the variable values at the time Cassandra.send_get_range_slices(ColumnParent column_parent, SlicePredicate predicate, KeyRange range, ConsistencyLevel consistency_level) is called:

-- code --
public Map<String, List<ColumnOrSuperColumn>> multigetSliceAcrossAllUsers(String[] colNames) {
    ColumnParent cp;
    Map<String, List<ColumnOrSuperColumn>> slicemap = new TreeMap<String, List<ColumnOrSuperColumn>>();
    List<KeySlice> lstKeyslice;
    List<ByteBuffer> lstColNames = new ArrayList<ByteBuffer>();
    for (String s : colNames) {
        lstColNames.add(ByteBufferUtil.bytes(s));
    }
    try {
        List<CfDef> lstColFamDef = client.describe_keyspace(getCurrentKeyspaceName()).getCf_defs();
        for (CfDef def : lstColFamDef) {
            cp = new ColumnParent();
            cp.setColumn_family(def.getName());
            SlicePredicate slicePrd = new SlicePredicate();
            slicePrd.setColumn_names(lstColNames);
            KeyRange kr = new KeyRange();
            kr.setCount(Integer.MAX_VALUE);
            kr.setStart_key(new byte[0]);
            kr.setEnd_key(new byte[0]);
            kr.setStart_keyIsSet(true);
            kr.setEnd_keyIsSet(true);
            try {
                lstKeyslice = client.get_range_slices(cp, slicePrd, kr, ConsistencyLevel.ANY);
                for (KeySlice kslc : lstKeyslice) {
                    slicemap.put(new String(kslc.getKey()), kslc.getColumns());
                }
            } catch (UnavailableException e) {
                e.printStackTrace();
            } catch (TimedOutException e) {
                e.printStackTrace();
            }
        }
    } catch (NotFoundException e) {
        e.printStackTrace();
    } catch (InvalidRequestException e) {
        e.printStackTrace();
    } catch (TException e) {
        e.printStackTrace();
    }
    return slicemap;
}
-- code --

-- arguments --
args  Cassandra$get_range_slices_args (id=77)
  column_parent  ColumnParent (id=46)
    column_family  UserKey_38 (id=97)
    super_column  null
  consistency_level  ConsistencyLevel (id=58)
    name  ANY (id=86)
    ordinal  5
    value  6
  predicate  SlicePredicate (id=54)
    column_names  ArrayList<E> (id=31)
      [0]  HeapByteBuffer (id=137)
      [1]  HeapByteBuffer (id=138)
      [2]  HeapByteBuffer (id=139)
    slice_range  null
  range  KeyRange (id=56)
    __isset_bit_vector  BitSet (id=88)
    count  2147483647
    end_key  HeapByteBuffer (id=90)
      address  0
      bigEndian  true
      capacity  0
      hb  (id=118)
      isReadOnly  false
      limit  0
      mark  -1
      nativeByteOrder  false
      offset  0
      position  0
    end_token
Re: system_* consistency level?
See the comments for describe_schema_versions.

On Wed, Apr 20, 2011 at 4:59 PM, William Oberman ober...@civicscience.com wrote: [quotes the schema-propagation question above in full]

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
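Concretely: after a schema change you can poll describe_schema_versions() until every reachable node reports the same schema UUID. A Thrift-client sketch in Java (connection setup omitted; the "UNREACHABLE" key is how the 0.7 API groups unreachable nodes, as I understand it):

  import java.util.List;
  import java.util.Map;
  import org.apache.cassandra.thrift.Cassandra;

  public final class SchemaWait {
      // Block until all reachable nodes agree on a single schema version.
      public static void waitForAgreement(Cassandra.Client client) throws Exception {
          while (true) {
              // maps schema-version UUID -> endpoints reporting that version
              Map<String, List<String>> versions = client.describe_schema_versions();
              versions.remove("UNREACHABLE"); // ignore nodes that are down
              if (versions.size() == 1)
                  return; // agreement reached
              Thread.sleep(200); // retry shortly
          }
      }
  }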
Cannot find row when using 3 indices for search, able to find it using only 2
Cassandra 0.7.4 on 4 nodes, Linux Ubuntu 10.10 i386, 32-bit:

root@bigcouch-106:/etc/cassandra# nodetool -h 172.16.1.106 ring
Address       Status  State   Load     Owns    Token
172.16.1.104  Up      Normal  1.8 GB   22.33%  4778396862879243066278530647513341098
172.16.1.8    Up      Normal  1.48 GB  28.12%  52627163731801348483758292043565262417
172.16.1.106  Up      Normal  1.21 GB  27.22%  98934176951395683802275136006692518904
172.16.1.110  Up      Normal  1.12 GB  22.33%  136934291168078629024171054299313117062

I am using keyspace 'bnd', column family 'pet', defined as:

update column family pet with column_metadata =
[
  {column_name: P_cui,     validation_class: UTF8Type, index_type: KEYS},
  {column_name: P_nume,    validation_class: UTF8Type, index_type: KEYS},
  {column_name: P_prenume, validation_class: UTF8Type, index_type: KEYS}
];

Finding a row using 2 indices (P_cui and P_prenume) works:

[default@bnd] get pet where P_cui='1670518330770' and P_prenume='CONSTANTIN';
-------------------
RowKey: RO1492360605
=> (column=A1RO35486663, value=313a463a323030332d30342d30313a32333730, timestamp=1303181522507175)
=> (column=P_adresa, value=4c4954454e49, timestamp=1303181522507175)
=> (column=P_cui, value=1670518330770, timestamp=1303181522507175)
=> (column=P_nume, value=Manoliu, timestamp=1303181522507175)
=> (column=P_prenume, value=CONSTANTIN, timestamp=1303181522507175)
=> (column=P_tip, value=36, timestamp=1303253832349129)
1 Row Returned.

Finding it using the other pair of indices (P_prenume and P_nume) also works fine:

[default@bnd] get pet where P_prenume='CONSTANTIN' and P_nume='Manoliu';
-------------------
RowKey: RO1492360605
=> (column=A1RO35486663, value=313a463a323030332d30342d30313a32333730, timestamp=1303181522507175)
=> (column=P_adresa, value=4c4954454e49, timestamp=1303181522507175)
=> (column=P_cui, value=1670518330770, timestamp=1303181522507175)
=> (column=P_nume, value=Manoliu, timestamp=1303181522507175)
=> (column=P_prenume, value=CONSTANTIN, timestamp=1303181522507175)
=> (column=P_tip, value=36, timestamp=1303253832349129)
1 Row Returned.

Trying to find the same row using all 3 indices does not work:

[default@bnd] get pet where P_cui='1670518330770' and P_prenume='CONSTANTIN' and P_nume='Manoliu';
0 Row Returned.

Any clues? Teo
Re: Cannot find row when using 3 indices for search, able to find it using only 2
Thank you, I'll wait for the 0.7.5 distribution and test it again when it ships! So far I'm satisfied with Cassandra; we are evaluating it for migrating our PostgreSQL solution to a mixed [couchdb + bigcouch + cassandra] architecture! Best regards, Teo On Thu, Apr 21, 2011 at 1:15 AM, Jonathan Ellis jbel...@gmail.com wrote: sounds like https://issues.apache.org/jira/browse/CASSANDRA-2347
CQL in future 8.0 cassandra will work as I'm expecting ?
My use case is as follows: about 70% of our jobs are information retrieval using keys, column names and ranges, and so far what we have tested suits our needs. However, the remaining 30% of the jobs involve a full sequential scan of all records in the database. I found some web pages describing the next good thing for the Cassandra 0.8 release, CQL, and I'm wondering: will CQL execution involve separate processes running simultaneously on all nodes in the cluster that do the filtering and pre-sorting phase on the locally stored data (using indexes when available) and then execute the merge phase on a single node (the one that received the request)? Best regards, Teo
Re: Multi-DC Deployment
Assuming that you generally put an API on top of this, delivering to two or more systems boils down to a message queue issue or some similar mechanism that handles reliable delivery of messages. Maybe not trivial, but there are many products that can help you with this, and it is a lot easier to implement than a fully distributed storage system. Yes, ideally Cassandra will not distribute corruption, but the reason you pay up for 2 fully redundant setups in 2 different datacenters is that we do not live in an ideal world. Anyone who has tested Cassandra since 0.7.0 with any real data can testify to how well it can mess things up. This is not specific to Cassandra; in fact, I would argue that this is in the blood of any distributed system. You want them to distribute, after all, and the tighter the coupling between nodes, the better they distribute bad stuff as well as good stuff. There is a bigger risk of complete failure with 2 tightly coupled redundant systems than with 2 almost completely isolated ones. The logic here is so simple it is really somewhat beyond discussion. There are a few other advantages to isolating the systems. Especially in terms of operations, 2 isolated systems would be much easier, as you could relatively risk-free try out a new Cassandra version in one datacenter, or upgrade one datacenter at a time if you needed major operational changes such as schema changes or other large changes to the data. I see the 2 copies in one datacenter + 1 (or maybe 2) in another as a low-cost middle way between 2 full N+2 (RF=3) systems in both datacenters. That is, in a traditional design where you need 1 node for normal service, you would have one extra replica for redundancy and one more replica on top of that (N+2 redundancy) so you can do maintenance and still be redundant. If I have redundancy across datacenters, I would probably still want 2 replicas to avoid network traffic between DCs in case of a node recovery, but N+2 may not be needed, as my risk policy may find it acceptable to run one datacenter without redundancy for a time-limited maintenance period. That is, if my original requirement is 1 node, I could do with 3x the HW, which is not all that much more than the 3x I need for one DC and a lot less than the 6x I need for 2 full N+2 systems. However, all of the above is really beside the point of my original suggestion. Regardless of datacenters, redundancy and distribution of bad or good stuff, it would be good to have a way to return whatever data is there, but with a flag or similar stating that the consistency level was not met. Again, for a lot of services it is fully acceptable, and a lot better, to return an almost complete (or maybe even complete, but not verified by quorum) result than no result at all. As far as I remember from the code, this just boils down to returning whatever you collected from the cluster and setting the proper flag or similar on the result set rather than returning an error. Terje On Thu, Apr 21, 2011 at 5:01 AM, Adrian Cockcroft adrian.cockcr...@gmail.com wrote: Hi Terje, If you feed data to two rings, you will get inconsistency drift as an update to one succeeds and to the other fails from time to time. You would have to build your own read repair. This all starts to look like "I don't trust the Cassandra code to work, so I will write my own buggy one-off versions of Cassandra functionality." I lean towards using Cassandra features rather than rolling my own, because there is a large community testing, fixing and extending Cassandra, and making sure that the algorithms are robust. Distributed systems are very hard to get right; I trust lots of users and eyeballs on the code more than even the best engineer working alone. Cassandra doesn't replicate sstable corruptions. It detects corrupt data and only replicates good data. Also, data isn't replicated to three identical nodes in the way you imply; it's replicated around the ring. If you lose three nodes, you don't lose a whole node's worth of data. We configure each replica to be in a different availability zone so that we can lose a third of our nodes (a whole zone) and still work. On a 300-node system with RF=3 and no zones, losing one or two nodes you still have all your data, and can repair the loss quickly. With three nodes dead at once you don't lose 1% of the data (3/300); I think you lose 1/(300*300*300) of the data (someone check my math?). If you want to always get a result, then use read-one; if you want a highly available, better-quality result, use local quorum. That is a per-query option. Adrian On Tue, Apr 19, 2011 at 6:46 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: If you have RF=3 in both datacenters, it could be discussed whether there is a point to using the built-in replication in Cassandra at all vs. feeding the data to both datacenters and getting 2 100% isolated Cassandra instances that cannot replicate sstable corruptions
Re: system_* consistency level?
That was the trick. Thanks! On Apr 20, 2011, at 6:05 PM, Jonathan Ellis jbel...@gmail.com wrote: See the comments for describe_schema_versions. [snip]
Re: Multi-DC Deployment
Queues replicate bad data just as well as anything else. The biggest source of bad data is broken app code... You will still need to implement a reconciliation/repair checker, as queues have their own failure modes when they get backed up. We have also looked at using queues to bounce data between Cassandra clusters for other reasons, and they have their place. However, it is a lot more work to implement than using existing, well-tested Cassandra functionality to do it for us. I think your code needs to retry a failed local-quorum read with a read-one to get the behavior you are asking for. Our approach to bad data and corruption issues is backups: wind back to the last good snapshot. We have figured out incremental backups as well as full. Our code has some local dependencies, but could be the basis for a generic solution. Adrian On Wed, Apr 20, 2011 at 6:08 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Assuming that you generally put an API on top of this, delivering to two or more systems boils down to a message queue issue or some similar mechanism that handles reliable delivery of messages. [snip]
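Adrian's "retry a failed local-quorum read with a read-one" advice is straightforward to sketch client-side against the 0.7 Thrift API; the wrapper below (names are illustrative, not from the thread) also surfaces the "not verified by quorum" flag Terje asks for:

-- code --
// Hedged sketch: LOCAL_QUORUM read with fallback to ONE.
// degraded[0] is set when the result was not verified by quorum.
List<ColumnOrSuperColumn> readWithFallback(Cassandra.Client client, ByteBuffer key,
        ColumnParent parent, SlicePredicate predicate, boolean[] degraded)
        throws InvalidRequestException, UnavailableException, TimedOutException, TException {
    try {
        degraded[0] = false;
        return client.get_slice(key, parent, predicate, ConsistencyLevel.LOCAL_QUORUM);
    } catch (UnavailableException e) {
        degraded[0] = true;   // quorum not met; return whatever one replica has
        return client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);
    } catch (TimedOutException e) {
        degraded[0] = true;   // same fallback on a timed-out quorum read
        return client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);
    }
}
-- code --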
seed faq
I wrote some self-answered FAQ entries on seeds after reading the wiki and the code. If I've misunderstood something, please point it out. == What are seeds? == Seeds, or seed nodes, are the nodes that new nodes contact on bootstrap to learn about the ring. When you add a new node to the ring, you need to specify at least one live seed for it to contact. Once a node has joined the ring, it learns about the other nodes, so it doesn't need a seed on subsequent boots. There is no special configuration for the seed node itself. In a stable and static ring, you can point a bootstrapping node at a non-seed node as its seed, though this is not recommended. Nodes in the ring tend to send gossip messages to seeds more often by design, so it is probable that the seeds have the most recent and up-to-date information about the ring. (Refer to [[ArchitectureGossip]] for more details.) == Does a single seed mean a single point of failure? == If you are using replicated CFs on the ring, a single seed in the ring does not mean a single point of failure. The ring can operate or boot without the seed. But it is recommended to have multiple seeds in a production system to maintain the ring. Thanks -- maki
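As a footnote to the FAQ: in 0.7 a node becomes a seed simply by being listed under seeds in cassandra.yaml, ideally the same list on every node. A minimal sketch (addresses are placeholders):

-- code --
# cassandra.yaml (0.7.x) -- being in this list is all that
# marks a node as a seed; there is no other seed configuration.
seeds:
    - 10.0.0.1
    - 10.0.0.2
-- code --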
Re: CQL in future 8.0 cassandra will work as I'm expecting ?
You want to run map/reduce jobs for your use case. You can already do this with Cassandra (http://wiki.apache.org/cassandra/HadoopSupport), and DataStax is introducing Brisk soon to make it easier: http://www.datastax.com/products/brisk On Wed, Apr 20, 2011 at 9:36 PM, Jonathan Ellis jbel...@gmail.com wrote: CQL changes the API, that is all. On Wed, Apr 20, 2011 at 5:40 PM, Constantin Teodorescu braila...@gmail.com wrote: My use case is as follows: about 70% of our jobs are information retrieval using keys, column names and ranges... [snip] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
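For the full-scan 30%, the HadoopSupport approach boils down to configuring a Hadoop job with Cassandra's ColumnFamilyInputFormat. A hedged sketch, modeled on the 0.7 contrib/word_count example (ConfigHelper method names per my reading of that example; the keyspace, column family and address are placeholders taken from this thread, and the mapper/reducer classes are omitted):

-- code --
import java.util.Arrays;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.mapreduce.Job;

public class FullScan {
    public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJobName("full-scan");
        // Read input splits directly from Cassandra instead of HDFS.
        job.setInputFormatClass(ColumnFamilyInputFormat.class);

        ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInitialAddress(job.getConfiguration(), "172.16.1.106");
        ConfigHelper.setPartitioner(job.getConfiguration(),
                "org.apache.cassandra.dht.RandomPartitioner");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "bnd", "pet");

        // Mappers run against node-local data and the reduce phase is the
        // merge step Teo describes; restrict the slice to needed columns.
        SlicePredicate predicate = new SlicePredicate().setColumn_names(
                Arrays.asList(ByteBufferUtil.bytes("P_nume"),
                              ByteBufferUtil.bytes("P_prenume")));
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

        // job.setMapperClass(...); job.setReducerClass(...); etc.
        job.waitForCompletion(true);
    }
}
-- code --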