Re: Many creation/inserts in parallel
1) We've tested 100 threads in parallel, with each thread creating 10 tables. I think we will change our data model, but another problem may occur: about 80% of these CFs should be truncated every day, and if we reduce the number of CFs by adding a key field to a single CF, a huge number of tombstones will appear. What do you think about it?

2) Tables appear with a delay. The driver switches connections round-robin, so I think the CF was created on one node and a moment later the data was inserted on another node, before the schema had time to synchronize.

2013/4/28 aaron morton aa...@thelastpickle.com

> At first many CF are being created in parallel (about 1000 CF).

Can you explain this in a bit more detail? By "in parallel" do you mean multiple threads creating CFs at the same time? I would also recommend taking a second look at your data model; you probably do not want to create so many CFs.

> During tests we're receiving some exceptions from driver, e.g.:

The CF you are trying to read / write from does not exist. Check if the table exists using cqlsh / cassandra-cli. Check your code to make sure it was created.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 26/04/2013, at 10:49 PM, Sasha Yanushkevich yanus...@gmail.com wrote:

Hi All

We are testing Cassandra 1.2.3 (3 nodes with RF:2) with the FluentCassandra driver. At first many CFs are being created in parallel (about 1000 CFs). After creation is done, many insertions of small amounts of data into the DB follow. During tests we're receiving some exceptions from the driver, e.g.:

FluentCassandra.Operations.CassandraOperationException: unconfigured columnfamily table_78_9

and

FluentCassandra.Operations.CassandraOperationException: Connection to Cassandra has timed out

Though in Cassandra's logs there are no exceptions. What should we do to handle these exceptions?

-- Best regards, Alexander

-- Best regards, Alexander
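The delayed-visibility race in (2) can be closed by waiting for schema agreement after each DDL statement before writing. A minimal sketch against the Thrift API's describe_schema_versions() call, which returns a map of schema version to the endpoints reporting it; the connected Cassandra.Client, the timeout, and the poll interval are assumptions for illustration:

    import java.util.List;
    import java.util.Map;
    import org.apache.cassandra.thrift.Cassandra;

    public final class SchemaAgreement {
        // Poll until all reachable nodes report one schema version, or time out.
        static void await(Cassandra.Client client, long timeoutMs) throws Exception {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (System.currentTimeMillis() < deadline) {
                Map<String, List<String>> versions = client.describe_schema_versions();
                versions.remove("UNREACHABLE"); // ignore down nodes
                if (versions.size() == 1)
                    return; // schema has converged across live nodes
                Thread.sleep(200);
            }
            throw new IllegalStateException("schema did not converge within " + timeoutMs + " ms");
        }
    }

Calling this between each CREATE and the first insert should remove the "unconfigured columnfamily" errors, at the cost of slower table creation.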
Understanding the source code
Dear all, I am trying to understand and analyze the source code of Cassandra. What I expect (and see in other codebases) is that there should be three sections in the code: 1) initialization and input reading, 2) core computation, and 3) finalizing and gathering the output. However, I cannot find such a structure in the source files. Any comment is appreciated. Regards, Mahmood
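One reason that structure is missing: Cassandra is a long-running server, not a batch program, so there is no "read input, compute, write output" lifecycle. Startup happens once in the daemon class (visible in log lines elsewhere in this digest as AbstractCassandraDaemon.java), and everything after that is event-driven request handling. A schematic sketch of that shape, not Cassandra's actual code:

    // Schematic only: why there is no input/compute/output structure to find.
    public class DaemonShape {
        public static void main(String[] args) {
            setup();        // read cassandra.yaml, replay the commit log, join the ring
            serveForever(); // handle client requests, gossip, flush, compact, ...
        }
        static void setup() { /* one-time initialization */ }
        static void serveForever() {
            while (true) { /* event loop: requests arrive until the process is stopped */ }
        }
    }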
Re: Adding nodes in 1.2 with vnodes requires huge disks
> Is this understanding correct: we had a 12 node cluster with 256 vnodes on each node (upgraded from 1.1); we added two additional nodes that streamed so much data (600+GB, when other nodes had 150-200GB) during the joining phase that they filled their local disks and had to be killed?

Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update the thread with the ticket number.

Can you show the output from nodetool status so we can get a feel for the ring? Can you include the logs from one of the nodes that failed to join?

Thanks - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 29/04/2013, at 10:01 AM, John Watson j...@disqus.com wrote:

On Sun, Apr 28, 2013 at 2:19 PM, aaron morton aa...@thelastpickle.com wrote:

>> We're going to try running a shuffle before adding a new node again... maybe that will help
> I don't think it will hurt, but I doubt it will help.

We had to bail on shuffle since we need to add capacity ASAP and not in 20 days.

>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
> How many nodes did you join, and what was num_tokens? Did you notice streaming from all nodes (in the logs), or are you saying this in response to the cluster load increasing?

Was only adding 2 nodes at the time (planning to add a total of 12.) Starting with a cluster of 12, but now 11 since 1 node entered some weird state when one of the new nodes ran out of disk space. num_tokens is set to 256 on all nodes. Yes, nearly all current nodes were streaming to the new ones (which was great until disk space was an issue.)

>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
> Which were the new nodes? Can you show the output from nodetool status?

The new nodes are the purple and gray lines above all the others. nodetool status doesn't show joining nodes. I think I saw a bug already filed for this but I can't seem to find it.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 27/04/2013, at 9:35 AM, Bryan Talbot btal...@aeriagames.com wrote:

I believe that nodetool rebuild is used to add a new datacenter, not just a new host to an existing cluster. Is that what you ran to add the node? -Bryan

On Fri, Apr 26, 2013 at 1:27 PM, John Watson j...@disqus.com wrote:

Small relief we're not the only ones that had this issue. We're going to try running a shuffle before adding a new node again... maybe that will help. - John

On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote:

I am using the same version and observed something similar. I've added a new node, but the instructions from Datastax did not work for me. Then I ran nodetool rebuild on the new node. After this command finished, it contained two times the load of the other nodes. Even when I ran nodetool cleanup on the older nodes, the situation was the same. The problem only seemed to disappear when nodetool repair was applied to all nodes. Regards, Francisco Sobral.

On Apr 25, 2013, at 4:57 PM, John Watson j...@disqus.com wrote:

After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not?

It seems when new nodes join, they are streamed *all* sstables in the cluster.

https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png

The gray line machine ran out of disk space and for some reason cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet). The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)

I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes

Is there something missing in that documentation?

Thanks, John
Re: CQL Clarification
Not really; I've passed the comments on to the doc teams. The column timestamp is just a 64-bit int, like I said.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 29/04/2013, at 10:06 AM, Michael Theroux mthero...@yahoo.com wrote:

Yes, that does help. So, in the link I provided: http://www.datastax.com/docs/1.0/references/cql/UPDATE

It states: You can specify these options:
• Consistency level
• Time-to-live (TTL)
• Timestamp for the written columns.

Where "timestamp" is a link to "Working with dates and times" and mentions the 64-bit millisecond value. Is that incorrect? -Mike

On Apr 28, 2013, at 11:42 AM, Michael Theroux wrote:

Hello, Just wondering if I can get a quick clarification on some simple CQL. We utilize Thrift CQL queries to access our cassandra setup. As clarified in a previous question I had, when using CQL and Thrift, timestamps on the cassandra column data are assigned by the server, not the client, unless AND TIMESTAMP is utilized in the query, for example: http://www.datastax.com/docs/1.0/references/cql/UPDATE

According to the Datastax documentation, this timestamp should be: "Values serialized with the timestamp type are encoded as 64-bit signed integers representing a number of milliseconds since the standard base time known as the epoch: January 1 1970 at 00:00:00 GMT."

However, my testing showed that updates didn't work when I used a timestamp of this format. Looking at the Cassandra code, it appears that cassandra will assign a timestamp of System.currentTimeMillis() * 1000 when a timestamp is not specified, which would be the number of microseconds since the standard base time. In my test environment, setting the timestamp to be the current time * 1000 seems to work.

It seems that if you have an older installation without TIMESTAMP being specified in the CQL, or a mixed environment, the timestamp should be * 1000. Just making sure I'm reading everything properly... improperly setting the timestamp could cause us some serious damage.

Thanks, -Mike
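The confusion above is a pure unit mismatch: the CQL docs describe milliseconds, but the server's default write timestamp is System.currentTimeMillis() * 1000, i.e. microseconds since the epoch. A minimal sketch of supplying a matching client-side timestamp; the table and column names are illustrative only:

    // Match the server's default unit (microseconds) when using USING TIMESTAMP.
    // A millisecond-scale timestamp is ~1000x smaller and will lose to any
    // previously server-assigned timestamp, making the update appear to do nothing.
    public class MicrosecondTimestamp {
        public static void main(String[] args) {
            long timestampMicros = System.currentTimeMillis() * 1000L;
            String cql = "UPDATE users USING TIMESTAMP " + timestampMicros
                       + " SET email = 'mike@example.com' WHERE user_id = 42";
            System.out.println(cql); // execute via your Thrift/CQL client
        }
    }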
Re: cassandra-shuffle time to completion and required disk space
An alternative to running shuffle is to do a rolling bootstrap/decommission. You would set num_tokens on the existing hosts (and restart them) so that they split their ranges, then bootstrap in N new hosts, then decommission the old ones.

On 28 April 2013 22:21, John Watson j...@disqus.com wrote:

The amount of time/space cassandra-shuffle requires when upgrading to using vnodes should really be made apparent in the documentation (when some is written). The only semi-noticeable remark about the exorbitant amount of time is a bullet point in http://wiki.apache.org/cassandra/VirtualNodes/Balance:

"Shuffling will entail moving a lot of data around the cluster and so has the potential to consume a lot of disk and network I/O, and to take a considerable amount of time. For this to be an online operation, the shuffle will need to operate on a lower priority basis to other streaming operations, and should be expected to take days or weeks to complete."

We tried running shuffle on a QA version of our cluster and 2 things were brought to light:

- Even with no reads/writes it was going to take 20 days
- Each machine needed enough free disk space to potentially hold the entire cluster's sstables on disk

Regards, John

-- Sam Overton Acunu | http://www.acunu.com | @acunu
Fwd: Inter-DC communication optimization
Hello. I would like to know whether updates are propagated from the local DC to remote DCs simultaneously (so all-to-all network connections are preferable), or whether Cassandra can somehow determine the nearest DCs and send updates only to them (so those nearest DCs have to propagate the updates further). Are there any optimizations for multiple DCs placed sequentially on a single link, like DC1 - DC2 - ... - DCn? Thanks in advance, Sergey Naumov.
Cass 1.1.1 and 1.1.11 Exception during compactions
We saw this exception with 1.1.1 and also with 1.1.11 (we upgraded for unrelated reasons, to fix the FD leak during slice queries) -- name of the CF replaced with * for confidentiality:

ERROR [CompactionExecutor:36] 2013-04-29 07:50:49,060 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:36,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(138024912283272996716128964353306009224, 61386330356130622d61362d376330612d666531662d373738616630636265396535) >= current key DecoratedKey(127065377405949402743383718901402082101, 64323962636163652d646561372d333039322d386166322d663064346132363963386131) writing into *-tmp-hf-7372-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:160)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

Any thoughts? Should I be concerned about data being lost?

-- Regards, Oleg Dulin NYC Java Big Data Engineer http://www.olegdulin.com/
normal thread counts?
Hi, I'm having some issues. I keep getting:

ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[GossipStage:1,5,main] java.lang.OutOfMemoryError: unable to create new native thread

after a day or two of runtime. I've checked and my system settings seem acceptable: memlock=unlimited, nofiles=10, nproc=122944. I've messed with heap sizes from 6-12GB (15GB physical, m1.xlarge in AWS), and I keep OOM'ing with the above error.

I've found what seem to me to be obscure references to the stack size interacting with the number of threads. If I'm understanding it correctly, to reason about Java memory usage I have to think of OS + heap as being locked down, the stack space gets the leftovers of physical memory, and each thread gets a stack. For me, the system ulimit setting on stack is 10240k (no idea if Java sees or respects this setting). My -Xss for cassandra is the default (I hope, I don't remember messing with it) of 180k.

I used JMX to check the current number of threads on a production cassandra machine, and it was ~27,000. Is that a normal thread count? Could my OOM be related to stack size x number of threads, or am I overlooking something more simple?

will
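The arithmetic supports the stack-size suspicion: at the default -Xss180k, ~27,000 threads reserve roughly 27,000 x 180 KB, about 4.6 GB of stack space outside the heap, and each additional native thread needs its own stack, which is exactly what "unable to create new native thread" complains about. A self-contained sketch for sampling the count over time with the standard ThreadMXBean (the same data is exposed remotely via the java.lang:type=Threading MBean):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class ThreadCount {
        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            for (int i = 0; i < 5; i++) {
                // Live count right now, plus the high-water mark since JVM start.
                System.out.println("live=" + threads.getThreadCount()
                        + " peak=" + threads.getPeakThreadCount());
                Thread.sleep(1000);
            }
        }
    }

A count that keeps climbing between samples usually means something is creating threads faster than they exit; with the sync Thrift server that is most often one thread per leaked client connection.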
Re: Deletes, null values
I created it almost a year ago with cassandra-cli. Now show_schema returns:

create column family myCF
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and populate_io_cache_on_flush = false
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 12
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and bloom_filter_fp_chance = 0.01
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

> The output looks correct to me. CQL tables return values, including null, for all of the selected columns.

I thought that C* had no null values... I use a lot of CFs in which only the column names are filled in, and I request a range of columns to see which references (like 1228#16866) exist. So I would like those columns to simply disappear from the table.

Alain

2013/4/28 aaron morton aa...@thelastpickle.com

What's your table definition?

> select '1228#16857','1228#16866','1228#16875','1237#16544','1237#16553' from myCF where key = 'all';

The output looks correct to me. CQL tables return values, including null, for all of the selected columns.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 27/04/2013, at 12:48 AM, Sorin Manolache sor...@gmail.com wrote:

On 2013-04-26 11:55, Alain RODRIGUEZ wrote:

Of course. From CQL 2 (cqlsh -2):

delete '183#16684','183#16714','183#16717' from myCF where key = 'all';

And selecting this data as follows gives me the result above:

select '1228#16857','1228#16866','1228#16875','1237#16544','1237#16553' from myCF where key = 'all';

From thrift (phpCassa client):

$pool = new ConnectionPool('myKeyspace', array('192.168.100.201'), 6, 0, 3, 3);
$my_cf = new ColumnFamily($pool, 'myCF', true, true, ConsistencyLevel::QUORUM, ConsistencyLevel::QUORUM);
$my_cf->remove('all', array('1228#16857','1228#16866','1228#16875'));

I see. I'm sorry, I know nothing about phpCassa. I use batch_mutation with deletions and it works. But I guess phpCassa must use the same thrift primitives. Sorin

2013/4/25 Sorin Manolache sor...@gmail.com

On 2013-04-25 11:48, Alain RODRIGUEZ wrote:

Hi, I tried to delete some columns using cql2 as well as thrift on C* 1.2.2, and instead of being unreachable, the deleted columns have a null value. I am using no values in this CF; the only information I use is the existence of the column. So when I select all the columns for a given key I have the following returned:

 1228#16857 | 1228#16866 | 1228#16875 | 1237#16544 | 1237#16553
------------+------------+------------+------------+------------
       null |       null |       null |            |

This is quite annoying since my app thinks that I have 5 columns there when I should have 2 only. I first thought that this was a visible marker of tombstones, but they didn't vanish after a major compaction. How can I get rid of these null/ghost columns and why does it happen?

I do something similar but I don't see null values. Could you please post the code where you delete the columns? Sorin
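Since only column existence matters here, one way to avoid the null placeholders entirely is to slice over a name range instead of naming columns: a Thrift get_slice returns only the columns that actually exist, so deleted ones never appear. A minimal sketch, assuming a connected Cassandra.Client; the range bounds and page size are illustrative:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;

    public final class LiveColumns {
        static ByteBuffer utf8(String s) {
            return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
        }

        // Returns only the columns that actually exist in the name range;
        // tombstoned columns are simply absent from the result.
        static List<ColumnOrSuperColumn> liveColumns(Cassandra.Client client) throws Exception {
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(new SliceRange(
                    utf8("1228#"),  // start of the reference range
                    utf8("1229#"),  // end of the range
                    false,          // not reversed
                    100));          // page size
            return client.get_slice(utf8("all"), new ColumnParent("myCF"),
                                    predicate, ConsistencyLevel.QUORUM);
        }
    }

The nulls in the question come from naming absent columns explicitly in the SELECT; a name-range slice never mentions columns that are not there.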
Exception when setting tokens for the cassandra nodes
Hi, I am testing out Cassandra 1.2 on two of my local servers, but I am having problems assigning tokens to my nodes: when I use nodetool to set a token, I end up getting a Java exception. My test setup is as follows:

Node1: local ip 1 (seed)
Node2: local ip 2 (seed)

Since I have two nodes, I calculated the tokens as 0 and 2^127/2 = 85070591730234615865843651857942052864. I was able to set token 0 for my first node using nodetool move 0, but when I try to set 85070591730234615865843651857942052864 for my second node, it throws an UndeclaredThrowableException in thread main. The full stack is attached below.

user@server~$ nodetool move 85070591730234615865843651857942052864
Exception in thread main java.lang.reflect.UndeclaredThrowableException
    at $Proxy0.getTokenToEndpointMap(Unknown Source)
    at org.apache.cassandra.tools.NodeProbe.getTokenToEndpointMap(NodeProbe.java:288)
    at org.apache.cassandra.tools.NodeCmd.printRing(NodeCmd.java:215)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1051)
Caused by: javax.management.InstanceNotFoundException: org.apache.cassandra.db:type=StorageService
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:643)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:668)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1463)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)
    at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:656)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$1.run(Transport.java:177)
    at sun.rmi.transport.Transport$1.run(Transport.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
    at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:273)
    at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:251)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:160)
    at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(Unknown Source)
    at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:901)
    at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:280)

Any suggestions towards solving this problem would be deeply appreciated.

thanks, rahul -- Rahul Erai www.cse.iitk.ac.in/~rahule
RE: Exception when setting tokens for the cassandra nodes
For starters: if you are using the Murmur3 partitioner, which is the default in cassandra.yaml, then you need to calculate the tokens using:

python -c 'print [str(((2**64 / 2) * i) - 2**63) for i in range(2)]'

which gives the following values: ['-9223372036854775808', '0']

From: Rahul [mailto:rahule...@gmail.com]
Sent: Monday, April 29, 2013 7:23 PM
To: user@cassandra.apache.org
Subject: Exception when setting tokens for the cassandra nodes

Hi, I am testing out Cassandra 1.2 on two of my local servers, but I am having problems assigning tokens to my nodes: when I use nodetool to set a token, I end up getting a Java exception. My test setup is as follows:

Node1: local ip 1 (seed)
Node2: local ip 2 (seed)

Since I have two nodes, I calculated the tokens as 0 and 2^127/2 = 85070591730234615865843651857942052864. I was able to set token 0 for my first node using nodetool move 0, but when I try to set 85070591730234615865843651857942052864 for my second node, it throws an UndeclaredThrowableException in thread main.

user@server~$ nodetool move 85070591730234615865843651857942052864
[stack trace as in the original message above]

Any suggestions towards solving this problem would be deeply appreciated.

thanks, rahul -- Rahul Erai www.cse.iitk.ac.in/~rahule
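For cross-checking, the same evenly spaced Murmur3 tokens can be computed in Java; BigInteger avoids the overflow plain longs would hit at 2^64. A sketch mirroring the Python one-liner above:

    import java.math.BigInteger;

    public class Murmur3Tokens {
        public static void main(String[] args) {
            int nodes = 2; // size of the ring
            BigInteger ringSize = BigInteger.ONE.shiftLeft(64);          // 2^64
            BigInteger minToken = BigInteger.ONE.shiftLeft(63).negate(); // -2^63
            for (int i = 0; i < nodes; i++) {
                // token_i = -2^63 + (i * 2^64) / nodes
                BigInteger token = minToken.add(
                        ringSize.multiply(BigInteger.valueOf(i))
                                .divide(BigInteger.valueOf(nodes)));
                System.out.println(token); // prints -9223372036854775808, then 0
            }
        }
    }

The 2^127-based value (85070591730234615865843651857942052864) belongs to the RandomPartitioner's 0..2^127 range and is not a valid Murmur3 token, which is one plausible reason the move failed.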
Fwd: error: Cassandra ring and Hadoop connection?
Hi all,

I can run Pig with Cassandra and Hadoop in EC2. I'm trying to run Pig against a Cassandra ring with Hadoop; the Cassandra ring hosts the tasktrackers and datanodes too, and I'm running Pig from another machine where I have installed the namenode/jobtracker.

I have a simple script to load data from the pygmalion keyspace and the account column family and dump the result to test. I installed another simple local Cassandra on the namenode/jobtracker machine and I can run Pig jobs OK there, but when I try to run the script against the Cassandra ring config, changing the environment variable PIG_INITIAL_ADDRESS to the IP of one of the nodes of the Cassandra ring, I get this error:

java.lang.RuntimeException: UnavailableException()
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:184)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(CassandraStorage.java:226)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: UnavailableException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12924)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
    ... 17 more

Can anybody help me or have any idea? Thanks in advance.

PS:

1. The ports are open in EC2.
2. The keyspace and CF are created in the Cassandra cluster on EC2 too, and likewise on the namenode Cassandra installation.
3. I have this .bash_profile configuration:

    # .bash_profile
    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
        . ~/.bashrc
    fi
    # User specific environment and startup programs
    PATH=$PATH:$HOME/.local/bin:$HOME/bin
    export PATH=$PATH:/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
    export CASSANDRA_HOME=/home/ec2-user/apache-cassandra-1.2.4
    export PIG_HOME=/home/ec2-user/pig-0.11.1-src
    export PIG_INITIAL_ADDRESS=10.210.164.233
    #export PIG_INITIAL_ADDRESS=127.0.0.1
    export PIG_RPC_PORT=9160
    export PIG_CONF_DIR=/home/ec2-user/hadoop-1.1.1/conf
    export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
    #export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

4. I export all the Cassandra jars in hadoop-env.sh for all the Hadoop nodes.
5. I have the same error running Pig in local mode.
6. If I change to RandomPartitioner and reload the changes, I get this error:

java.lang.RuntimeException: InvalidRequestException(why: Start token sorts after end token)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:184)
    at
Re: cassandra-shuffle time to completion and required disk space
That's what we tried first, before the shuffle, and we ran into the space issue. That's detailed in another thread, titled "Adding nodes in 1.2 with vnodes requires huge disks".

On Mon, Apr 29, 2013 at 4:08 AM, Sam Overton s...@acunu.com wrote:

An alternative to running shuffle is to do a rolling bootstrap/decommission. You would set num_tokens on the existing hosts (and restart them) so that they split their ranges, then bootstrap in N new hosts, then decommission the old ones.

On 28 April 2013 22:21, John Watson j...@disqus.com wrote:

The amount of time/space cassandra-shuffle requires when upgrading to using vnodes should really be made apparent in the documentation (when some is written). The only semi-noticeable remark about the exorbitant amount of time is a bullet point in http://wiki.apache.org/cassandra/VirtualNodes/Balance:

"Shuffling will entail moving a lot of data around the cluster and so has the potential to consume a lot of disk and network I/O, and to take a considerable amount of time. For this to be an online operation, the shuffle will need to operate on a lower priority basis to other streaming operations, and should be expected to take days or weeks to complete."

We tried running shuffle on a QA version of our cluster and 2 things were brought to light:

- Even with no reads/writes it was going to take 20 days
- Each machine needed enough free disk space to potentially hold the entire cluster's sstables on disk

Regards, John

-- Sam Overton Acunu | http://www.acunu.com | @acunu
Re: Adding nodes in 1.2 with vnodes requires huge disks
Did you update num_tokens on the existing hosts and restart them before you tried bootstrapping in the new node? If the new node tried to stream all the data in the cluster, that would be consistent with having missed that step.

You should see "Calculating new tokens" in the logs of the existing hosts if you performed that step correctly, and nodetool ring should show that the existing hosts each have 256 tokens which are contiguous in the ring. If you missed this step then the new node will be taking 256 tokens in a ring with only N tokens (1 per existing host) and so will end up with 256/(256+N) of the data (almost all of it: with N = 12 existing hosts, that is 256/268, about 96%).

On 28 April 2013 23:01, John Watson j...@disqus.com wrote:

On Sun, Apr 28, 2013 at 2:19 PM, aaron morton aa...@thelastpickle.com wrote:

>> We're going to try running a shuffle before adding a new node again... maybe that will help
> I don't think it will hurt, but I doubt it will help.

We had to bail on shuffle since we need to add capacity ASAP and not in 20 days.

>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
> How many nodes did you join, and what was num_tokens? Did you notice streaming from all nodes (in the logs), or are you saying this in response to the cluster load increasing?

Was only adding 2 nodes at the time (planning to add a total of 12.) Starting with a cluster of 12, but now 11 since 1 node entered some weird state when one of the new nodes ran out of disk space. num_tokens is set to 256 on all nodes. Yes, nearly all current nodes were streaming to the new ones (which was great until disk space was an issue.)

>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
> Which were the new nodes? Can you show the output from nodetool status?

The new nodes are the purple and gray lines above all the others. nodetool status doesn't show joining nodes. I think I saw a bug already filed for this but I can't seem to find it.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 27/04/2013, at 9:35 AM, Bryan Talbot btal...@aeriagames.com wrote:

I believe that nodetool rebuild is used to add a new datacenter, not just a new host to an existing cluster. Is that what you ran to add the node? -Bryan

On Fri, Apr 26, 2013 at 1:27 PM, John Watson j...@disqus.com wrote:

Small relief we're not the only ones that had this issue. We're going to try running a shuffle before adding a new node again... maybe that will help. - John

On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote:

I am using the same version and observed something similar. I've added a new node, but the instructions from Datastax did not work for me. Then I ran nodetool rebuild on the new node. After this command finished, it contained two times the load of the other nodes. Even when I ran nodetool cleanup on the older nodes, the situation was the same. The problem only seemed to disappear when nodetool repair was applied to all nodes. Regards, Francisco Sobral.

On Apr 25, 2013, at 4:57 PM, John Watson j...@disqus.com wrote:

After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not?

It seems when new nodes join, they are streamed *all* sstables in the cluster.

https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png

The gray line machine ran out of disk space and for some reason cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet). The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)

I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes

Is there something missing in that documentation?

Thanks, John

-- Sam Overton Acunu | http://www.acunu.com | @acunu
Compaction, Slow Ring, and bad behavior
Hi, we have a 9-node ring on m1.xlarge AWS hosts. We started having some trouble a while ago, and it's making me pull out all of my hair.

The host in position #3 has been replaced 4 times. Each time, the host joins the ring, I do a nodetool repair -pr, and she seems fine for about a day. Then she gets real slow, sometimes OOMs, sometimes takes down the host in position #5, sometimes gets stuck on a compaction with near-idle disk throughput, and eventually dies without any kind of error message or reason for failing. Sometimes our cluster gets so slow that it is almost unusable: we get timeout errors from our application, and AWS sends us voluminous alerts about latency.

I've tried changing the amount of RAM between 8G and 12G, changing MAX_HEAP_SIZE and HEAP_NEWSIZE, repeatedly forcing a stop compaction, setting astronomical ulimit values, and praying to available gods. I'm a bit confused. We're not using super-wide rows, most things are default. EL5, Cassandra 1.1.9, Java 1.6.0

-- Drew from Zhrodague lolcat divinator d...@zhrodague.net
Re: Cass 1.1.1 and 1.1.11 Exception during compactions
nodetool scrub will repair out-of-order rows in the source SSTables for the compaction process. Or you can stop the node and use the offline bin/sstablescrub tool. Not sure how they got there; there was a ticket for similar problems in 1.1.1.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 30/04/2013, at 1:31 AM, Oleg Dulin oleg.du...@gmail.com wrote:

We saw this exception with 1.1.1 and also with 1.1.11 (we upgraded for unrelated reasons, to fix the FD leak during slice queries) -- name of the CF replaced with * for confidentiality:

ERROR [CompactionExecutor:36] 2013-04-29 07:50:49,060 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:36,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(138024912283272996716128964353306009224, 61386330356130622d61362d376330612d666531662d373738616630636265396535) >= current key DecoratedKey(127065377405949402743383718901402082101, 64323962636163652d646561372d333039322d386166322d663064346132363963386131) writing into *-tmp-hf-7372-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:160)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)

Any thoughts? Should I be concerned about data being lost?

-- Regards, Oleg Dulin NYC Java Big Data Engineer http://www.olegdulin.com/
Re: normal thread counts?
> I used JMX to check the current number of threads on a production cassandra machine, and it was ~27,000.

That does not sound too good. My first guess would be lots of client connections. What client are you using, and does it do connection pooling?

See the comments in cassandra.yaml around rpc_server_type: the default, sync, uses one thread per connection; you may be better off with HSHA. But if your app is leaking connections you should probably deal with that first.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 30/04/2013, at 3:07 AM, William Oberman ober...@civicscience.com wrote:

Hi, I'm having some issues. I keep getting:

ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[GossipStage:1,5,main] java.lang.OutOfMemoryError: unable to create new native thread

after a day or two of runtime. I've checked and my system settings seem acceptable: memlock=unlimited, nofiles=10, nproc=122944. I've messed with heap sizes from 6-12GB (15GB physical, m1.xlarge in AWS), and I keep OOM'ing with the above error.

I've found what seem to me to be obscure references to the stack size interacting with the number of threads. If I'm understanding it correctly, to reason about Java memory usage I have to think of OS + heap as being locked down, the stack space gets the leftovers of physical memory, and each thread gets a stack. For me, the system ulimit setting on stack is 10240k (no idea if Java sees or respects this setting). My -Xss for cassandra is the default (I hope, I don't remember messing with it) of 180k.

I used JMX to check the current number of threads on a production cassandra machine, and it was ~27,000. Is that a normal thread count? Could my OOM be related to stack size x number of threads, or am I overlooking something more simple?

will
Re: setcompactionthroughput and setstreamthroughput have no effect
Same behavior on 1.1.3, 1.1.5 and 1.1.9. Currently: 1.2.3.

On Mon, Apr 29, 2013 at 11:43 AM, Robert Coli rc...@eventbrite.com wrote:

On Sun, Apr 28, 2013 at 2:28 PM, John Watson j...@disqus.com wrote:

Running these 2 commands is a no-op IO-wise:

nodetool setcompactionthroughput 0
nodetool setstreamthroughput 0

What version of cassandra? =Rob
Re: setcompactionthroughput and setstreamthroughput have no effect
On Mon, Apr 29, 2013 at 3:52 PM, John Watson j...@disqus.com wrote:

> Same behavior on 1.1.3, 1.1.5 and 1.1.9. Currently: 1.2.3

(below snippets are from trunk)

./src/java/org/apache/cassandra/tools/NodeCmd.java

    case SETCOMPACTIONTHROUGHPUT :
        if (arguments.length != 1) { badUse("Missing value argument."); }
        probe.setCompactionThroughput(Integer.parseInt(arguments[0]));
        break;

./src/java/org/apache/cassandra/tools/NodeProbe.java

    public void setCompactionThroughput(int value)
    {
        ssProxy.setCompactionThroughputMbPerSec(value);

./src/java/org/apache/cassandra/service/StorageService.java

    public void setCompactionThroughputMbPerSec(int value)
    {
        DatabaseDescriptor.setCompactionThroughputMbPerSec(value);
    }

./src/java/org/apache/cassandra/config/DatabaseDescriptor.java

    public static void setCompactionThroughputMbPerSec(int value)
    {
        conf.compaction_throughput_mb_per_sec = value;
    }
    ...
    public static int getCompactionThroughputMbPerSec()
    {
        return conf.compaction_throughput_mb_per_sec;
    }

./src/java/org/apache/cassandra/db/compaction/CompactionController.java

    public int targetThroughput()
    {
        if (DatabaseDescriptor.getCompactionThroughputMbPerSec() < 1 || StorageService.instance.isBootstrapMode())
            // throttling disabled
            return 0;
        // total throughput
        int totalBytesPerMS = DatabaseDescriptor.getCompactionThroughputMbPerSec() * 1024 * 1024 / 1000;
        // per stream throughput (target bytes per MS)
        return totalBytesPerMS / Math.max(1, CompactionManager.instance.getActiveCompactions())

So, a value of 0 means disable throttling.

./src/java/org/apache/cassandra/utils/Throttle.java

    int newTargetBytesPerMS = fun.targetThroughput();
    if (newTargetBytesPerMS < 1) // throttling disabled
        return;

And returning 0 from targetThroughput should result in throttling being disabled.

I see in the actual throttle code this log line:

    logger.trace(String.format("%s actual throughput was %d bytes in %d ms: throttling for %d ms",

So you could enable TRACE log level for this class to determine if it's making it into that codepath.

As an aside, there is no bounds checking when setting configuration options via JMX. Be careful. https://issues.apache.org/jira/browse/CASSANDRA-4967

=Rob
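For completeness, the same JMX path that nodetool follows can be driven directly. A minimal sketch of invoking setCompactionThroughputMbPerSec on the StorageService MBean (the host, port, and the value 16 are illustrative; per Rob's caveat, no bounds checking is done on the value):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SetCompactionThroughput {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
                // Same operation nodetool setcompactionthroughput ends up invoking.
                mbs.invoke(ss, "setCompactionThroughputMbPerSec",
                           new Object[]{ 16 }, new String[]{ "int" });
            }
        }
    }

Note the setting is runtime-only: a restart reverts to the cassandra.yaml value.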
Re: How to use Write Consistency 'ANY' with SSTABLELOADER - DSE Cassandra 1.1.9
On Mon, Apr 29, 2013 at 1:17 PM, aaron morton aa...@thelastpickle.com wrote:

> Bulk Loader does not use CL, it's more like a repair / bootstrap. If you have to skip a node then use repair.

The bulk loader (sstableloader) can ignore replica nodes via the -i option:

./src/java/org/apache/cassandra/tools/BulkLoader.java

    options.addOption("i", IGNORE_NODES_OPTION, "NODES", "don't stream to this (comma separated) list of nodes");

This is one of the major differences between the bulkLoad JMX call and sstableloader: if bulkLoad fails on a single replica you have to send to all replicas again.

Also of note, it is confusing that the JMX bulkLoad operation actually uses the sstableloader class and not the BulkLoader class, while the sstableloader tool uses the BulkLoader class. Perhaps this line in BulkLoader.java should be changed from:

    private static final String TOOL_NAME = "sstableloader";

to

    private static final String TOOL_NAME = "bulkloader";

=Rob
Re: Adding nodes in 1.2 with vnodes requires huge disks
Opened a ticket: https://issues.apache.org/jira/browse/CASSANDRA-5525

On Mon, Apr 29, 2013 at 2:24 AM, aaron morton aa...@thelastpickle.com wrote:

> Is this understanding correct: we had a 12 node cluster with 256 vnodes on each node (upgraded from 1.1); we added two additional nodes that streamed so much data (600+GB, when other nodes had 150-200GB) during the joining phase that they filled their local disks and had to be killed?

Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update the thread with the ticket number.

Can you show the output from nodetool status so we can get a feel for the ring? Can you include the logs from one of the nodes that failed to join?

Thanks - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 29/04/2013, at 10:01 AM, John Watson j...@disqus.com wrote:

On Sun, Apr 28, 2013 at 2:19 PM, aaron morton aa...@thelastpickle.com wrote:

>> We're going to try running a shuffle before adding a new node again... maybe that will help
> I don't think it will hurt, but I doubt it will help.

We had to bail on shuffle since we need to add capacity ASAP and not in 20 days.

>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
> How many nodes did you join, and what was num_tokens? Did you notice streaming from all nodes (in the logs), or are you saying this in response to the cluster load increasing?

Was only adding 2 nodes at the time (planning to add a total of 12.) Starting with a cluster of 12, but now 11 since 1 node entered some weird state when one of the new nodes ran out of disk space. num_tokens is set to 256 on all nodes. Yes, nearly all current nodes were streaming to the new ones (which was great until disk space was an issue.)

>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)
> Which were the new nodes? Can you show the output from nodetool status?

The new nodes are the purple and gray lines above all the others. nodetool status doesn't show joining nodes. I think I saw a bug already filed for this but I can't seem to find it.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 27/04/2013, at 9:35 AM, Bryan Talbot btal...@aeriagames.com wrote:

I believe that nodetool rebuild is used to add a new datacenter, not just a new host to an existing cluster. Is that what you ran to add the node? -Bryan

On Fri, Apr 26, 2013 at 1:27 PM, John Watson j...@disqus.com wrote:

Small relief we're not the only ones that had this issue. We're going to try running a shuffle before adding a new node again... maybe that will help. - John

On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote:

I am using the same version and observed something similar. I've added a new node, but the instructions from Datastax did not work for me. Then I ran nodetool rebuild on the new node. After this command finished, it contained two times the load of the other nodes. Even when I ran nodetool cleanup on the older nodes, the situation was the same. The problem only seemed to disappear when nodetool repair was applied to all nodes. Regards, Francisco Sobral.

On Apr 25, 2013, at 4:57 PM, John Watson j...@disqus.com wrote:

After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not?

It seems when new nodes join, they are streamed *all* sstables in the cluster.

https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png

The gray line machine ran out of disk space and for some reason cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet). The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream.)

I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes

Is there something missing in that documentation?

Thanks, John
Kundera 2.5 released
Hi All,

We are happy to announce the release of Kundera 2.5. Kundera is a JPA 2.0 compliant, object-datastore mapping library for NoSQL datastores. The idea behind Kundera is to make working with NoSQL databases drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB, Redis, OracleNoSQL, Neo4j and relational databases.

Major Changes:

1) Support for OracleNoSQL (http://www.oracle.com/technetwork/products/nosqldb/overview/index.html). See https://github.com/impetus-opensource/Kundera/wiki/Kundera-OracleNoSQL. [Please use the Oracle NoSQL jars from the Oracle NoSQL distribution at http://download.oracle.com/otn-pub/otn_software/nosql-database/kv-ce-2.0.26.zip. For the convenience of those who want to build Kundera from source we have additionally placed the jars at http://kundera.googlecode.com/svn/maven2/maven-missing-resources/]
2) CQL 3.0 interoperability with Thrift.
3) Performance fixes.

Github Bug Fixes:

https://github.com/impetus-opensource/Kundera/issues/240
https://github.com/impetus-opensource/Kundera/issues/232
https://github.com/impetus-opensource/Kundera/issues/231
https://github.com/impetus-opensource/Kundera/issues/230
https://github.com/impetus-opensource/Kundera/issues/226
https://github.com/impetus-opensource/Kundera/issues/221
https://github.com/impetus-opensource/Kundera/issues/218
https://github.com/impetus-opensource/Kundera/issues/214
https://github.com/impetus-opensource/Kundera/issues/209
https://github.com/impetus-opensource/Kundera/issues/207
https://github.com/impetus-opensource/Kundera/issues/196
https://github.com/impetus-opensource/Kundera/issues/193
https://github.com/impetus-opensource/Kundera/issues/190
https://github.com/impetus-opensource/Kundera/issues/188
https://github.com/impetus-opensource/Kundera/issues/182
https://github.com/impetus-opensource/Kundera/issues/181

How to Download:

To download, use or contribute to Kundera, visit: http://github.com/impetus-opensource/Kundera

Latest released tag version is 2.5. Kundera maven libraries are now available at: https://oss.sonatype.org/content/repositories/releases/com/impetus

Sample codes and examples for using Kundera can be found here: http://github.com/impetus-opensource/Kundera-Examples and https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests

Survey/Feedback: http://www.surveymonkey.com/s/BMB9PWG

Thank you all for your contributions and using Kundera!

Sincerely, Kundera Team
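Since Kundera's surface is standard JPA, usage looks the same as with any JPA provider. A minimal sketch; the persistence-unit name "cassandra_pu" and the User entity are illustrative assumptions, configured in persistence.xml to point at a Cassandra keyspace:

    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Id;
    import javax.persistence.Persistence;

    @Entity
    class User {
        @Id String userId; // maps to the row key
        String email;      // maps to a column
    }

    public class KunderaExample {
        public static void main(String[] args) {
            // "cassandra_pu" is an assumed persistence unit defined in persistence.xml.
            EntityManagerFactory emf = Persistence.createEntityManagerFactory("cassandra_pu");
            EntityManager em = emf.createEntityManager();
            User u = new User();
            u.userId = "42";
            u.email = "user@example.com";
            em.persist(u); // written to the mapped Cassandra column family
            em.close();
            emf.close();
        }
    }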
Re: Deletes, null values
> I thought that C* had no null values... I use a lot of CFs in which only the column names are filled in, and I request a range of columns to see which references (like 1228#16866) exist. So I would like those columns to simply disappear from the table.

Cassandra does not store null values. The output from cqlsh is showing you the values for the columns you requested, using null to indicate there was no value.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 30/04/2013, at 3:52 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

I created it almost a year ago with cassandra-cli. Now show_schema returns:

create column family myCF
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and populate_io_cache_on_flush = false
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 12
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and bloom_filter_fp_chance = 0.01
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};

> The output looks correct to me. CQL tables return values, including null, for all of the selected columns.

I thought that C* had no null values... I use a lot of CFs in which only the column names are filled in, and I request a range of columns to see which references (like 1228#16866) exist. So I would like those columns to simply disappear from the table.

Alain

2013/4/28 aaron morton aa...@thelastpickle.com

What's your table definition?

> select '1228#16857','1228#16866','1228#16875','1237#16544','1237#16553' from myCF where key = 'all';

The output looks correct to me. CQL tables return values, including null, for all of the selected columns.

Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 27/04/2013, at 12:48 AM, Sorin Manolache sor...@gmail.com wrote:

On 2013-04-26 11:55, Alain RODRIGUEZ wrote:

Of course. From CQL 2 (cqlsh -2):

delete '183#16684','183#16714','183#16717' from myCF where key = 'all';

And selecting this data as follows gives me the result above:

select '1228#16857','1228#16866','1228#16875','1237#16544','1237#16553' from myCF where key = 'all';

From thrift (phpCassa client):

$pool = new ConnectionPool('myKeyspace', array('192.168.100.201'), 6, 0, 3, 3);
$my_cf = new ColumnFamily($pool, 'myCF', true, true, ConsistencyLevel::QUORUM, ConsistencyLevel::QUORUM);
$my_cf->remove('all', array('1228#16857','1228#16866','1228#16875'));

I see. I'm sorry, I know nothing about phpCassa. I use batch_mutation with deletions and it works. But I guess phpCassa must use the same thrift primitives. Sorin

2013/4/25 Sorin Manolache sor...@gmail.com

On 2013-04-25 11:48, Alain RODRIGUEZ wrote:

Hi, I tried to delete some columns using cql2 as well as thrift on C* 1.2.2, and instead of being unreachable, the deleted columns have a null value. I am using no values in this CF; the only information I use is the existence of the column. So when I select all the columns for a given key I have the following returned:

 1228#16857 | 1228#16866 | 1228#16875 | 1237#16544 | 1237#16553
------------+------------+------------+------------+------------
       null |       null |       null |            |

This is quite annoying since my app thinks that I have 5 columns there when I should have 2 only. I first thought that this was a visible marker of tombstones, but they didn't vanish after a major compaction. How can I get rid of these null/ghost columns and why does it happen?

I do something similar but I don't see null values. Could you please post the code where you delete the columns? Sorin