Hinted handoff bug?
Hi,

We're running Cassandra 1.0.3. I've done some testing with 2 nodes (node A, node B) and replication factor 2. I take node A down, write some data to node B, and then bring node A up. Sometimes hints aren't delivered when node A comes up.

I've done some debugging in org.apache.cassandra.db.HintedHandOffManager, and sometimes node B ends up in a strange state in deliverHints(final InetAddress to): queuedDeliveries already has node A in its set, so no hints will ever be delivered to node A. The only reason for this that I can see is that in deliverHintsToEndpoint(InetAddress endpoint) the hintStore.isEmpty() check returns true and the endpoint (node A) isn't removed from queuedDeliveries. Then no hints will ever be delivered again until node B is restarted.

Under what conditions will hintStore.isEmpty() return true? Shouldn't the hintStore.isEmpty() check be inside the try/finally, with the endpoint removed from queuedDeliveries in the finally block?

public void deliverHints(final InetAddress to)
{
    logger_.debug("deliverHints to {}", to);
    if (!queuedDeliveries.add(to))
        return;
    ...
}

private void deliverHintsToEndpoint(InetAddress endpoint)
        throws IOException, DigestMismatchException, InvalidRequestException, TimeoutException
{
    ColumnFamilyStore hintStore = Table.open(Table.SYSTEM_TABLE).getColumnFamilyStore(HINTS_CF);
    if (hintStore.isEmpty())
        return; // nothing to do, don't confuse users by logging a no-op handoff
    try
    {
        ...
    }
    finally
    {
        queuedDeliveries.remove(endpoint);
    }
}

Regards,
/Fredrik
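The guard pattern Fredrik describes can be sketched outside of Cassandra. This is a minimal, self-contained illustration (not the real HintedHandOffManager; the storeEmpty flag stands in for hintStore.isEmpty()): when the "nothing to do" early return fires before the try/finally is entered, the endpoint is never removed from queuedDeliveries, and every later delivery attempt is silently skipped.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the leak: an early return between the
// queuedDeliveries.add() guard and the try/finally leaves the entry behind.
public class HandoffGuard {
    private final Set<String> queuedDeliveries = ConcurrentHashMap.newKeySet();
    private boolean storeEmpty = true; // stands in for hintStore.isEmpty()

    // Buggy shape: the emptiness check sits before the try/finally,
    // so the endpoint stays queued forever.
    public boolean deliverBuggy(String endpoint) {
        if (!queuedDeliveries.add(endpoint))
            return false;              // another delivery is in flight
        if (storeEmpty)
            return true;               // BUG: endpoint never removed
        try {
            // ... deliver hints ...
        } finally {
            queuedDeliveries.remove(endpoint);
        }
        return true;
    }

    // Fixed shape (what the message proposes): check inside try/finally,
    // so the finally block always clears the guard entry.
    public boolean deliverFixed(String endpoint) {
        if (!queuedDeliveries.add(endpoint))
            return false;
        try {
            if (storeEmpty)
                return true;           // finally still runs on this return
            // ... deliver hints ...
            return true;
        } finally {
            queuedDeliveries.remove(endpoint);
        }
    }

    public boolean isQueued(String endpoint) {
        return queuedDeliveries.contains(endpoint);
    }

    public static void main(String[] args) {
        HandoffGuard buggy = new HandoffGuard();
        buggy.deliverBuggy("nodeA");
        System.out.println("buggy leaves nodeA queued: " + buggy.isQueued("nodeA"));

        HandoffGuard fixed = new HandoffGuard();
        fixed.deliverFixed("nodeA");
        System.out.println("fixed leaves nodeA queued: " + fixed.isQueued("nodeA"));
    }
}
```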
Re: Hinted handoff bug?
You're right, good catch. Do you mind opening a ticket on jira (https://issues.apache.org/jira/browse/CASSANDRA)?

-- Sylvain

On Thu, Dec 1, 2011 at 10:03 AM, Fredrik L Stigbäck fredrik.l.stigb...@sitevision.se wrote:
> [original message quoted in full; snipped]
Insufficient disk space to flush
Hello everyone,

4-node Cassandra 0.8.5 cluster with RF=2. One node started throwing exceptions in its log:

ERROR 10:02:46,837 Fatal exception in thread Thread[FlushWriter:1317,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.RuntimeException: Insufficient disk space to flush 17296 bytes
at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:714)
at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2301)
at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 3 more

Checked the disk and, obviously, it's 100% full. How do I recover from this without losing the data? I've got plenty of space on the other nodes, so I thought of doing a decommission, which I understand reassigns ranges to the other nodes and replicates data to them. After that's done, I plan on manually deleting the data on the node and then rejoining at the same cluster position with auto-bootstrap turned off, so that I won't get back the old data and can continue receiving new data on the node. Note, I would like to keep all 4 nodes in, because the other three barely handle the input load alone. These are just long-running tests until I get some better machines.

One strange thing I found is that the data folder on the node that filled up its disk is 150 GB (as measured with du), while the data folder on the other 3 nodes is 50 GB. At the same time, DataStax OpsCenter shows a size of around 50 GB for all 4 nodes. I thought that the node was running a major compaction when it filled up the disk, but even that doesn't make sense, because shouldn't a major compaction at most double the size, not triple it? Does anyone know how to explain this behavior?

Thanks,
Alex
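The "Insufficient disk space to flush" error comes from Cassandra checking free space before writing a new SSTable (getFlushPath in the trace above). A hedged sketch of that kind of check, not Cassandra's actual implementation: pick a data directory whose usable space covers the estimated flush size, and fail with the same style of message otherwise.

```java
import java.io.File;

// Illustrative only: the directory-selection logic and error message mirror
// the log above, but this is not Cassandra's real getFlushPath code.
public class FlushPathCheck {
    public static File pickFlushDirectory(File[] dataDirs, long estimatedBytes) {
        for (File dir : dataDirs) {
            // getUsableSpace() reports bytes available to this JVM on the volume
            if (dir.getUsableSpace() >= estimatedBytes)
                return dir;
        }
        throw new RuntimeException(
            "Insufficient disk space to flush " + estimatedBytes + " bytes");
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        // A tiny flush should fit almost anywhere...
        System.out.println("chosen: " + pickFlushDirectory(new File[]{tmp}, 1L));
        // ...while an impossibly large one reproduces the failure mode.
        try {
            pickFlushDirectory(new File[]{tmp}, Long.MAX_VALUE);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```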
Re: Hinted handoff bug?
Yes, I'll do that.

/Fredrik

Sylvain Lebresne skrev 2011-12-01 11:10:
> You're right, good catch. Do you mind opening a ticket on jira
> (https://issues.apache.org/jira/browse/CASSANDRA)?
> -- Sylvain
> [earlier quoted message snipped]
Re: Strategies to maintain counters sorted row ?
In my application, I need to store the users' total scores/reputation as counters and want to show lists of users sorted by score. I also want to implement a name-search facility on top of that. Could you suggest a schema to achieve that using Cassandra?

On Wed, Nov 30, 2011 at 12:53 PM, Aditya ady...@gmail.com wrote:
> I know it is not possible to sort columns in a row by counter values, so what are the other strategies to maintain a sorted list (of counters) in Cassandra? Could you propose some schema that might be helpful to achieve this? Or do I need to retrieve thousands of columns each time and do the sorting at the application level?
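Since counter columns cannot be ordered by value server-side, the fallback mentioned in this thread is to fetch the counters and sort at the application level. A minimal sketch of that client-side step, assuming the scores have already been read into a Map of user name to score (the reading itself is omitted):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical leaderboard helper: sort fetched counter values in the
// application, since Cassandra orders columns by name, not counter value.
public class ScoreLeaderboard {
    public static List<Map.Entry<String, Long>> topN(Map<String, Long> scores, int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> scores = new HashMap<>();
        scores.put("alice", 42L);
        scores.put("bob", 117L);
        scores.put("carol", 7L);
        // prints [bob=117, alice=42]
        System.out.println(topN(scores, 2));
    }
}
```

This is only workable for row sizes that fit in memory; for thousands of columns per read, a periodically rebuilt materialized "top scores" row is the usual alternative.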
RE: [RELEASE] Apache Cassandra 1.0.5 released
Unfortunately no. The second time I did the test:

Restore 4 nodes to a new Cassandra cluster (0.7.8)
Upgrade to 1.0.0
Run nodetool scrub on each node after upgrade, before upgrading the next node
Upgrade to 1.0.3
Upgrade to 1.0.5
Run nodetool repair on all nodes

The whole process was successful. What does this error mean, and how is it affecting my cluster? Do you think it is safe to upgrade to 1.0.5 and disregard this error?

Thanks
Michael

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, December 01, 2011 2:10 AM
To: user@cassandra.apache.org
Subject: Re: [RELEASE] Apache Cassandra 1.0.5 released

I don't think so. That code hasn't changed in a long time. Is it reproducible?

On Wed, Nov 30, 2011 at 2:46 PM, Michael Vaknine micha...@citypath.com wrote:
> Hi,
> Upgrading 1.0.3 to 1.0.5 I have these errors:
>
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 java.lang.AssertionError
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.ColumnFamilyStore.maybeSwitchMemtable(ColumnFamilyStore.java:671)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:745)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.ColumnFamilyStore.forceBlockingFlush(ColumnFamilyStore.java:750)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.index.keys.KeysIndex.forceBlockingFlush(KeysIndex.java:119)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.index.SecondaryIndexManager.flushIndexesBlocking(SecondaryIndexManager.java:258)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:123)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:151)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
> TST-Cass2 ERROR [Thread-58] 2011-11-30 20:40:17,449 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)
>
> Is this another regression?
>
> Thanks
> Michael

-----Original Message-----
From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Wednesday, November 30, 2011 9:43 PM
To: user@cassandra.apache.org
Subject: Re: [RELEASE] Apache Cassandra 1.0.5 released

On Wed, Nov 30, 2011 at 1:29 PM, Michael Vaknine micha...@citypath.com wrote:
> The files are not on the site
> The requested URL /apache//cassandra/1.0.5/apache-cassandra-1.0.5-bin.tar.gz was not found on this server.

It takes the mirrors some time to sync.

-Brandon

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: [RELEASE] Apache Cassandra 1.0.5 released
On Thu, Dec 1, 2011 at 11:58 AM, Michael Vaknine micha...@citypath.com wrote:
> Unfortunately no. The second time I did the test:
> Restore 4 nodes to a new Cassandra cluster (0.7.8)
> Upgrade to 1.0.0
> Run nodetool scrub on each node after upgrade, before upgrading the next node
> Upgrade to 1.0.3
> Upgrade to 1.0.5
> Run nodetool repair on all nodes
> The whole process was successful. What does this error mean, and how is it affecting my cluster? Do you think it is safe to upgrade to 1.0.5 and disregard this error?

It's a race condition between the flush of a memtable and the flush of its secondary indexes. It's not specific to 1.0.5 at all (it's been in 0.8, probably since 0.8.0), and it's very unlikely to happen (you're the first to report it). The error has no real effect (it cannot result in data corruption or data loss). I've created https://issues.apache.org/jira/browse/CASSANDRA-3547 to fix it.

-- Sylvain

> Thanks
> Michael
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Thursday, December 01, 2011 2:10 AM
> To: user@cassandra.apache.org
> Subject: Re: [RELEASE] Apache Cassandra 1.0.5 released
>
> I don't think so. That code hasn't changed in a long time. Is it reproducible?
> [quoted stack trace and earlier messages snipped]
NetworkTopologyStrategy bug?
Assume for now we have 1 DC and 1 rack with 3 nodes. The ring will look like this (we use our own snitch, which returns DC=0, Rack=0 for this case):

Address    DC  Rack  Token
                     113427455640312821154458202477256070484
10.0.0.1   0   0     0
10.0.0.2   0   0     56713727820156410577229101238628035242
10.0.0.3   0   0     113427455640312821154458202477256070484

Schema: ReplicaPlacementStrategy=NetworkTopologyStrategy, options: [0:2] (2 replicas in DC 0).

When trying to run cleanup (same problem with repair), Cassandra reports:

From 10.0.0.1:
DEBUG [time] 10.0.0.2,10.0.0.3 endpoints in datacenter 0 for token 0
DEBUG [time] 10.0.0.2,10.0.0.3 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
DEBUG [time] 10.0.0.3,10.0.0.2 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
INFO [time] Cleanup cannot run before a node has joined the ring

From 10.0.0.2:
DEBUG [time] 10.0.0.1,10.0.0.3 endpoints in datacenter 0 for token 0
DEBUG [time] 10.0.0.1,10.0.0.3 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
DEBUG [time] 10.0.0.3,10.0.0.1 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
INFO [time] Cleanup cannot run before a node has joined the ring

From 10.0.0.3:
DEBUG [time] 10.0.0.1,10.0.0.2 endpoints in datacenter 0 for token 0
DEBUG [time] 10.0.0.1,10.0.0.2 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
DEBUG [time] 10.0.0.2,10.0.0.1 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
INFO [time] Cleanup cannot run before a node has joined the ring

To me this means that each node thinks the whole data range is on the other two nodes. As a result:

A WRITE request with any key/token sent to the 10.0.0.1 coordinator will be forwarded to and saved on 10.0.0.2 and 10.0.0.3.
A READ request at CL.ONE with any key/token sent to the 10.0.0.2 coordinator will be forwarded to 10.0.0.1 or 10.0.0.3, and since 10.0.0.1 can't have the data from the write above, some requests fail and some don't (when 10.0.0.3 answers).
On top of that, every READ request to any node will be forwarded to another node.

That's what we see right now from 0.8.6 up to 1.0.5, both with 3 nodes in 1 DC and with 8x2 nodes.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063. Fax: +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
http://www.adform.com/
Follow: http://twitter.com/#!/adforminsider
Visit our blog: http://www.adform.com/site/blog

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
RE: NetworkTopologyStrategy bug?
Sorry, the bug was in our snitch. We're using getHostName() instead of getCanonicalHostName() to determine the DC and rack, and since for the local node it returns an alias instead of the reverse-DNS name, the DC/rack numbers are not as expected.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063. Fax: +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
Sent: Thursday, December 01, 2011 14:05
To: user@cassandra.apache.org
Subject: NetworkTopologyStrategy bug?

> [original message quoted in full; snipped]
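The failure mode Viktor describes can be illustrated without any network lookups. InetAddress.getHostName() may return a local alias (e.g. "localhost"), while getCanonicalHostName() performs a reverse-DNS lookup; a snitch that derives DC/rack by parsing the name therefore sees different inputs on the local node. The parsing convention below ("dcX-rackY-...") is invented purely for illustration, not Adform's actual snitch:

```java
// Hypothetical hostname-parsing snitch, demonstrating why a local alias
// (what getHostName() can return) breaks DC/rack derivation while a
// canonical reverse-DNS name (getCanonicalHostName()) parses cleanly.
public class HostnameSnitch {
    public static String datacenter(String hostname) {
        // assumed naming scheme: "dc0-rack0-node1.example.com"
        String[] parts = hostname.split("-");
        if (parts.length < 2 || !parts[0].startsWith("dc"))
            return "UNKNOWN";          // e.g. an alias like "localhost"
        return parts[0].substring(2);
    }

    public static void main(String[] args) {
        // canonical reverse-DNS name parses cleanly...
        System.out.println(datacenter("dc0-rack0-node1.example.com")); // "0"
        // ...but a local alias does not, yielding a bogus DC.
        System.out.println(datacenter("localhost"));                   // "UNKNOWN"
    }
}
```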
RE: [RELEASE] Apache Cassandra 1.0.5 released
Hi,

After upgrading the cluster to 1.0.5, I am having problems connecting to the cluster using Hector. Any help will be appreciated.

Thanks
Michael

me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$19.execute(KeyspaceServiceImpl.java:744)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$19.execute(KeyspaceServiceImpl.java:726)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:233)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getIndexedSlices(KeyspaceServiceImpl.java:748)
at me.prettyprint.cassandra.model.IndexedSlicesQuery$1.doInKeyspace(IndexedSlicesQuery.java:140)
at me.prettyprint.cassandra.model.IndexedSlicesQuery$1.doInKeyspace(IndexedSlicesQuery.java:131)
at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at me.prettyprint.cassandra.model.IndexedSlicesQuery.execute(IndexedSlicesQuery.java:130)
at com.lookin2.cassandra.access.CassandraImpl.executeQuery(CassandraImpl.java:220)
at com.lookin2.cassandra.access.CassandraImpl.privateGet(CassandraImpl.java:327)
at com.lookin2.cassandra.access.Cassandra.get(Cassandra.java:67)
at com.lookin2.cassandra.entities.BaseCass.getRows(BaseCass.java:150)
at com.lookin2.cassandra.entities.CassSet.populateOne(CassSet.java:148)
at com.lookin2.cassandra.entities.CassSet.populate(CassSet.java:117)
at com.lookin2.cassandra.ontologies.CategoryAccess.getCategories(CategoryAccess.java:20)
at com.lookin2.cassandra.ontologies.CategoryOntologyCAO.loadAllNames(CategoryOntologyCAO.java:23)
at com.lookin2.common.dawg.AbstractNamedItemTrie.init(AbstractNamedItemTrie.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:291)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:288)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:190)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:574)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
at org.springframework.web.context.ContextLoader.createWebApplicationContext(ContextLoader.java:276)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:197)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:47)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3972)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4467)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:785)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at
is there a no disk storage mode ?
Hi,

I want to use Cassandra for (fast) unit testing with a small amount of data. I imagined the embedded Cassandra server I plan to use would start faster and be more portable (no file paths depending on the OS) if it ran without disk storage (diskless, if you want). Is there a no-disk storage mode for Cassandra?

Thanks. Regards,
Dominique
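Cassandra itself always persists to disk, so the usual workaround for portable, fast unit tests is to point the embedded server's storage directories at freshly created temp directories (optionally on a tmpfs/ramdisk mount for speed). A sketch of the portable path setup only; the directory names follow Cassandra's conventions, but wiring them into cassandra.yaml or the embedded service is left out:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Create OS-independent, throwaway storage directories for an embedded
// test server. Files.createTempDirectory picks a platform-appropriate
// location, so no hard-coded paths leak into the test.
public class TempStorageDirs {
    public static Path[] createStorageDirs() throws IOException {
        Path base = Files.createTempDirectory("cassandra-test-");
        Path data = Files.createDirectories(base.resolve("data"));
        Path commitlog = Files.createDirectories(base.resolve("commitlog"));
        Path savedCaches = Files.createDirectories(base.resolve("saved_caches"));
        return new Path[] { data, commitlog, savedCaches };
    }

    public static void main(String[] args) throws IOException {
        for (Path p : createStorageDirs())
            System.out.println(p + " exists=" + Files.isDirectory(p));
    }
}
```

Deleting the base directory in test teardown gives each run a clean slate.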
Re: Insufficient disk space to flush
If you are writing data with QUORUM or ALL, you should be safe to restart Cassandra on that node. If the extra space is all from *tmp* files from compaction, they will get deleted at startup. You will then need to run repair on that node to get back any data that was missed while it was full. If your commit log was on a different device, you may not even have lost much.

-Jeremiah

On 12/01/2011 04:16 AM, Alexandru Dan Sicoe wrote:
> [original message quoted in full; snipped]
decommissioned not show being gossipped
Hi,

We have 2 nodes that were decommissioned from a cluster running 1.0.3. However, the live nodes are still making references to the decommissioned nodes 3 days after they were decommissioned. Nodetool does not show the decommissioned nodes. Here are sample log entries:

INFO [GossipStage:1] 2011-12-01 18:20:37,882 Gossiper.java (line 759) InetAddress /x.x.x.x is now dead.
INFO [GossipStage:1] 2011-12-01 18:20:37,882 StorageService.java (line 1039) Removing token 170141183460469231731687303715884105727 for /x.x.x.x

What might be causing this issue? Any chance this is related to https://issues.apache.org/jira/browse/CASSANDRA-3243?

Thanks!
Huy

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/decommissioned-not-show-being-gossipped-tp7051526p7051526.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: decommissioned not show being gossipped
On Thu, Dec 1, 2011 at 12:26 PM, huyle hu...@springpartners.com wrote: [...] How in sync are the clocks in the cluster? -Brandon
MapReduce on Cassandra using Ruby and REST!
I know I've been spamming the list a bit with new features for Virgil, but this one is actually really cool... Enamored with what Riak provides as far as map/reduce via HTTP, http://wiki.basho.com/MapReduce.html#MapReduce-via-the-HTTP-API We implemented the same thing for Virgil/Cassandra. Simply write a script in ruby, then POST that script to Virgil. Virgil will kick off the Hadoop job against Cassandra using column families for input and output. Right now it only supports Ruby, but there is nothing preventing us from adding support for other languages. We'll probably throw Groovy in there as well. Check it out when you get a chance... http://brianoneill.blogspot.com/2011/12/hadoopmapreduce-on-cassandra-using-ruby.html After we add the ability to push the job to an existing Hadoop cluster, we'll move on to PIG support and CRUD operations in the GUI. stay tuned, -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mobile:215.588.6024 blog: http://weblogs.java.net/blog/boneill42/ blog: http://brianoneill.blogspot.com/
garbage collecting tombstones
Hello, Is 'garbage collecting tombstones' a different operation than the JVM GC? Garbage collection of tombstones is controlled by gc_grace_seconds, which by default is set to 10 days, but the traditional GC seems to happen much more frequently (when observed through jconsole). How can I force the garbage collection of tombstones to happen ad hoc, when I want to? Thanks.
Re: decommissioned not show being gossipped
The clocks are very sync'ed between the nodes, as they have ntp running hitting our time servers. Huy
Re: decommissioned not show being gossipped
On Thu, Dec 1, 2011 at 1:10 PM, huyle hu...@springpartners.com wrote: The clocks are very sync'ed between the nodes as they have ntp running hitting our time servers. Maybe they weren't in sync 3 days after the token left, which https://issues.apache.org/jira/browse/CASSANDRA-2961 requires. If a node still sees the token you can removetoken it; otherwise you'll need https://issues.apache.org/jira/browse/CASSANDRA-3337 -Brandon
Re: is there a no disk storage mode ?
I am not aware of a no-disk option, but for fast unit testing you can try a RAM disk for storage. Huy
Re: Insufficient disk space to flush
Hi Jeremiah, My commitlog was indeed on another disk. I did what you said and yes, the node restart brings the disk size back to around the 50 GB I was expecting. Still, I do not understand how the node managed to get itself into the situation of having these tmp files. Could you clarify what they are, and how and why they are produced? I've tried to find a clear definition, but all I could come up with is hints that they are produced during compaction. I also found a thread that describes a similar problem: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Errors-During-Compaction-td5953493.html As described there, it seems like compaction fails and tmp files don't get cleaned up until they fill the disk. Is this what happened in my case? Compactions did not finish properly because disk utilization was more than half, and then more and more tmp files accumulated with each new attempt. The Cassandra log would indicate this, because I get many of these: ERROR [CompactionExecutor:22850] 2011-12-01 04:12:15,200 CompactionManager.java (line 513) insufficient space to compact even the two smallest files, aborting before I started getting many of these: ERROR [FlushWriter:283] 2011-12-01 04:12:22,917 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[FlushWriter:283,5,main] java.lang.RuntimeException: java.lang.RuntimeException: Insufficient disk space to flush 42531 bytes I just want to clearly understand what happened. Thanks, Alex On Thu, Dec 1, 2011 at 6:58 PM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: [...]
Re: is there a no disk storage mode ?
Hi Dominique, I don't think there is a way to run Cassandra without disk storage, but running it embedded can be very useful for unit testing. I'm using cassandra-unit (https://github.com/jsevellec/cassandra-unit) to integrate it into my tests. You don't need to configure any file paths; it works fine out of the box. I've set it up to drop and recreate my keyspace before each test case, and even then it performs quite well. Good luck, Tom On 12/1/11 5:36 PM, DE VITO Dominique wrote: Hi, I want to use Cassandra for (fast) unit testing with a small amount of data. So I imagined the embedded Cassandra server I plan to use would start faster and be more portable (no file paths depending on the OS) without a disk storage mode (diskless, if you want). Is there a no-disk storage mode for Cassandra? Thanks. Regards, Dominique
Re: garbage collecting tombstones
A tombstone is a marker indicating a record to be deleted. gc_grace_seconds is the time after which the record will be physically deleted from the node. There is no ad-hoc way of gc'ing tombstones; only after gc_grace_seconds will the tombstones be gc'ed. Thanks, Jahangir Mohammed. On Thu, Dec 1, 2011 at 2:01 PM, A J s5a...@gmail.com wrote: [...]
Re: Insufficient disk space to flush
Yes, it mostly sounds like that. In our case, failed repairs were causing the accumulation of tmp files. Thanks, Jahangir Mohammed. On Thu, Dec 1, 2011 at 2:43 PM, Alexandru Dan Sicoe sicoe.alexan...@googlemail.com wrote: [...]
Re: garbage collecting tombstones
On Thu, Dec 1, 2011 at 2:45 PM, Jahangir Mohammed md.jahangi...@gmail.com wrote: There is no ad-hoc way of gc'ing tombstones. Only after gc_grace_seconds the tombstones will be gc'ed. Actually, they'll be removed on the first compaction that occurs after gc_grace_seconds. -- Eric Evans Acunu | http://www.acunu.com | @acunu
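Eric's correction pins down the timing rule: a tombstone survives every compaction until gc_grace_seconds have elapsed since the deletion, and is dropped by the first compaction after that. A minimal, self-contained sketch of that rule (a toy model, not Cassandra's actual code; the timestamps are illustrative):

```java
// Toy model of the rule discussed above: during compaction a tombstone may
// only be purged once gc_grace_seconds have passed since the deletion.
public class TombstoneGc {
    static final int GC_GRACE_SECONDS = 10 * 24 * 3600; // default: 10 days

    // localDeletionTime and now are in seconds since the epoch
    static boolean purgeable(long localDeletionTime, long now, int gcGraceSeconds) {
        return localDeletionTime + gcGraceSeconds <= now;
    }

    public static void main(String[] args) {
        long deletedAt = 1_000_000L;
        // 5 days later: still within gc_grace, compaction must keep the tombstone
        System.out.println(purgeable(deletedAt, deletedAt + 5L * 24 * 3600, GC_GRACE_SECONDS));  // false
        // 11 days later: the next compaction may drop it
        System.out.println(purgeable(deletedAt, deletedAt + 11L * 24 * 3600, GC_GRACE_SECONDS)); // true
    }
}
```

In practice, the closest thing to an ad-hoc purge (an assumption on my part, worth testing on a throwaway column family first) is to lower gc_grace_seconds on the column family and then trigger a compaction with `nodetool compact`.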
Re: cassandra read performance on large dataset
Our largest dataset has 1200 billion rows. Radim, out of curiosity, how many nodes is that running across? Bill

On 28/11/11 13:44, Radim Kolar wrote: I understand that my computer may not be as powerful as those used in the other benchmarks, but it shouldn't be that far off (1:30), right? Cassandra has very fast writes; you can have read:write ratios like 1:1000. Pure read workload on 1 billion rows without key/row cache on a 2-node cluster: Running workload in 10 threads 1000 ops each. Workload took 88.59 seconds, thruput 112.88 ops/sec Each node can do about 240 IOPS, which means an average of 4 IOPS per read in Cassandra on a cold system. After the OS cache warms enough to cache indirect seek blocks, it gets close to ideal: Workload took 79.76 seconds, thruput 200.59 ops/sec Ideal Cassandra read performance (without caches) is 2 IOPS per read: one IO to read the index, a second for the data. Pure write workload: Running workload in 40 threads 10 ops each. Workload took 302.51 seconds, thruput 13222.62 ops/sec Write is slow here because the nodes are running out of memory, most likely due to memory leaks in the 1.0 branch. Also, writes in this test are not batched. Cassandra is really awesome for its price tag. Getting similar numbers from Oracle will cost you way too much: for one 2-core Oracle license suitable for processing large data you can get about 8 Cassandra nodes (and don't forget that Oracle needs some hardware too). Transactions are not always needed for data warehousing: if you are importing chunks of data, you do not need to do rollbacks, just schedule failed chunks for later processing. If you are able to code your app to work without transactions, Cassandra is the way to go. Hadoop and Cassandra are very good products for working with large data, basically for just the price of learning a new technology. Usually Cassandra is deployed first; it's easy to get it running and day-to-day operations are simple.
Hadoop follows later, after discovering that Cassandra is not really suitable for large batch jobs because it needs random access for data reading. We finished migrating from a commercial SQL database to Hadoop/Cassandra in 3 months; not only does it cost 10x less, we are able to process about 100 times larger datasets. Our largest dataset has 1200 billion rows. Problems with this setup are:
- bloom filters use too much memory; they should be configurable for applications where read performance is unimportant
- node startup is really slow
- data loaded into Cassandra is about 2 times bigger than the CSV export (not really a problem, disk space is cheap, but the per-row overhead is rather high)
- writing applications is harder than coding for an SQL backend, and Hadoop is way harder to use than Cassandra
- lack of good import/export tools for Cassandra, especially lack of monitoring
- must have knowledge of workarounds for Hadoop bugs; Hadoop is not easy to use efficiently
- index overhead is too big (about 100% slower) compared to index overhead in SQL databases (about 20% slower)
- no delete over index
- repair is slow
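Radim's throughput figures can be sanity-checked with back-of-envelope arithmetic: aggregate disk IOPS divided by the per-read IOPS cost bounds cold read throughput. A small sketch using the numbers from the post (the formula is my own simplification, not anything from Cassandra itself):

```java
// Back-of-envelope check of the numbers in the post: per-read IOPS cost
// bounds read throughput on a system whose reads all hit disk.
public class ReadThroughput {
    static double maxReadsPerSec(int nodes, int iopsPerNode, double iopsPerRead) {
        return nodes * iopsPerNode / iopsPerRead;
    }

    public static void main(String[] args) {
        // 2 nodes, ~240 IOPS each, ~4 IOPS/read cold: bound ~120 ops/sec
        // (measured: 112.88 ops/sec)
        System.out.println(maxReadsPerSec(2, 240, 4.0));
        // after the OS cache warms, ~2 IOPS/read: bound ~240 ops/sec
        // (measured: 200.59 ops/sec)
        System.out.println(maxReadsPerSec(2, 240, 2.0));
    }
}
```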
Re: [RELEASE] Apache Cassandra 1.0.5 released
+1. After upgrading to 1.0.5 we also get a Timeout exception on secondary index search (the get_indexed_slices API).
Re: [RELEASE] Apache Cassandra 1.0.5 released
After upgrading to 1.0.5, RangeSlice queries get timeouts. Ticket: https://issues.apache.org/jira/browse/CASSANDRA-3551 On Dec 1, 2011, at 5:43 PM, Evgeniy Ryabitskiy wrote: +1 After upgrade to 1.0.5 also have Timeout exception on Secondary Index search (get_indexed_slices API).
Re: Hinted handoff bug?
Sorry for not checking the source to see if things have changed, but I just remembered an issue I have forgotten to make a jira for. In the old days, nodes would periodically try to deliver hint queues. However, at some stage this was changed so that a node only delivers when it sees another node being marked up. You can definitely have a scenario where A fails to deliver to B, so it sends the hint to C instead. B is not really down, it just could not accept that packet at that time, and C (correctly, in this case) always thinks B is up, so it never tries to deliver the hints to B. Will this change fix this, or do we need to bring back the thread that periodically tried to deliver hints regardless of node status changes? Regards, Terje

On 1 Dec 2011, at 19:10, Sylvain Lebresne sylv...@datastax.com wrote: You're right, good catch. Do you mind opening a ticket on jira (https://issues.apache.org/jira/browse/CASSANDRA)? -- Sylvain On Thu, Dec 1, 2011 at 10:03 AM, Fredrik L Stigbäck fredrik.l.stigb...@sitevision.se wrote: [...] Shouldn't the hintStore.isEmpty() check be inside the try {} finally {} clause, removing the endpoint from queuedDeliveries in the finally block?

public void deliverHints(final InetAddress to)
{
    logger_.debug("deliverHints to {}", to);
    if (!queuedDeliveries.add(to))
        return;
    ...
}

private void deliverHintsToEndpoint(InetAddress endpoint)
    throws IOException, DigestMismatchException, InvalidRequestException, TimeoutException
{
    ColumnFamilyStore hintStore = Table.open(Table.SYSTEM_TABLE).getColumnFamilyStore(HINTS_CF);
    if (hintStore.isEmpty())
        return; // nothing to do, don't confuse users by logging a no-op handoff
    try
    {
        ...
    }
    finally
    {
        queuedDeliveries.remove(endpoint);
    }
}

Regards /Fredrik
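The failure mode Fredrik describes, and the fix of moving the isEmpty() check inside the try block so the finally always clears queuedDeliveries, can be reduced to a toy model. This is a simplified simulation with Strings standing in for endpoints, not the real HintedHandOffManager code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the bug: if the early "nothing to do" return fires before
// the try/finally, the endpoint is never removed from queuedDeliveries
// and every later delivery attempt to it is silently skipped.
public class HandoffModel {
    final Set<String> queuedDeliveries = ConcurrentHashMap.newKeySet();
    boolean hintStoreEmpty;  // stands in for hintStore.isEmpty()
    int deliveries;          // count of actual delivery attempts

    HandoffModel(boolean hintStoreEmpty) { this.hintStoreEmpty = hintStoreEmpty; }

    // Buggy shape: the early return bypasses the finally block.
    void deliverHintsBuggy(String to) {
        if (!queuedDeliveries.add(to)) return;   // already queued by someone
        if (hintStoreEmpty) return;              // BUG: 'to' is never removed
        try { deliveries++; }
        finally { queuedDeliveries.remove(to); }
    }

    // Fixed shape: the isEmpty check lives inside try, removal in finally.
    void deliverHintsFixed(String to) {
        if (!queuedDeliveries.add(to)) return;
        try {
            if (hintStoreEmpty) return;
            deliveries++;
        } finally { queuedDeliveries.remove(to); }
    }

    public static void main(String[] args) {
        HandoffModel buggy = new HandoffModel(true);
        buggy.deliverHintsBuggy("nodeA");  // no-op, but leaks the set entry
        buggy.hintStoreEmpty = false;      // hints arrive later...
        buggy.deliverHintsBuggy("nodeA");  // ...and are never delivered
        System.out.println(buggy.deliveries);  // 0

        HandoffModel fixed = new HandoffModel(true);
        fixed.deliverHintsFixed("nodeA");
        fixed.hintStoreEmpty = false;
        fixed.deliverHintsFixed("nodeA");
        System.out.println(fixed.deliveries); // 1
    }
}
```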
Re: is there a no disk storage mode ?
if you're in java land there's a maven plugin - http://mojo.codehaus.org/cassandra-maven-plugin/ On 12/1/2011 12:13 PM, Tom van den Berge wrote: Hi Dominique, I don't think there is a way to run cassandra without disk storage. But running it embedded can be very useful for unit testing. I'm using cassandra-unit (https://github.com/jsevellec/cassandra-unit) to integrate it in my tests. You don't need to configure any file paths; it works fine out of the box. I've set it up to drop and recreate my keyspace before each test case, and even then it performs quite good. Good luck, Tom On 12/1/11 5:36 PM, DE VITO Dominique wrote: Hi, I want to use Cassandra for (fast) unit testing with a small number of data. So, I imagined the Cassandra embedded server I plan to use would start faster and would be more portable (because no file path depending on OS), without disk storage mode (so, diskless if you want). Is there some no disk storage mode for Cassandra ? Thanks. Regards, Dominique
Re: Hinted handoff bug?
Nope, that's a separate issue: https://issues.apache.org/jira/browse/CASSANDRA-3554 On Thu, Dec 1, 2011 at 5:59 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: [...] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Leveled Compaction in cassandra1.0 may not be perfect
Hello, everyone! In this doc (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra) I found the conclusion that "only enough space for 10x the sstable size needs to be reserved for temporary use by compaction." I don't know how this conclusion was reached, but I guess the author may have derived it from a compaction rule in levelDB. In one of the levelDB docs (http://leveldb.googlecode.com/svn/trunk/doc/impl.html) I found this rule: "We also switch to a new output file when the key range of the current output file has grown enough to overlap more than ten level-(L+2) files." Under that rule, 10x the sstable size is indeed enough for every compaction. Unfortunately, Cassandra 1.0 may not implement this rule, so I think the conclusion may be arbitrary. Everyone, what do you think about it? Of course, implementing this compaction rule may not be hard, but the implementation may cause another problem: many small sstables, each overlapping just 10 sstables in the next level, may be generated during compaction, especially when we use RandomPartitioner. This may cause many compactions when these small sstables have to be promoted to the next level. In my own testing, I wrote 120 GB of data to one Cassandra node, and the node spent 24 hours compacting this data with leveled compaction. So I don't think leveled compaction is perfect. What do you think about it, my friends?
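For reference, the arithmetic behind the blog post's "10x" claim can be sketched under the usual leveled-compaction assumptions (5 MB sstables, 10x level fanout). This is my own back-of-envelope model, not the code in Cassandra 1.0:

```java
// Back-of-envelope for the "10x the sstable size" claim: compacting one
// level-L sstable merges it with the ~10 overlapping sstables in level L+1,
// so with a 5 MB sstable size roughly 11 inputs (~55 MB) are read and at
// most about that much is written before the inputs can be dropped.
public class LeveledMath {
    static final int SSTABLE_MB = 5;  // sstable_size_in_mb default in 1.0
    static final int FANOUT = 10;     // each level holds 10x the previous

    static long levelCapacityMb(int level) {
        long mb = (long) SSTABLE_MB * FANOUT;      // level 1: 50 MB
        for (int i = 1; i < level; i++) mb *= FANOUT;
        return mb;
    }

    // worst-case sstables touched by one compaction: 1 from L + ~10 from L+1
    static int sstablesPerCompaction() { return 1 + FANOUT; }

    public static void main(String[] args) {
        System.out.println(levelCapacityMb(1));                   // 50
        System.out.println(levelCapacityMb(3));                   // 5000
        System.out.println(sstablesPerCompaction() * SSTABLE_MB); // 55
    }
}
```

Whether Cassandra 1.0 actually bounds every compaction to this worst case is exactly the question raised above; the model only shows where the "roughly 10x the sstable size" figure comes from.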