Re: decommissioning a cassandra node
As I see it, the state of 162.243.109.94 is UL (Up/Leaving), so maybe this is causing the problem.

On Sunday, October 26, 2014 11:57 PM, Tim Dunphy bluethu...@gmail.com wrote:

Hey all, I'm trying to decommission a node. First I'm getting a status:

[root@beta-new:/usr/local] #nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load     Tokens  Owns   Host ID                               Rack
UN  162.243.86.41   1.08 MB  1       0.1%   e945f3b5-2e3e-4a20-b1bd-e30c474a7634  rack1
UL  162.243.109.94  1.28 MB  256     99.9%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1

But when I try to decommission the node I get this message:

[root@beta-new:/usr/local] #nodetool -h 162.243.86.41 decommission
nodetool: Failed to connect to '162.243.86.41:7199' - NoSuchObjectException: 'no such object in table'.

Yet I can telnet to that host on that port just fine:

[root@beta-new:/usr/local] #telnet 162.243.86.41 7199
Trying 162.243.86.41...
Connected to 162.243.86.41.
Escape character is '^]'.

And I have verified that Cassandra is running and accessible via cqlsh on the other machine. What could be going wrong?

Thanks
Tim

--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
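That NoSuchObjectException on 7199, with the TCP port itself reachable, often means the JMX/RMI stub is advertising an address the client can't use. A minimal sketch of one check and one common workaround, assuming a stock cassandra-env.sh (the file path is an assumption and varies by install):

# see whether an RMI hostname is already pinned (path may differ per install)
grep -n "rmi.server.hostname\|jmxremote" /etc/cassandra/cassandra-env.sh

# one common workaround: pin the advertised RMI hostname to the node's
# reachable IP in cassandra-env.sh, then restart Cassandra:
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=162.243.86.41"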
Re: Empty cqlsh cells vs. null
Tyler, I see. That explains it. Any chance you might know how the DataStax Java driver behaves in this (odd) case?

Cheers,
Jens
———
Jens Rantil
Backend engineer, Tink AB
Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

On Friday, Oct 24, 2014 at 6:24 pm, Tyler Hobbs ty...@datastax.com wrote:

> On Fri, Oct 24, 2014 at 6:38 AM, Jens Rantil jens.ran...@tink.se wrote:
>> Just to clarify, I am seeing three types of output for an int field. It's either:
>> * Empty output. Nothing. Nil. Also ''.
>> * An integer written in green. Regexp: [0-9]+
>> * Explicitly 'null' written in red letters.
>
> Some types (including ints) accept an empty string/ByteBuffer as a valid value. This is distinct from null, or no cell being present. This behavior is primarily a legacy from the Thrift days.
>
> --
> Tyler Hobbs
> DataStax
Re: Empty cqlsh cells vs. null
On Mon, Oct 27, 2014 at 11:05 AM, Jens Rantil jens.ran...@tink.se wrote:

> Tyler, I see. That explains it. Any chance you might know how the DataStax Java driver behaves in this (odd) case?

The Row.getInt() method will behave as it does for nulls and return 0 (though of course, the Row.isNull() method will return false). If you want to explicitly check whether it's an empty value, you'll have to use getBytesUnsafe(). Long story short, unless you like suffering for no reason, don't insert empty values for types for which it doesn't make sense.

--
Sylvain
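In driver terms the distinction looks roughly like this; a short sketch against the 2.x-era DataStax Java driver API, with hypothetical keyspace, table, and column names:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.nio.ByteBuffer;

public class EmptyVsNull {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("ks");
        Row row = session.execute("SELECT val FROM t WHERE id = 1").one();

        if (row.isNull("val")) {
            System.out.println("truly null (or no cell present)");
        } else {
            // the only way to spot the legacy "empty" value is the raw bytes
            ByteBuffer raw = row.getBytesUnsafe("val");
            if (raw != null && raw.remaining() == 0) {
                System.out.println("empty value; getInt() would report 0");
            } else {
                System.out.println("real int: " + row.getInt("val"));
            }
        }
        cluster.close();
    }
}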
Hector latency related configuration
Hi all,

We're using Hector in one of our older use cases with C* 1.0.9. We suspect it increases our total round-trip write latency to Cassandra. C* metrics show low latency, so we assume the problem is somewhere else. What configuration parameters would you recommend investigating or changing in order to decrease latency?

--
Or Sher
Re: decommissioning a cassandra node
> As I see it, the state of 162.243.109.94 is UL (Up/Leaving), so maybe this is causing the problem.

OK, that's an interesting observation. How do you fix a node that is in a UL state? What causes this? Also, is there any document that explains what all the nodetool abbreviations (UN, UL) stand for?

--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: decommissioning a cassandra node
> Also, is there any document that explains what all the nodetool abbreviations (UN, UL) stand for?

The documentation is in the command output itself:

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load     Tokens  Owns   Host ID                               Rack
UN  162.243.86.41   1.08 MB  1       0.1%   e945f3b5-2e3e-4a20-b1bd-e30c474a7634  rack1
UL  162.243.109.94  1.28 MB  256     99.9%  fd2f76ae-8dcf-4e93-a37f-bf1e9088696e  rack1

U = Up, D = Down
N = Normal, L = Leaving, J = Joining, M = Moving
Re: decommissioning a cassandra node
> U = Up, D = Down
> N = Normal, L = Leaving, J = Joining, M = Moving

Ok, got it, thanks! Can someone suggest a good way to fix a node that is in a UL state?

Thanks
Tim

--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: decommissioning a cassandra node
Hi Tim,

The node with the IP ending in .94 is leaving. Maybe something went wrong while streaming data. You could use nodetool netstats on both nodes to check whether any streaming connection is stuck.

Alternatively, you could force-remove the leaving node by shutting it down directly and then running nodetool removenode to remove the dead node. But you should understand that you're risking data loss if your RF in the cluster is lower than 3 and the data has not been fully synced. Therefore, remember to sync data using repair before you remove/decommission a node from the cluster.

Thanks!
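A rough sketch of that sequence, using the host ID from the status output earlier in the thread (run the repair first if you can't afford to lose replicas):

# 1. while all nodes are still up, sync replicas
nodetool repair

# 2. check for stuck streams on both the leaving node and its peers
nodetool netstats

# 3. if streaming is wedged: stop Cassandra on 162.243.109.94, then, from a live node:
nodetool removenode fd2f76ae-8dcf-4e93-a37f-bf1e9088696e

# 4. if the removal itself hangs, inspect it and, as a last resort, force it:
nodetool removenode status
nodetool removenode force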
Re: decommissioning a cassandra node
> Alternatively, you could force-remove the leaving node by shutting it down directly and then running nodetool removenode to remove the dead node. But you should understand that you're risking data loss if your RF in the cluster is lower than 3 and the data has not been fully synced. Therefore, remember to sync data using repair before you remove/decommission a node from the cluster.

Hi Colin,

Ok, that's good advice. Thanks for your help! I'll give that a shot and see what I can do.

Thanks
Tim

--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: Hector latency related configuration
Hi,

What version of Hector are you using? You could probably start by trying a different consistency level. Are your nodes under memory pressure (you can check the Cassandra system log)? What is the average load per node currently? Also look at concurrent_writes in cassandra.yaml to see whether you can increase it. You can also use nodetool cfstats to read the write latency.

Thanks.
Jason

On Mon, Oct 27, 2014 at 8:45 PM, Or Sher or.sh...@gmail.com wrote:

> We're using Hector in one of our older use cases with C* 1.0.9. We suspect it increases our total round-trip write latency to Cassandra. C* metrics show low latency, so we assume the problem is somewhere else.
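On the client side, a hedged sketch of the Hector knobs usually worth checking first (names are from the Hector 1.x API; the host list and values are illustrative only, not a recommendation):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class HectorTuning {
    public static void main(String[] args) {
        CassandraHostConfigurator conf =
            new CassandraHostConfigurator("host1:9160,host2:9160");
        conf.setMaxActive(50);                      // pool size per host; too small serializes writes
        conf.setCassandraThriftSocketTimeout(5000); // ms; fail fast instead of hanging on a slow node
        conf.setRetryDownedHosts(true);             // re-add recovered hosts to the pool
        Cluster cluster = HFactory.getOrCreateCluster("my-cluster", conf);
    }
}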
Re: Multi Datacenter / MultiRegion on AWS Best practice ?
Hi guys, any feedback on this could be very useful for me, and I guess for more people out there.

2014-10-23 11:16 GMT+02:00 Alain RODRIGUEZ arodr...@gmail.com:

> Hi,
>
> We are currently wondering about the best way to configure the network architecture for a multi-DC Cassandra cluster. Reading previous messages on this mailing list, I see two main ways to do this:
>
> 1 - Two private VPCs, joined by a VPN tunnel linking the two regions. C* using EC2Snitch (or PropertyFileSnitch) and private IPs.
> 2 - Two public VPCs. C* using EC2MultiRegionSnitch (and so public IPs for seeds and broadcast, private IPs for listen address).
>
> With solution 1, we are not confident in the VPN tunnel's stability and performance; the rest should work just fine.
>
> With solution 2, we would need to open IPs one by one on at least 3 ports (7000, 9042, 9160). 100 entries in a security group would allow us a maximum of ~30 nodes. Another issue is that a ring describe (using Astyanax, let's say) would also give clients the public IPs; our clients, which are also inside the VPC, would have to go out to the internet before coming back to the VPC, creating unnecessary latency.
>
> What is your advice regarding best practices for a multi-DC (cross-region) cluster inside the AWS cloud? And by the way, how do you configure Astyanax, when using EC2MultiRegionSnitch (and public IPs for broadcasting), to use private IPs instead of public ones?
>
> Alain
Re: Multi Datacenter / MultiRegion on AWS Best practice ?
Hi!

2014-10-23 11:16 GMT+02:00 Alain RODRIGUEZ arodr...@gmail.com:

> We are currently wondering about the best way to configure the network architecture for a multi-DC Cassandra cluster.
> With solution 2, we would need to open IPs one by one on at least 3 ports (7000, 9042, 9160). 100 entries in a security group would allow us a maximum of ~30 nodes.

You can also allow those ports from everywhere and then use local iptables to limit access to only those IPs you are actually using. You'll most certainly need some kind of configuration management system for this (Chef, Puppet, SaltStack, etc.).
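A minimal sketch of that approach (the peer IPs are placeholders; in practice a config management tool would template this list per node):

#!/bin/bash
# allow Cassandra ports only from known cluster peers...
for peer in 162.243.86.41 162.243.109.94; do
  for port in 7000 9042 9160; do
    iptables -A INPUT -p tcp -s "$peer" --dport "$port" -j ACCEPT
  done
done
# ...then drop everything else on those ports
for port in 7000 9042 9160; do
  iptables -A INPUT -p tcp --dport "$port" -j DROP
done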
Re: OOM at Bootstrap Time
Again, from our experience with 2.0.x:

Revert to the defaults - you are manually setting the heap way too high, IMHO.

On our small nodes we tried LCS - way too much compaction - so we switched all CFs to STCS. We do a major rolling compaction on our small nodes weekly during less busy hours - works great. Be sure you have enough disk.

We never explicitly delete and only use TTLs or truncation. You can set gc_grace_seconds to 0 in that case, so tombstones are more readily expunged. There are a couple of threads on the list that discuss this... also, normal rolling repair becomes optional, reducing load (still repair if something unusual happens, though...).

In your current situation, you need to kickstart compaction - are there any CFs you can truncate, at least temporarily? Then try compacting a small CF, then another, etc. Hopefully you can get enough headroom to add a node.

ml

On Sun, Oct 26, 2014 at 6:24 PM, Maxime maxim...@gmail.com wrote:

> Hmm, thanks for the reading. I initially followed some (perhaps too old) maintenance scripts, which included weekly 'nodetool compact'. Is there a way for me to undo the damage? Tombstones will be a very important issue for me since the dataset is very much a rolling dataset using TTLs heavily.

On Sun, Oct 26, 2014 at 6:04 PM, DuyHai Doan doanduy...@gmail.com wrote:

>> Should doing a major compaction on those nodes lead to a restructuration of the SSTables?
>
> Beware of major compaction on SizeTiered: it will create 2 giant SSTables, and the expired/outdated/tombstoned columns in those big files will never be cleaned, since the SSTables will never get a chance to be compacted again.
>
> Essentially, to reduce the fragmentation of small SSTables, you can stay with SizeTiered compaction and play around with the compaction properties (the thresholds) to make C* group a bunch of files each time it compacts, so that the file count shrinks to a reasonable number.
>
> Since you're using C* 2.1 and anti-compaction has been introduced, I hesitate to advise you to use Leveled compaction as a workaround to reduce the SSTable count. Things are a little bit more complicated because of the incremental repair process (I don't know whether you're using incremental repair in production or not). The dev blog says that Leveled compaction is performed only on repaired SSTables; the un-repaired ones still use SizeTiered. More details here: http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1
>
> Regards

On Sun, Oct 26, 2014 at 9:44 PM, Jonathan Haddad j...@jonhaddad.com wrote:

> If the issue is related to I/O, you're going to want to determine whether you're saturated. Take a look at `iostat -dmx 1`; you'll see avgqu-sz (queue size) and svctm (service time). The higher those numbers are, the more overwhelmed your disk is.

On Sun, Oct 26, 2014 at 12:01 PM, DuyHai Doan doanduy...@gmail.com wrote:

> Hello Maxime
>
> Increasing the flush writers won't help if your disk I/O is not keeping up. I've had a look into the log file; below are some remarks:
>
> 1) There are a lot of SSTables on disk for some tables (events, for example, but not only). I've seen that some compactions are taking up to 32 SSTables (which corresponds to the default max value for SizeTiered compaction).
>
> 2) There is a secondary index that I find suspicious: loc.loc_id_idx. As its name implies, I have the impression that it's an index on the id of the loc, which would lead to almost a 1-1 relationship between the indexed value and the original loc. Such indexes should be avoided because they do not perform well. If it's not an index on the loc_id, please disregard my remark.
>
> 3) There is a clear imbalance of SSTable count on some nodes. In the log, I saw:
>
> INFO [STREAM-IN-/...20] 2014-10-25 02:21:43,360 StreamResultFuture.java:166 - [Stream #a6e54ea0-5bed-11e4-8df5-f357715e1a79 ID#0] Prepare completed. Receiving 163 files (4 111 187 195 bytes), sending 0 files (0 bytes)
> INFO [STREAM-IN-/...81] 2014-10-25 02:21:46,121 StreamResultFuture.java:166 - [Stream #a6e54ea0-5bed-11e4-8df5-f357715e1a79 ID#0] Prepare completed. Receiving 154 files (3 332 779 920 bytes), sending 0 files (0 bytes)
> INFO [STREAM-IN-/...71] 2014-10-25 02:21:50,494 StreamResultFuture.java:166 - [Stream #a6e54ea0-5bed-11e4-8df5-f357715e1a79 ID#0] Prepare completed. Receiving 1315 files (4 606 316 933 bytes), sending 0 files (0 bytes)
> INFO [STREAM-IN-/...217] 2014-10-25 02:21:51,036 StreamResultFuture.java:166 - [Stream #a6e54ea0-5bed-11e4-8df5-f357715e1a79 ID#0] Prepare completed. Receiving 1640 files (3 208 023 573 bytes), sending 0 files (0 bytes)
>
> As you can see, the existing 4 nodes are streaming data to the new node, and on average the data set size is about 3.3 - 4.5 GB. However, the number of SSTables is around 150 files for nodes ...20 and ...81, but goes through the roof to reach 1315 files for
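A sketch of the TTL-only tuning mentioned above (keyspace/table names are hypothetical; this is only safe on tables that never receive explicit deletes):

-- if a table is written exclusively with TTLs (or cleared via TRUNCATE),
-- the tombstone grace period can be dropped so expired data is purged at compaction:
ALTER TABLE mykeyspace.events WITH gc_grace_seconds = 0;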
Re: OOM at Bootstrap Time
> Tombstones will be a very important issue for me since the dataset is very much a rolling dataset using TTLs heavily.

You can try the new DateTiered compaction strategy (https://issues.apache.org/jira/browse/CASSANDRA-6602), released in 2.1.1, if you have a time-series data model, to eliminate tombstones.
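A sketch of switching a time-series table to that strategy (the table name and option values are illustrative, not a recommendation; requires 2.1.1+):

-- group writes into time-windowed SSTables so whole files expire together
ALTER TABLE mykeyspace.events
  WITH compaction = {'class': 'DateTieredCompactionStrategy',
                     'base_time_seconds': 3600,
                     'max_sstable_age_days': 10};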
Why RDD is not cached?
Hi,

I have a standalone Spark setup where the executor is set to have 6.3 GB of memory. As I am using two workers, in total there is 12.6 GB of memory and 4 cores. I am trying to cache an RDD with an approximate size of 3.2 GB, but apparently it is not cached: I neither see "BlockManagerMasterActor: Added rdd_XX in memory" nor any improvement in the performance of the running tasks.

But why is it not cached when there is enough storage memory? I tried with smaller RDDs, 1 or 2 GB, and it works; at least I could see "BlockManagerMasterActor: Added rdd_0_1 in memory" and an improvement in results.

Any idea what I am missing in my settings, or... ?

thanks,
/Shahab
Re: Why RDD is not cached?
On Mon, Oct 27, 2014 at 12:17 PM, shahab shahab.mok...@gmail.com wrote:

> I have a standalone Spark setup where the executor is set to have 6.3 GB of memory. As I am using two workers, in total there is 12.6 GB of memory and 4 cores.

Did you intend to mail the Apache Spark mailing list, instead of the Apache Cassandra user mailing list?

=Rob
Repair/Compaction Completion Confirmation
Hello, I am looking to change how we trigger maintenance operations in our C* clusters. The end goal is to schedule and run the jobs using a system that is backed by Serf to handle the event propagation. I know that when issuing some operations via nodetool, the command blocks until the operation is finished. However, is there a way to reliably determine whether or not the operation has finished without monitoring that invocation of nodetool? In other words, when I run 'nodetool repair' what is the best way to reliably determine that the repair is finished without running something equivalent to a 'pgrep' against the command I invoked? I am curious about trying to do the same for major compactions too. Cheers! -Tim
Re: Repair/Compaction Completion Confirmation
On Mon, Oct 27, 2014 at 1:33 PM, Tim Heckman t...@pagerduty.com wrote:

> I know that when issuing some operations via nodetool, the command blocks until the operation is finished. However, is there a way to reliably determine whether or not the operation has finished without monitoring that invocation of nodetool?
>
> In other words, when I run 'nodetool repair' what is the best way to reliably determine that the repair is finished without running something equivalent to a 'pgrep' against the command I invoked? I am curious about trying to do the same for major compactions too.

This is beyond a FAQ at this point, unfortunately; non-incremental repair is awkward to deal with and probably impossible to automate. In The Future [1] the correct solution will be to use incremental repair, which mitigates but does not solve this challenge entirely.

As brief meta commentary, it would have been nice if the project had spent more time optimizing the operability of the critically important thing you must do once a week [2].

https://issues.apache.org/jira/browse/CASSANDRA-5483

=Rob

[1] http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1
[2] Or, more sensibly, once a month with gc_grace_seconds set to 34 days.
Re: Repair/Compaction Completion Confirmation
On Mon, Oct 27, 2014 at 1:44 PM, Robert Coli rc...@eventbrite.com wrote:

> This is beyond a FAQ at this point, unfortunately; non-incremental repair is awkward to deal with and probably impossible to automate. In The Future [1] the correct solution will be to use incremental repair, which mitigates but does not solve this challenge entirely.
>
> [1] http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1

Thank you for getting back to me so quickly. Not the answer that I was secretly hoping for, but it is nice to have confirmation. :)

Cheers!
-Tim
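For reference, a rough sketch of the workaround this usually ends up as - watching the blocking invocation while polling the server's validation work. Treat it as an assumption, not an API: output formats vary by version, and the nodetool exit status is the closest thing to a completion signal.

#!/bin/bash
# run repair in the background; nodetool blocks until the repair it started returns
nodetool repair my_keyspace &
repair_pid=$!

# while it runs, poll for validation compactions (the anti-entropy work repair triggers)
while kill -0 "$repair_pid" 2>/dev/null; do
  nodetool compactionstats | grep -i validation
  sleep 30
done

wait "$repair_pid" && echo "repair invocation finished"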
Re: Multi Datacenter / MultiRegion on AWS Best practice ?
If you decide to go the iptables route, you could try neti (https://github.com/Instagram/neti; blog post here: http://instagram-engineering.tumblr.com/post/100758229719/migrating-from-aws-to-aws).

On 27 October 2014 16:44, Juho Mäkinen juho.maki...@gmail.com wrote:

> You can also allow those ports from everywhere and then use local iptables to limit access to only those IPs you are actually using. You'll most certainly need some kind of configuration management system for this (Chef, Puppet, SaltStack, etc.).
Re: Repair/Compaction Completion Confirmation
https://github.com/BrianGallew/cassandra_range_repair

This breaks the repair operation down into very small portions of the ring, as a way to try to work around the currently fragile nature of repair.

Leveraging range repair should go some way towards automating repair (this is how the automatic repair service in DataStax OpsCenter works, and this is how we perform repairs). We have had a lot of success running repairs in this manner against vnode-enabled clusters. Not 100% bulletproof, but way better than plain nodetool repair.

On 28 October 2014 08:32, Tim Heckman t...@pagerduty.com wrote:

> Thank you for getting back to me so quickly. Not the answer that I was secretly hoping for, but it is nice to have confirmation. :)

--
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
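A minimal sketch of the subrange idea that script automates (the token values are placeholders; the -st/-et options exist on nodetool repair in 2.0+):

# repair one narrow slice of the ring at a time instead of a node's full range;
# the linked script computes these slices from the ring topology and iterates over them
nodetool repair -st -9223372036854775808 -et -9200000000000000000 my_keyspace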