Re: Adding new nodes in a cluster with virtual nodes
Hi Aaron,

Thanks for your answer. I apologize, I made a mistake in my first mail. The cluster has only 12 nodes instead of 16 (it is a test cluster). There are two datacenters, b1 and s1. Here is the result of nodetool status after adding a new node in the 1st datacenter (dc s1):

root@node007:~# nodetool status
Datacenter: b1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.234.72.135  10.71 GB   256     44.6%             2fc583b2-822f-4347-9fab-5e9d10d548c9  c01
UN  10.234.72.134  16.74 GB   256     63.7%             f209a8c5-7e1b-45b5-aa80-ed679bbbdbd1  e01
UN  10.234.72.139  17.09 GB   256     62.0%             95661392-ccd8-4592-a76f-1c99f7cdf23a  e07
UN  10.234.72.138  10.96 GB   256     42.9%             0d6725f0-1357-423d-85c1-153fb94257d5  e03
UN  10.234.72.137  11.09 GB   256     45.7%             492190d7-3055-4167-8699-9c6560e28164  e03
UN  10.234.72.136  11.91 GB   256     41.1%             3872f26c-5f2d-4fb3-9f5c-08b4c7762466  c01
Datacenter: s1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.98.255.139  16.94 GB   256     43.8%             3523e80c-8468-4502-b334-79eabc3357f0  g10
UN  10.98.255.138  12.62 GB   256     42.4%             a2bcddf1-393e-453b-9d4f-9f7111c01d7f  i02
UN  10.98.255.137  10.59 GB   256     38.4%             f851b6ee-f1e4-431b-8beb-e7b173a77342  i02
UN  10.98.255.136  11.89 GB   256     42.9%             36fe902f-3fb1-4b6d-9e2c-71e601fa0f2e  a09
UN  10.98.255.135  10.29 GB   256     40.4%             e2d020a5-97a9-48d4-870c-d10b59858763  a09
UN  10.98.255.134  16.19 GB   256     52.3%             73e3376a-5a9f-4b8a-a119-c87ae1fafdcb  h06
UN  10.98.255.140  127.84 KB  256     39.9%             3d5c33e6-35d0-40a0-b60d-2696fd5cbf72  g10

We can see that the new node (10.98.255.140) contains only 127.84 KB. We also saw that there was no network traffic between the nodes.
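A quick way to spot a node that joined without streaming any data is to compare load units across the status output: a node still measured in KB next to peers measured in GB almost certainly did not bootstrap. A minimal sketch of that check (the parsing assumes the column layout printed by nodetool status above; host IDs in the sample are abbreviated for brevity):

```python
# Flag nodes whose load is suspiciously small compared to the rest of
# the ring (e.g. a new node that skipped bootstrap). Assumes lines in
# the "UN <addr> <load> <unit> <tokens> ..." layout shown above.

UNITS = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}

def parse_load(line):
    """Return (address, load_in_bytes) for one nodetool status line."""
    parts = line.split()
    addr, value, unit = parts[1], float(parts[2]), parts[3]
    return addr, value * UNITS[unit]

def suspect_nodes(lines, threshold=0.01):
    """Nodes whose load is under `threshold` of the ring average."""
    loads = dict(parse_load(l) for l in lines)
    avg = sum(loads.values()) / len(loads)
    return [addr for addr, b in loads.items() if b < avg * threshold]

status = [
    "UN 10.98.255.139 16.94 GB 256 43.8% 3523e80c g10",
    "UN 10.98.255.140 127.84 KB 256 39.9% 3d5c33e6 g10",
]
print(suspect_nodes(status))  # the 127.84 KB node stands out
```

The 1% cutoff is an arbitrary illustration value, not a Cassandra default.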
Then we added a new node in the 2nd datacenter (dc b1):

root@node007:~# nodetool status
Datacenter: b1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.234.72.135  12.95 GB   256     42.0%             2fc583b2-822f-4347-9fab-5e9d10d548c9  c01
UN  10.234.72.134  20.11 GB   256     53.1%             f209a8c5-7e1b-45b5-aa80-ed679bbbdbd1  e01
UN  10.234.72.140  122.25 KB  256     41.9%             501ea498-8fed-4cc8-a23a-c99492bc4f26  e07
UN  10.234.72.139  20.46 GB   256     40.2%             95661392-ccd8-4592-a76f-1c99f7cdf23a  e07
UN  10.234.72.138  13.21 GB   256     40.9%             0d6725f0-1357-423d-85c1-153fb94257d5  e03
UN  10.234.72.137  13.34 GB   256     42.9%             492190d7-3055-4167-8699-9c6560e28164  e03
UN  10.234.72.136  14.16 GB   256     39.0%             3872f26c-5f2d-4fb3-9f5c-08b4c7762466  c01
Datacenter: s1
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.98.255.139  19.19 GB   256     43.8%             3523e80c-8468-4502-b334-79eabc3357f0  g10
UN  10.98.255.138  14.9 GB    256     42.4%             a2bcddf1-393e-453b-9d4f-9f7111c01d7f  i02
UN  10.98.255.137  12.49 GB   256     38.4%             f851b6ee-f1e4-431b-8beb-e7b173a77342  i02
UN  10.98.255.136  14.13 GB   256     42.9%             36fe902f-3fb1-4b6d-9e2c-71e601fa0f2e  a09
UN  10.98.255.135  12.16 GB   256     40.4%             e2d020a5-97a9-48d4-870c-d10b59858763  a09
UN  10.98.255.134  18.85 GB   256     52.3%             73e3376a-5a9f-4b8a-a119-c87ae1fafdcb  h06
UN  10.98.255.140  2.24 GB    256     39.9%             3d5c33e6-35d0-40a0-b60d-2696fd5cbf72  g10

We can see that the 2nd new node (10.234.72.140) contains only 122.25 KB. The new node in the 1st datacenter now contains 2.24 GB because we were inserting data into the cluster while adding the new nodes.

Then we started a repair from the new node in the 2nd datacenter:

time nodetool repair

We can see that the old nodes are sending data to the new node:

root@node007:~# nodetool netstats
Mode: NORMAL
Not sending any streams.
Streaming from: /10.98.255.137
  hbxtest: /var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-3-Data.db sections=130 progress=0/15598366 - 0%
  hbxtest: /var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-198-Data.db sections=107 progress=0/429517 - 0%
  hbxtest: /var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-17-Data.db sections=109 progress=0/696057 - 0%
  hbxtest: /var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-119-Data.db sections=57 progress=0/189844 - 0%
  hbxtest: /var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-199-Data.db sections=124 progress=56492032/4597955 - 1228%
  hbxtest: /var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-196-Data.db
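nodetool netstats only reports per-file progress, but an overall figure can be approximated by summing the progress fractions. A sketch of that (the clamping is there because, as in the 1228% line above, the reported byte count can overshoot the file total):

```python
import re

# Sum per-file progress from "nodetool netstats" output to get a rough
# overall streaming percentage. Per-file progress is clamped because,
# as in the 1228% line above, reported bytes can exceed the file size.

PROGRESS = re.compile(r"progress=(\d+)/(\d+)")

def overall_progress(netstats_text):
    done = total = 0
    for d, t in PROGRESS.findall(netstats_text):
        d, t = int(d), int(t)
        done += min(d, t)   # clamp overshoot
        total += t
    return 100.0 * done / total if total else 0.0

sample = """
sections=130 progress=0/15598366 - 0%
sections=124 progress=56492032/4597955 - 1228%
"""
print(f"{overall_progress(sample):.1f}% streamed")  # ~22.8% for these two lines
```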
perlcassa throws TApplicationException=HASH(0x2323600)
Hello all,

The perl script below throws TApplicationException=HASH(0x2323600). I googled around and it seems to be a thrift issue. Does anyone have a clue how I can prevent this?

use strict;
use warnings;
use perlcassa;

my $obj = new perlcassa(
    keyspace   => 'demo',
    #seed_nodes => ['nlvora213.oracle.atos', 'nlvora214.oracle.atos'],
    seed_nodes => ['127.0.0.1'],
    port       => '9160'
);

Regards
Hans-Peter
CQL describe table not working
I can "describe keyspace <keyspace>" just fine and I see my table (as the CREATE TABLE seen below), but when I run "describe table nreldata" cqlsh just prints out "Not in any keyspace." Am I doing something wrong here? This is Cassandra 1.1.4, and I wanted to try to set my bloom filter fp chance to 1.0 (i.e. disabled); the docs gave me a CQL ALTER statement rather than the command for the cassandra-cli client.

CREATE TABLE nreldata (
  KEY blob PRIMARY KEY
) WITH comment='' AND
  comparator=blob AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  default_validation=blob AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';
operations progress on DBA operations?
I am used to systems that run a first phase calculating how many files they will need to go through and then log the percent done, or X files out of total files done. I ran this command and it is logging nothing:

nodetool upgradesstables databus5 nreldata;

I have 130 GB of data on my node, and not all of it is in that one column family above. How can I tell how far along it is in its process? It has been running for about 10 minutes already. I don't see anything in the log files either.

Thanks,
Dean
ReverseIndexExample
Hello,

Has anyone already used ReverseIndexQuery from Astyanax? I was trying to understand it, but I executed the example from the Astyanax site and could not understand it. Can someone help me please?

Thanks,
--
Everton Lima Aleixo
Master's student in Computer Science at UFG
Programmer at LUPA
Re: CQL describe table not working
Hello,

I'm using v1.2.1. If I want to use "desc table" and I haven't done a "use <keyspace>", then I use "desc table keyspace.tablename". However, if I have done "use <keyspace>", I only do "desc table tablename".

--
Thanks
A Jabbar Azam
disabling bloomfilter not working? or did I do this wrong?
So in the cli, I ran:

update column family nreldata with bloom_filter_fp_chance=1.0;

Then I ran:

nodetool upgradesstables databus5 nreldata;

But my bloom filter size is still around 2 GB (and I want to free up this heap), according to the nodetool cfstats command:

Column Family: nreldata
SSTable count: 10
Space used (live): 96841497731
Space used (total): 96841497731
Number of Keys (estimate): 1249133696
Memtable Columns Count: 7066
Memtable Data Size: 4286174
Memtable Switch Count: 924
Read Count: 19087150
Read Latency: 0.595 ms.
Write Count: 21281994
Write Latency: 0.013 ms.
Pending Tasks: 0
Bloom Filter False Positives: 974393
Bloom Filter False Ratio: 0.8
Bloom Filter Space Used: 2318392048
Compacted row minimum size: 73
Compacted row maximum size: 446
Compacted row mean size: 143
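The numbers above are in the right ballpark for the textbook Bloom filter sizing formula, bits per key = -ln(p) / (ln 2)^2. A sketch of that arithmetic (the standard formula, not necessarily Cassandra's exact implementation; actual sizes also depend on SSTable count and rounding) shows why fp_chance=1.0 should shrink the filter to essentially nothing, and only once the sstables are actually rewritten:

```python
import math

# Textbook Bloom filter sizing: bits per key m/n = -ln(p) / (ln 2)^2.
# Illustrative of the scale of the cfstats numbers above, not a claim
# about Cassandra's exact on-heap layout.

def bits_per_key(fp_chance):
    return -math.log(fp_chance) / (math.log(2) ** 2)

def filter_bytes(num_keys, fp_chance):
    return int(num_keys * bits_per_key(fp_chance) / 8)

keys = 1_249_133_696               # "Number of Keys (estimate)" above
print(filter_bytes(keys, 0.01))    # ~1.5 GB at fp_chance = 0.01
print(filter_bytes(keys, 0.1))     # ~0.7 GB at fp_chance = 0.1
print(filter_bytes(keys, 1.0))     # 0 -- a disabled filter needs no bits
```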
Re: perlcassa throws TApplicationException=HASH(0x2323600)
Yes, this is a thrift error returned by C*. You can use Data::Dumper to grab what's in that hash ref to see if there are more clues. Wrap your call in an eval {} block and then:

print Dumper($@);

If you file a bug on github I can work with you there, so we don't bother everyone on the users list with debugging.

Best,
Michael
How wide rows are structured in CQL3
Hi,

My impression from reading the docs is that in old versions of Cassandra you could create very wide rows, say with timestamps as column names for time-series data, and read an ordered slice of the row:

RowKey1 -> 1:val1, 2:val2, 3:val3, ..., N:valN

With this data I think you could say "get RowKey1, cols 100 to 1000" and get a slice of values. (I have no experience with this, just from reading about it.)

In CQL3 it looks like this is kind of normalized, so I would have:

CREATE TABLE X (
  RowKey text,
  TimeStamp int,
  Value text,
  PRIMARY KEY (RowKey, TimeStamp)
);

Does this effectively create the same storage structure? Now, in CQL3, it looks like I should access it like this:

SELECT Value FROM X WHERE RowKey = 'RowKey1' AND TimeStamp BETWEEN 100 AND 1000;

Does this do the same thing? I also don't understand some of the things like WITH COMPACT STORAGE and CLUSTERING. I'm having a hard time figuring out how this maps to the underlying storage; it is a little more abstract. I feel like the new CQL stuff isn't really explained clearly to me -- is it just a query language that accesses the same underlying structures, or is Cassandra's storage and access model fundamentally different now?
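The mapping the question describes can be modelled in a few lines: the partition key (RowKey) selects one internal wide row, the clustering column (TimeStamp) becomes the sorted cell name, and a timestamp range query is a contiguous slice of that row. This is a simplified illustration of the idea, not Cassandra's storage engine:

```python
from bisect import bisect_left, bisect_right, insort

# Toy model of a wide row: one sorted list of (clustering_key, value)
# cells per partition key. A range query on the clustering key is a
# contiguous slice of the sorted cells.

def insert(storage, row_key, ts, value):
    insort(storage.setdefault(row_key, []), (ts, value))

def slice_row(storage, row_key, start, end):
    """Analogue of: SELECT ... WHERE RowKey=? AND TimeStamp >= ? AND TimeStamp <= ?"""
    cells = storage.get(row_key, [])
    keys = [c[0] for c in cells]
    return cells[bisect_left(keys, start):bisect_right(keys, end)]

storage = {}
for ts in (50, 150, 500, 900, 1200):
    insert(storage, "RowKey1", ts, f"val{ts}")
print(slice_row(storage, "RowKey1", 100, 1000))
# [(150, 'val150'), (500, 'val500'), (900, 'val900')]
```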
Re: Read IO
AFAIK this is still roughly correct: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ It includes information on the page size read from disk.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 5:45 AM, Jouni Hartikainen jouni.hartikai...@reaktor.fi wrote:

Hi,

On Feb 21, 2013, at 7:52, Kanwar Sangha kan...@mavenir.com wrote:

Hi - Can someone explain the worst case IOPS for a read? No key cache, no row cache, sampling rate say 512.
1) Bloom filter will be checked to see existence of key (in RAM)
2) Index file sample (in RAM) will be checked to find approx. location in index file on disk
3) 1 IOPS to read the actual index file on disk (DISK)
4) 1 IOPS to get the data from the location in the sstable (DISK)
Is this correct?

As you were asking for the worst case, I would still add one step: a seek inside an SSTable from the row start to the queried columns using the column index. However, this applies only if you are querying a subset of the columns in the row (not all) and the total row size exceeds column_index_size_in_kb (defaults to 64 kB). So, as far as I have understood, the worst case steps (without any caches) are:

1. Check the SSTable bloom filters (in memory)
2. Use index samples to find the approx. correct place in the key index file (in memory)
3. Read the key index file until the correct key is found (1st disk seek)
4. Seek to the start of the row in the SSTable file and read the row headers, possibly including the column index (2nd disk seek)
5. Using the column index, seek to the correct place inside the SSTable file to actually read the columns (3rd disk seek)

If the row is very wide and you are asking for a random bunch of columns from here and there, step 5 might even be needed multiple times. Also, if your row has spread over many SSTables, each of them needs to be accessed (at least steps 1-4) to get the complete results for the query.
With all this in mind, if your node has any reasonable amount of reads, I'd say that in practice the key index files will be page-cached by the OS very quickly, and thus a normal read would end up being either one seek (for small rows without the column index) or two (for wider rows). Of course, as Peter already pointed out, the more columns you ask for, the more the disk needs to read. For a continuous set of columns the read should be linear, however.

-Jouni
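The worst-case accounting in this thread can be sketched as a tiny model: per SSTable touched, one seek into the key index file, one to the row header, and (for wide rows) one per column-index-guided jump. Illustrative only; the exact behaviour depends on caches and the Cassandra version:

```python
# Rough model of the worst-case disk seeks per read described above.
# Assumes no key cache, no row cache, and no page cache.

def worst_case_seeks(sstables, wide_row, column_chunks=1):
    seeks_per_table = 2                   # key index read + row header
    if wide_row:
        seeks_per_table += column_chunks  # column-index guided seeks
    return sstables * seeks_per_table

# A small row in a single SSTable: 2 seeks.
print(worst_case_seeks(1, wide_row=False))
# A wide row spread over 3 SSTables, 2 column chunks each: 12 seeks.
print(worst_case_seeks(3, wide_row=True, column_chunks=2))
```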
Re: SSTable Num
Ok. So for 10 TB, I could have at least 4 SSTable files, each of 2.5 TB?

You will have many sstables, in your case 32. Each bucket of files (files that are within 50% of the average size of files in a bucket) will contain 3 or fewer files. This article provides some background, but it's working correctly as you have described it: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 6:39 AM, Kanwar Sangha kan...@mavenir.com wrote:

No. The default size tiered strategy compacts files that are roughly the same size, and only when there are more than 4 (default) of them.

Ok. So for 10 TB, I could have at least 4 SSTable files, each of 2.5 TB?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 21 February 2013 11:01
To: user@cassandra.apache.org
Subject: Re: SSTable Num

Hi - I have around 6TB of data on 1 node

Unless you have SSD and 10GbE you probably have too much data on there. Remember you need to run repair, and that can take a long time with a lot of data. Also, you may need to replace a node one day, and moving 6TB will take a while.

Or will the sstable compaction continue and eventually we will have 1 file?

No. The default size tiered strategy compacts files that are roughly the same size, and only when there are more than 4 (default) of them.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 3:47 AM, Kanwar Sangha kan...@mavenir.com wrote:

Hi - I have around 6 TB of data on 1 node and cfstats shows 32 sstables. There is no compaction job running in the background. Is there a limit on the size per sstable? Or will the sstable compaction continue and eventually we will have 1 file?

Thanks,
Kanwar
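The bucketing rule described above (files within 50% of a bucket's average size share a bucket; a bucket compacts once it holds the minimum threshold of 4 files) can be sketched as follows. This is a simplified model for intuition, not Cassandra's actual SizeTieredCompactionStrategy code:

```python
# Sketch of size-tiered bucketing: group files whose size falls within
# 50%-150% of a bucket's running average, then flag buckets holding at
# least min_threshold (default 4) files as compaction candidates.

def bucket_files(sizes, bucket_low=0.5, bucket_high=1.5):
    buckets = []  # each bucket: [running_average, [sizes]]
    for size in sorted(sizes):
        for b in buckets:
            if b[0] * bucket_low <= size <= b[0] * bucket_high:
                b[1].append(size)
                b[0] = sum(b[1]) / len(b[1])  # update running average
                break
        else:
            buckets.append([size, [size]])
    return [b[1] for b in buckets]

def compaction_candidates(sizes, min_threshold=4):
    return [b for b in bucket_files(sizes) if len(b) >= min_threshold]

sizes = [100, 110, 105, 98, 400, 420, 1600]   # sizes in MB, say
print(bucket_files(sizes))
print(compaction_candidates(sizes))  # only the ~100 MB bucket has 4 files
```

This is why 32 large sstables of very different sizes can sit idle: no bucket ever accumulates 4 similarly sized files.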
Re: Heap is N.N full. Immediately on startup
To get a good idea of how GC is performing, turn on the GC logging in cassandra-env.sh. After a full CMS GC event, see how big the tenured heap is. If it's not reducing enough then GC will never get far enough ahead.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 8:37 AM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote:

Thank you - indeed my index interval is 64 with a CF of 300M rows, and the bloom filter false positive chance was the default. Raising the index interval to 512 didn't fix this alone, so I guess I'll have to set the bloom filter to some reasonable value and scrub.

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Thursday 21 February 2013 17:58
To: user@cassandra.apache.org
Subject: Re: Heap is N.N full. Immediately on startup

My first guess would be the bloom filter and index sampling from lots-o-rows. Check the row count in cfstats. Check the bloom filter size in cfstats.
Background on memory requirements: http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 20/02/2013, at 11:27 PM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote:

Hey list,

Any ideas (before I take a heap dump) what might be consuming my 8GB JVM heap at startup in Cassandra 1.1.6? Besides:
- row cache: not persisted and is at 0 keys when this warning is produced
- memtables: no write traffic at startup, my app's column families are durable_writes:false
- pending tasks: no pending tasks, except for 928 compactions (not sure where those are coming from)

I drew these conclusions from the StatusLogger output below:

INFO [ScheduledTasks:1] 2013-02-20 05:13:25,198 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 14959 ms for 2 collections, 7017934560 used; max is 8375238656
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,198 StatusLogger.java (line 57) Pool Name                Active   Pending   Blocked
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,199 StatusLogger.java (line 72) ReadStage                     0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,200 StatusLogger.java (line 72) RequestResponseStage          0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,200 StatusLogger.java (line 72) ReadRepairStage               0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,200 StatusLogger.java (line 72) MutationStage                 0        -1         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) ReplicateOnWriteStage         0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) GossipStage                   0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) AntiEntropyStage              0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) MigrationStage                0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) StreamStage                   0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,202 StatusLogger.java (line 72) MemtablePostFlusher           0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,202 StatusLogger.java (line 72) FlushWriter                   0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,202 StatusLogger.java (line 72) MiscStage                     0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,202 StatusLogger.java (line 72) commitlog_archiver            0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,203 StatusLogger.java (line 72) InternalResponseStage         0         0         0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,212 StatusLogger.java (line 77) CompactionManager             0       928
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,212 StatusLogger.java (line 89) MessagingService            n/a       0,0
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,212 StatusLogger.java (line 99) Cache Type     Size     Capacity     KeysToSave     Provider
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,212 StatusLogger.java (line 100) KeyCache        25           25            all
INFO [ScheduledTasks:1] 2013-02-20 05:13:25,213 StatusLogger.java (line 106) RowCache         0            0
Re: operations progress on DBA operations?
Finally found it... nodetool compactionstats shows the percentage complete.

Dean
Re: Mutation dropped
If you are running repair, using QUORUM, and there are no dropped writes, you should not be getting DigestMismatch during reads. If everything else looks good but the request latency is higher than the CF latency, I would check that client load is evenly distributed. Then start looking to see if the request throughput is at its maximum for the cluster.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 8:15 PM, Wei Zhu wz1...@yahoo.com wrote:

Thanks Aaron for the great information as always. I just checked cfhistograms and only a handful of read latencies are bigger than 100ms, but in proxyhistograms there are 10 times more greater than 100ms. We are using QUORUM for reading with RF=3, and I understand the coordinator needs to get the digest from the other nodes and read repair on a mismatch, etc. But is it normal to see the latency from proxyhistograms go beyond 100ms? Is there any way to improve that? We are tracking the metrics from the client side and we see the 95th percentile response time averages at 40ms, which is a bit high. Our 50th percentile was great, under 3ms. Any suggestion is very much appreciated. Thanks.

-Wei

----- Original Message -----
From: aaron morton aa...@thelastpickle.com
To: Cassandra User user@cassandra.apache.org
Sent: Thursday, February 21, 2013 9:20:49 AM
Subject: Re: Mutation dropped

What does rpc_timeout control? Only the reads/writes?

Yes.

like data stream,

streaming_socket_timeout_in_ms in the yaml.

merkle tree request?

Either no timeout or a number of days, I cannot remember which right now.

What is the side effect if it's set to a really small number, say 20ms?

You will probably get a lot more requests that fail with a TimedOutException. rpc_timeout needs to be longer than the time it takes a node to process the message, plus the time it takes the coordinator to do its thing.
You can look at cfhistograms and proxyhistograms to get a better idea of how long a request takes in your system.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 6:56 AM, Wei Zhu wz1...@yahoo.com wrote:

What does rpc_timeout control? Only the reads/writes? How about other inter-node communication, like data streams and merkle tree requests? What is a reasonable value for rpc_timeout? The default value of 10 seconds is way too long. What is the side effect if it's set to a really small number, say 20ms?

Thanks.
-Wei

From: aaron morton aa...@thelastpickle.com
To: user@cassandra.apache.org
Sent: Tuesday, February 19, 2013 7:32 PM
Subject: Re: Mutation dropped

Does the rpc_timeout not control the client timeout?

No, it is how long a node will wait for a response from other nodes before raising a TimedOutException if fewer than CL nodes have responded. Set the client-side socket timeout using your preferred client.

Is there any param which is configurable to control the replication timeout between nodes?

There is no such thing. rpc_timeout is roughly like that, but it's not right to think about it that way, i.e. if a message to a replica times out and CL nodes have already responded, then we are happy to call the request complete.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 19/02/2013, at 1:48 AM, Kanwar Sangha kan...@mavenir.com wrote:

Thanks Aaron. Does the rpc_timeout not control the client timeout? Is there any param which is configurable to control the replication timeout between nodes? Or is the same param used to control that, since the other node is also like a client?

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 17 February 2013 11:26
To: user@cassandra.apache.org
Subject: Re: Mutation dropped

You are hitting the maximum throughput on the cluster.
The messages are dropped because the node fails to start processing them before rpc_timeout. However, the request is still a success because the client-requested CL was achieved. Testing with RF=2 and CL=1 really just tests the disks on one local machine: both nodes replicate each row, and writes are sent to each replica, so the only thing the client is waiting on is the local node writing to its commit log. Testing with (and running in prod with) RF=3 and CL QUORUM is a more real-world scenario.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 15/02/2013, at 9:42 AM, Kanwar Sangha kan...@mavenir.com wrote:

Hi - Is there a parameter which can be tuned to prevent mutations from being dropped? Is this logic correct? Node A and B with RF=2, CL=1. Load balanced between the two.
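The consistency-level arithmetic running through this thread can be written down in a few lines: a request succeeds as soon as CL replicas acknowledge, even if other replicas drop the mutation, and QUORUM reads and writes at RF=3 are consistent because the two quorums must overlap. A simplified model for illustration:

```python
# Toy model of the CL accounting above: a request succeeds when at
# least CL replicas acknowledge, regardless of drops elsewhere.

def quorum(rf):
    """Quorum size for a replication factor."""
    return rf // 2 + 1

def request_succeeds(acks, cl):
    return acks >= cl

# RF=2, CL=1: one ack is enough, so a mutation dropped on the other
# replica does not fail the client request.
print(request_succeeds(acks=1, cl=1))   # True
# RF=3 with QUORUM reads and writes: quorums of 2 must overlap on at
# least one replica, so reads see the latest acknowledged write.
print(quorum(3))                        # 2
print(quorum(3) + quorum(3) > 3)        # True -> overlapping replica
```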
is there a way to drain node(and prevent reads) and upgrade sstables offline?
We would like to take a node out of the ring and run upgradesstables while it is not doing any writes or reads with the ring. Is this possible? From the documentation, I am thinking:

1. nodetool drain
2. ANYTHING to stop reads here
3. Modify cassandra.yaml with compaction_throughput_mb_per_sec = 0 and multithreaded_compaction = true temporarily
4. Restart cassandra and run nodetool upgradesstables keyspace CF
5. Modify cassandra.yaml to revert the changes
6. Restart cassandra to join the cluster again.

Is this how it should be done?

Thanks,
Dean
Re: is there a way to drain node(and prevent reads) and upgrade sstables offline?
Couldn't you just disable thrift and leave gossip active?
Re: Adding new nodes in a cluster with virtual nodes
So, it looks like the repair is required if we want to add new nodes to our platform, but I don't understand why.

Bootstrapping should take care of it. But new seed nodes do not bootstrap. Check the logs on the nodes you added to see what messages have "bootstrap" in them.

Anytime you are worried about things like this, throw in a nodetool repair. If you are using QUORUM for reads and writes you will still be getting consistent data, so long as you have only added one node. Or one node every RF'th nodes.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
Re: operations progress on DBA operations?
nodetool compactionstats
Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 23/02/2013, at 3:44 AM, Hiller, Dean dean.hil...@nrel.gov wrote: I am used to systems running a first phase that calculates how many files they will need to go through, and then logging the percent done, or X files out of the total done. I ran this command and it is logging nothing: nodetool upgradesstables databus5 nreldata; I have 130 GB of data on my node, and not all of it is in that one column family above. How can I tell how far along it is in its process? It has been running for about 10 minutes already. I don't see anything in the log files either. Thanks, Dean
Re: operations progress on DBA operations?
Just to add though: compactionstats on an upgradesstables will only show the sstable currently being upgraded. Overall progress of an upgradesstables isn't exposed anywhere yet, but you can figure out how much there is left to go through from the log lines. From: aaron morton aa...@thelastpickle.com Reply-To: user@cassandra.apache.org Date: Friday, February 22, 2013 9:09 AM To: user@cassandra.apache.org Subject: Re: operations progress on DBA operations?
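The per-sstable progress that compactionstats does expose can be followed programmatically by parsing its output. The sample text below is a hypothetical approximation of the 1.1-era table format (bytes compacted / bytes total per running task), not captured from a real cluster; the exact columns vary by version, so treat the parsing as a starting point:

```python
# Rough per-task progress estimate from `nodetool compactionstats`-style
# output. SAMPLE is a made-up approximation of the 1.1-era format.

SAMPLE = """\
pending tasks: 12
compaction type keyspace column family bytes compacted bytes total progress
Upgrade sstables databus5 nreldata 1073741824 4294967296 25.00%
"""

def task_progress(line: str) -> float:
    """Completed fraction of one task line (bytes done / bytes total)."""
    parts = line.split()
    done, total = int(parts[-3]), int(parts[-2])
    return done / total

for line in SAMPLE.splitlines():
    if line.startswith("Upgrade"):
        print(f"current sstable: {task_progress(line):.0%} done")  # 25% done
```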
Re: ReverseIndexExample
We are trying to answer client-library-specific questions on the client-dev list, see the link at the bottom here http://cassandra.apache.org/ If you can ask a more specific question I'll answer it there. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 23/02/2013, at 3:44 AM, Everton Lima peitin.inu...@gmail.com wrote: Hello, has anyone already used ReverseIndexQuery from Astyanax? I was trying to understand it, but I executed the example from the Astyanax site and could not understand it. Can someone help me please? Thanks. -- Everton Lima Aleixo, Master's student in Computer Science at UFG, Programmer at LUPA
Re: disabling bloomfilter not working? or did I do this wrong?
Bloom Filter Space Used: 2318392048
Just to be sane, do a quick check of the -Filter.db files on disk for this CF. If they are very small, try a restart on the node.
Number of Keys (estimate): 1249133696
Hey, a billion rows on a node, what an age we live in :)
Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 23/02/2013, at 4:35 AM, Hiller, Dean dean.hil...@nrel.gov wrote: So in the cli, I ran: update column family nreldata with bloom_filter_fp_chance=1.0; Then I ran: nodetool upgradesstables databus5 nreldata; But my bloom filter size is still around 2 GB (and I want to free up this heap). According to the nodetool cfstats command:
Column Family: nreldata
SSTable count: 10
Space used (live): 96841497731
Space used (total): 96841497731
Number of Keys (estimate): 1249133696
Memtable Columns Count: 7066
Memtable Data Size: 4286174
Memtable Switch Count: 924
Read Count: 19087150
Read Latency: 0.595 ms.
Write Count: 21281994
Write Latency: 0.013 ms.
Pending Tasks: 0
Bloom Filter False Positives: 974393
Bloom Filter False Ratio: 0.8
Bloom Filter Space Used: 2318392048
Compacted row minimum size: 73
Compacted row maximum size: 446
Compacted row mean size: 143
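For a rough cross-check of that ~2.3 GB figure, the textbook Bloom filter sizing formula m = -n ln(p) / (ln 2)^2 bits can be applied to the reported key count. Cassandra's actual allocator rounds these numbers differently, so this is only an order-of-magnitude estimate, not a reproduction of its internals:

```python
import math

# Textbook Bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits for
# n keys at false-positive chance p. Order-of-magnitude check only.

def bloom_bytes(num_keys: int, fp_chance: float) -> int:
    bits = -num_keys * math.log(fp_chance) / (math.log(2) ** 2)
    return int(bits / 8)

keys = 1_249_133_696  # "Number of Keys (estimate)" from cfstats above
for p in (0.01, 0.001):
    print(f"fp_chance={p}: ~{bloom_bytes(keys, p) / 2**30:.1f} GiB")
```

At fp_chance 0.001 this lands around 2.1 GiB, in the same ballpark as the 2318392048 bytes cfstats reports.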
Re: How wide rows are structured in CQL3
Does this effectively create the same storage structure? Yes.
SELECT Value FROM X WHERE RowKey = 'RowKey1' AND TimeStamp BETWEEN 100 AND 1000;
select value from X where RowKey = 'foo' and timestamp >= 100 and timestamp <= 1000;
I also don't understand some of the things like WITH COMPACT STORAGE and CLUSTERING. Some info here (does not cover compact storage): http://thelastpickle.com/2013/01/11/primary-keys-in-cql/ Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 23/02/2013, at 4:36 AM, Boris Solovyov boris.solov...@gmail.com wrote: Hi, my impression from reading the docs is that in old versions of Cassandra, you could create very wide rows, say with timestamps as column names for time series data, and read an ordered slice of the row. So:
RowKey   | Columns
RowKey1  | 1:val1 2:val2 3:val3 ... N:valN
With this data I think you could say "get RowKey1, cols 100 to 1000" and get a slice of values. (I have no experience with this, just from reading about it.) In CQL3 it looks like this is kind of normalized, so I would have:
CREATE TABLE X (
  RowKey text,
  TimeStamp int,
  Value text,
  PRIMARY KEY (RowKey, TimeStamp)
);
Does this effectively create the same storage structure? Now, in CQL3, it looks like I should access it like this:
SELECT Value FROM X WHERE RowKey = 'RowKey1' AND TimeStamp BETWEEN 100 AND 1000;
Does this do the same thing? I also don't understand some of the things like WITH COMPACT STORAGE and CLUSTERING. I'm having a hard time figuring out how this maps to the underlying storage. It is a little more abstract. I feel like the new CQL stuff isn't really explained clearly to me -- is it just a query language that accesses the same underlying structures, or is Cassandra's storage and access model fundamentally different now?
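The mapping Boris is asking about can be mimicked in a few lines: each CQL3 row keyed by (RowKey, TimeStamp) becomes one cell inside the RowKey partition, and a range predicate on the clustering column becomes a contiguous column slice. This toy model is purely illustrative and says nothing about the on-disk byte format:

```python
from collections import defaultdict

# Toy model of how CQL3 rows with PRIMARY KEY (RowKey, TimeStamp) map onto
# the old wide-row layout: one partition per RowKey, one cell per
# clustering value. Illustrative only.

cql_rows = [
    ("RowKey1", 100, "val1"),
    ("RowKey1", 250, "val2"),
    ("RowKey1", 999, "val3"),
]

partitions = defaultdict(dict)  # RowKey -> {TimeStamp: Value}
for row_key, ts, value in cql_rows:
    partitions[row_key][ts] = value

def slice_partition(row_key, lo, hi):
    """A CQL range predicate on TimeStamp = a contiguous column slice."""
    cols = partitions[row_key]
    return [cols[ts] for ts in sorted(cols) if lo <= ts <= hi]

print(slice_partition("RowKey1", 100, 1000))  # ['val1', 'val2', 'val3']
```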
Re: Q on schema migrations
dropped this secondary index after a while. I assume you used UPDATE COLUMN FAMILY in the CLI. How can I avoid this secondary index building on node join? Check the schema using show schema in the cli. Check that all nodes in the cluster have the same schema, using describe cluster in the cli. If they are in disagreement, see this: http://wiki.apache.org/cassandra/FAQ#schema_disagreement Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 23/02/2013, at 5:17 AM, Igor i...@4friends.od.ua wrote: Hello. Cassandra 1.0.7. Some time ago we used a secondary index on one of our CFs. Due to performance reasons we dropped this secondary index after a while. But now, each time I add and bootstrap a new node, I see cassandra build this secondary index again on the new node (which takes a huge amount of time), and when the index is built it is not used anymore, so I can safely delete the files from disk. How can I avoid this secondary index building on node join? Thanks for your answers!
Re: is there a way to drain a node (and prevent reads) and upgrade sstables offline?
To stop all writes and reads, disable thrift and gossip via nodetool. This will not stop any in-progress repair sessions, nor disconnect fat clients if you have them. There are also the cmd line args cassandra.start_rpc and cassandra.join_ring which do the same thing. You can also change the compaction throughput using nodetool setcompactionthroughput.
multithreaded_compaction = true temporarily
Unless you have SSDs, leave this guy alone.
Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 23/02/2013, at 6:04 AM, Michael Kjellman mkjell...@barracuda.com wrote: Couldn't you just disable thrift and leave gossip active? On 2/22/13 9:01 AM, Hiller, Dean dean.hil...@nrel.gov wrote: We would like to take a node out of the ring and run upgradesstables while it is not doing any writes or reads with the ring. Is this possible? From the documentation, I am thinking:
1. nodetool drain
2. ANYTHING to stop reads here
3. Modify cassandra.yaml with compaction_throughput_mb_per_sec = 0 and multithreaded_compaction = true temporarily
4. Restart cassandra and run nodetool upgradesstables keyspace CF
5. Modify cassandra.yaml to revert the changes
6. Restart cassandra to join the cluster again.
Is this how it should be done? Thanks, Dean
Re: Size Tiered - Leveled Compaction
Hello, Still doing research before we potentially move one of our column families from Size Tiered to Leveled compaction this weekend. I was doing some research around some of the bugs that were filed against leveled compaction in Cassandra and I found this: https://issues.apache.org/jira/browse/CASSANDRA-4644 The bug mentions: You need to run the offline scrub (bin/sstablescrub) to fix the sstable overlapping problem from early 1.1 releases. (Running with -m to just check for overlaps between sstables should be fine, since you already scrubbed online, which will catch out-of-order within an sstable.) We recently upgraded from 1.1.2 to 1.1.9. Does anyone know if an offline scrub is recommended when switching from STCS to LCS after upgrading from 1.1.2? Any insight would be appreciated. Thanks, -Mike On 2/17/2013 8:57 PM, Wei Zhu wrote: We doubled the SSTable size to 10 MB. It still generates a lot of SSTables and we don't see much difference in the read latency. We are able to finish the compactions after repair within several hours. We will increase the SSTable size again if we feel the number of SSTables hurts performance. - Original Message - From: Mike mthero...@yahoo.com To: user@cassandra.apache.org Sent: Sunday, February 17, 2013 4:50:40 AM Subject: Re: Size Tiered - Leveled Compaction Hello Wei, First, thanks for this response. Out of curiosity, what SSTable size did you choose for your use case, and what made you decide on that number? Thanks, -Mike On 2/14/2013 3:51 PM, Wei Zhu wrote: I haven't tried to switch compaction strategy. We started with LCS. For us, after massive data imports (5000 writes/second for 6 days), the first repair is painful since there is quite some data inconsistency. For 150 GB nodes, repair brought in about 30 GB and created thousands of pending compactions. It took almost a day to clear those. Just be prepared: LCS is really slow in 1.1.X.
System performance degrades during that time since reads could go to more SSTables; we see 20 SSTable lookups for one read. (We tried everything we could and couldn't speed it up. I think it's single threaded, and it's not recommended to turn on multithreaded compaction. We even tried that; it didn't help.) There is parallel LCS in 1.2 which is supposed to alleviate the pain. Haven't upgraded yet, hope it works :) http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 Since our cluster is not write intensive, only 100 writes/second, I don't see any pending compactions during regular operation. One thing worth mentioning is the size of the SSTables: the default is 5 MB, which is kind of small for a 200 GB (all in one CF) data set, and we are on SSD. That is more than 150K files in one directory. (200G/5M = 40K SSTables, and each SSTable creates 4 files on disk.) You might want to watch that and decide the SSTable size. By the way, there is no concept of major compaction for LCS. Just for fun, you can look at a file called $CFName.json in your data directory and it tells you the SSTable distribution among the different levels. -Wei From: Charles Brophy cbro...@zulily.com To: user@cassandra.apache.org Sent: Thursday, February 14, 2013 8:29 AM Subject: Re: Size Tiered - Leveled Compaction I second these questions: we've been looking into changing some of our CFs to use leveled compaction as well. If anybody here has the wisdom to answer them it would be a wonderful help. Thanks, Charles On Wed, Feb 13, 2013 at 7:50 AM, Mike mthero...@yahoo.com wrote: Hello, I'm investigating the transition of some of our column families from Size Tiered to Leveled Compaction. I believe we have some high-read-load column families that would benefit tremendously. I've stood up a test DB node to investigate the transition. I successfully alter the column family, and I immediately notice a large number (1000+) of pending compaction tasks become available, but no compaction gets executed.
I tried running nodetool upgradesstables on the column family, and the compaction tasks don't move. I also notice no changes to the size and distribution of the existing SSTables. I then run a major compaction on the column family. All pending compaction tasks get run, and the SSTables have a distribution that I would expect from LeveledCompaction (lots and lots of 10 MB files). A couple of questions: 1) Is a major compaction required to transition from size-tiered to leveled compaction? 2) Are major compactions as much of a concern for LeveledCompaction as they are for Size Tiered? All the documentation I found concerning transitioning from Size Tiered to Leveled compaction discusses the ALTER TABLE CQL command, but I haven't found much on what else needs to be done after the schema change. I did these tests with Cassandra 1.1.9. Thanks, -Mike
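Wei's file-count arithmetic above generalizes easily. A small sketch (the 4-files-per-sstable figure is taken from the thread and applies to the 1.1-era sstable component layout):

```python
# Back-of-the-envelope file counts on disk for a 200 GB column family
# under Leveled Compaction at different sstable sizes. Assumes 4
# component files per sstable, as mentioned in the thread.

FILES_PER_SSTABLE = 4

def lcs_file_count(data_gb: int, sstable_mb: int) -> int:
    sstables = (data_gb * 1024) // sstable_mb
    return sstables * FILES_PER_SSTABLE

for size_mb in (5, 10, 100):
    print(f"{size_mb} MB sstables: {lcs_file_count(200, size_mb):,} files")
```

At the 5 MB default this gives 163,840 files, matching the "more than 150K files in one directory" observation; doubling the sstable size halves it.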
Re: disabling bloomfilter not working? or did I do this wrong?
Thanks, but I found out it is still running. It looks like I have about a 5 hour wait left for my upgradesstables (I've waited 4 hours already). I will check the bloom filter after that. Out of curiosity, if I had much wider rows (i.e. 900k per row), would compaction (i.e. upgradesstables) run faster at all, or would it basically run at the same speed? I guess what I am wondering is: is 9 hours a normal compaction time for 130 GB of data? Thanks, Dean From: aaron morton aa...@thelastpickle.com Reply-To: user@cassandra.apache.org Date: Friday, February 22, 2013 10:29 AM To: user@cassandra.apache.org Subject: Re: disabling bloomfilter not working? or did I do this wrong?
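As a quick sanity check on whether the throttle is the bottleneck, the effective rewrite rate implied by Dean's numbers can be computed directly (the comparison with a compaction_throughput_mb_per_sec of 16, the default in this era, is an assumption about this cluster's config):

```python
# Effective rewrite rate for 130 GB in ~9 hours. A result well under the
# configured compaction throughput cap suggests the throttle is not what
# is limiting upgradesstables here.

def effective_mb_per_sec(gb: float, hours: float) -> float:
    return gb * 1024 / (hours * 3600)

rate = effective_mb_per_sec(130, 9)
print(f"effective rate: {rate:.1f} MB/s")  # roughly 4.1 MB/s
```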
found bottleneck but can we do these steps?
So, it turns out we don't have enough I/O going on for our upgradesstables, but it is really hitting the upper bound of memory (8 GB), and our CPU is pretty low as well. At any rate, we are trying to remove a 2 GB bloom filter on a column family. Can we do the following?
1. Disable thrift/gossip (per previous emails)
2. Restart the node? (Any way to restart it without reading in that bloom filter, to lessen the memory? Should I temporarily bring up the node without the key cache maybe?)
3. Run nodetool upgradesstables databus5 nreldata;
Questions:
1. When I restart the node, will gossip/thrift stay off? Or do I change the seeds, change 9160 to something, and I don't see where I can change 7199 to something? (How to do this safely?)
2. Hmm, is there any way to run upgradesstables when cassandra is not running AND crank up the memory of nodetool to 8 GB, or does nodetool always just tell cassandra to do it?
I feel like I have a chicken and egg problem here. I want to clean up this bloom filter, which requires upgradesstables (from what I read), but I need the bloom filter to not be there so I am not bottlenecked by the memory. At this rate, I will have to do each node each day for 6 days before I can recover (and I would prefer to speed it up just a little). Thanks, Dean