Re: Adding new nodes in a cluster with virtual nodes

2013-02-22 Thread Jean-Armel Luce
Hi Aaron,

Thanks for your answer.


I apologize, I made a mistake in my 1st mail. The cluster has only 12 nodes
instead of 16 (it is a test cluster).
There are 2 datacenters b1 and s1.

Here is the result of nodetool status after adding a new node in the 1st
datacenter (dc s1):
root@node007:~# nodetool status
Datacenter: b1
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
UN  10.234.72.135  10.71 GB  256     44.6%             2fc583b2-822f-4347-9fab-5e9d10d548c9  c01
UN  10.234.72.134  16.74 GB  256     63.7%             f209a8c5-7e1b-45b5-aa80-ed679bbbdbd1  e01
UN  10.234.72.139  17.09 GB  256     62.0%             95661392-ccd8-4592-a76f-1c99f7cdf23a  e07
UN  10.234.72.138  10.96 GB  256     42.9%             0d6725f0-1357-423d-85c1-153fb94257d5  e03
UN  10.234.72.137  11.09 GB  256     45.7%             492190d7-3055-4167-8699-9c6560e28164  e03
UN  10.234.72.136  11.91 GB  256     41.1%             3872f26c-5f2d-4fb3-9f5c-08b4c7762466  c01
Datacenter: s1
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.98.255.139  16.94 GB   256     43.8%             3523e80c-8468-4502-b334-79eabc3357f0  g10
UN  10.98.255.138  12.62 GB   256     42.4%             a2bcddf1-393e-453b-9d4f-9f7111c01d7f  i02
UN  10.98.255.137  10.59 GB   256     38.4%             f851b6ee-f1e4-431b-8beb-e7b173a77342  i02
UN  10.98.255.136  11.89 GB   256     42.9%             36fe902f-3fb1-4b6d-9e2c-71e601fa0f2e  a09
UN  10.98.255.135  10.29 GB   256     40.4%             e2d020a5-97a9-48d4-870c-d10b59858763  a09
UN  10.98.255.134  16.19 GB   256     52.3%             73e3376a-5a9f-4b8a-a119-c87ae1fafdcb  h06
UN  10.98.255.140  127.84 KB  256     39.9%             3d5c33e6-35d0-40a0-b60d-2696fd5cbf72  g10

We can see that the new node (10.98.255.140) contains only 127.84 KB.
We also saw that there was no network traffic between the nodes.

Then we added a new node in the 2nd datacenter (dc b1).



root@node007:~# nodetool status
Datacenter: b1
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.234.72.135  12.95 GB   256     42.0%             2fc583b2-822f-4347-9fab-5e9d10d548c9  c01
UN  10.234.72.134  20.11 GB   256     53.1%             f209a8c5-7e1b-45b5-aa80-ed679bbbdbd1  e01
UN  10.234.72.140  122.25 KB  256     41.9%             501ea498-8fed-4cc8-a23a-c99492bc4f26  e07
UN  10.234.72.139  20.46 GB   256     40.2%             95661392-ccd8-4592-a76f-1c99f7cdf23a  e07
UN  10.234.72.138  13.21 GB   256     40.9%             0d6725f0-1357-423d-85c1-153fb94257d5  e03
UN  10.234.72.137  13.34 GB   256     42.9%             492190d7-3055-4167-8699-9c6560e28164  e03
UN  10.234.72.136  14.16 GB   256     39.0%             3872f26c-5f2d-4fb3-9f5c-08b4c7762466  c01
Datacenter: s1
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
UN  10.98.255.139  19.19 GB  256     43.8%             3523e80c-8468-4502-b334-79eabc3357f0  g10
UN  10.98.255.138  14.9 GB   256     42.4%             a2bcddf1-393e-453b-9d4f-9f7111c01d7f  i02
UN  10.98.255.137  12.49 GB  256     38.4%             f851b6ee-f1e4-431b-8beb-e7b173a77342  i02
UN  10.98.255.136  14.13 GB  256     42.9%             36fe902f-3fb1-4b6d-9e2c-71e601fa0f2e  a09
UN  10.98.255.135  12.16 GB  256     40.4%             e2d020a5-97a9-48d4-870c-d10b59858763  a09
UN  10.98.255.134  18.85 GB  256     52.3%             73e3376a-5a9f-4b8a-a119-c87ae1fafdcb  h06
UN  10.98.255.140  2.24 GB   256     39.9%             3d5c33e6-35d0-40a0-b60d-2696fd5cbf72  g10


We can see that the 2nd new node (10.234.72.140) contains only 122.25 KB.
The new node in the 1st datacenter now contains 2.24 GB because we were
inserting data into the cluster while adding the new nodes.

Then we started a repair from the new node in the 2nd datacenter:
time nodetool repair


We can see that the old nodes are sending data to the new node:
root@node007:~# nodetool netstats
Mode: NORMAL
Not sending any streams.
Streaming from: /10.98.255.137
   hbxtest:
/var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-3-Data.db
sections=130 progress=0/15598366 - 0%
   hbxtest:
/var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-198-Data.db
sections=107 progress=0/429517 - 0%
   hbxtest:
/var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-17-Data.db
sections=109 progress=0/696057 - 0%
   hbxtest:
/var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-119-Data.db
sections=57 progress=0/189844 - 0%
   hbxtest:
/var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-199-Data.db
sections=124 progress=56492032/4597955 - 1228%
   hbxtest:
/var/opt/hosting/db/iof/cassandra/data/hbxtest/medium_column/hbxtest-medium_column-ia-196-Data.db

perlcassa throws TApplicationException=HASH(0x2323600)

2013-02-22 Thread Sloot, Hans-Peter
Hello all,

The perl script below throws TApplicationException=HASH(0x2323600).
I googled around and it seems to be a thrift issue.

Does anyone have a clue how I can prevent this?

Regards Hans-Peter

use strict;
use warnings;

use perlcassa;

my $obj = new perlcassa(
    keyspace   => 'demo',
    #seed_nodes => ['nlvora213.oracle.atos', 'nlvora214.oracle.atos'],
    seed_nodes => ['127.0.0.1'],
    port       => '9160',
);

CQL describe table not working

2013-02-22 Thread Hiller, Dean
I can describe my keyspace just fine and I see my table (as the CREATE
TABLE seen below), but when I run "describe table nreldata", cqlsh just prints
out "Not in any keyspace."  Am I doing something wrong here?  This is Cassandra
1.1.4 and I wanted to try to set my bloom filter fp chance to 1.0 (i.e.
disabled), and the docs gave me a CQL alter statement rather than the command
for the cassandra-cli client.

CREATE TABLE nreldata (
  KEY blob PRIMARY KEY
) WITH
  comment='' AND
  comparator=blob AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  default_validation=blob AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';
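For a 1.1-era cluster the cassandra-cli form is a one-liner (a sketch; the
keyspace name is a placeholder):

[default@unknown] use mykeyspace;
[default@mykeyspace] update column family nreldata with bloom_filter_fp_chance=1.0;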


operations progress on DBA operations?

2013-02-22 Thread Hiller, Dean
I am used to systems running a first phase calculating how many files they will
need to go through and then logging out the percent done, or X files out of
total files done.  I ran this command and it is logging nothing:

nodetool upgradesstables databus5 nreldata;

I have 130 GB of data on my node and not all of it is in that one column family
above.  How can I tell how far along it is in its process?  It has been running for
about 10 minutes already.  I don't see anything in the log files either.

Thanks,
Dean


ReverseIndexExample

2013-02-22 Thread Everton Lima
Hello,

Has anyone already used ReverseIndexQuery from Astyanax? I was trying to
understand it, but I executed the example from the Astyanax site and could not
understand it.
Can someone help me, please?

Thanks;

-- 
Everton Lima Aleixo
Master's student in Computer Science at UFG
Programmer at LUPA


Re: CQL describe table not working

2013-02-22 Thread Jabbar
Hello,

I'm using v1.2.1. If I want to use "desc table" and I haven't done a "use
keyspace", then I use "desc table keyspace.tablename".

However, if I have done "use keyspace", I only do a "desc table tablename".
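For example (a sketch with placeholder names):

cqlsh> DESC TABLE mykeyspace.mytable;
cqlsh> USE mykeyspace;
cqlsh:mykeyspace> DESC TABLE mytable;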


On 22 February 2013 14:09, Hiller, Dean dean.hil...@nrel.gov wrote:

 I can describe my keyspace just fine and I see my table (as the
 CREATE TABLE seen below), but when I run "describe table nreldata", cqlsh
 just prints out "Not in any keyspace."  Am I doing something wrong here?
 This is Cassandra 1.1.4 and I wanted to try to set my bloom filter fp chance
 to 1.0 (i.e. disabled), and the docs gave me a CQL alter statement rather
 than the command for the cassandra-cli client.

 CREATE TABLE nreldata (
   KEY blob PRIMARY KEY
 ) WITH
   comment='' AND
   comparator=blob AND
   read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   default_validation=blob AND
   min_compaction_threshold=4 AND
   max_compaction_threshold=32 AND
   replicate_on_write='true' AND
   compaction_strategy_class='SizeTieredCompactionStrategy' AND
   compression_parameters:sstable_compression='SnappyCompressor';




-- 
Thanks

 A Jabbar Azam


disabling bloomfilter not working? or did I do this wrong?

2013-02-22 Thread Hiller, Dean
So in the cli, I ran

update column family nreldata with bloom_filter_fp_chance=1.0;

Then I ran

nodetool upgradesstables databus5 nreldata;

But my bloom filter size is still around 2 GB (and I want to free up this
heap) according to the nodetool cfstats command…

Column Family: nreldata
SSTable count: 10
Space used (live): 96841497731
Space used (total): 96841497731
Number of Keys (estimate): 1249133696
Memtable Columns Count: 7066
Memtable Data Size: 4286174
Memtable Switch Count: 924
Read Count: 19087150
Read Latency: 0.595 ms.
Write Count: 21281994
Write Latency: 0.013 ms.
Pending Tasks: 0
Bloom Filter False Positives: 974393
Bloom Filter False Ratio: 0.8
Bloom Filter Space Used: 2318392048
Compacted row minimum size: 73
Compacted row maximum size: 446
Compacted row mean size: 143




Re: perlcassa throws TApplicationException=HASH(0x2323600)

2013-02-22 Thread Michael Kjellman
Yes, this is a thrift error returned by C*. You can use Data::Dumper to grab
what's in that hash ref to see if there are more clues. Throw your object in an
eval{} block and then print Dumper($@).
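Something like this (a minimal sketch, reusing the constructor call from your
script):

use strict;
use warnings;
use perlcassa;
use Data::Dumper;

my $obj;
eval {
    # any perlcassa calls that might throw belong inside the eval
    $obj = new perlcassa(
        keyspace   => 'demo',
        seed_nodes => ['127.0.0.1'],
        port       => '9160',
    );
};
print Dumper($@) if $@;   # dumps the blessed TApplicationException hash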

If you file a bug on GitHub I can work with you there so we don't bother
everyone on the users list with the debugging.

Best,
Michael

On Feb 22, 2013, at 2:10 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote:

Hello all,

The perl script below throws TApplicationException=HASH(0x2323600).
I googled around and it seems to be a thrift issue.

Does anyone have a clue how I can prevent this?

Regards Hans-Peter

 use strict;
 use warnings;

 use perlcassa;

 my $obj = new perlcassa(
     keyspace   => 'demo',
     #seed_nodes => ['nlvora213.oracle.atos', 'nlvora214.oracle.atos'],
     seed_nodes => ['127.0.0.1'],
     port       => '9160',
 );


How wide rows are structured in CQL3

2013-02-22 Thread Boris Solovyov
Hi,

My impression from reading docs is that in old versions of Cassandra, you
could create very wide rows, say with timestamps as column names for time
series data, and read an ordered slice of the row.  So,

RowKey   Columns
=======  =======
RowKey1  1:val1 2:val2 3:val3 ... N:valN

With this data I think you could say "get RowKey1, cols 100 to 1000" and
get a slice of values. (I have no experience with this, just from reading
about it.)

In CQL3 it looks like this is kind of normalized so I would have

CREATE TABLE X (
RowKey text,
TimeStamp int,
Value text,
PRIMARY KEY(RowKey, TimeStamp)
);

Does this effectively create the same storage structure?

Now, in CQL3, it looks like I should access it like this,

SELECT Value FROM X WHERE RowKey = 'RowKey1' AND TimeStamp BETWEEN 100 AND
1000;

Does this do the same thing?

I also don't understand some of the things like WITH COMPACT STORAGE and
CLUSTERING. I'm having a hard time figuring out how this maps to the
underlying storage. It is a little more abstract. I feel like the new CQL
stuff isn't really explained clearly to me -- is it just a query language
that accesses the same underlying structures, or is Cassandra's storage and
access model fundamentally different now?


Re: Read IO

2013-02-22 Thread aaron morton
AFAIK this is still roughly correct:
http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

It includes information on the page size read from disk. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 5:45 AM, Jouni Hartikainen jouni.hartikai...@reaktor.fi 
wrote:

 
 Hi,
 
 On Feb 21, 2013, at 7:52, Kanwar Sangha kan...@mavenir.com wrote:
 Hi – Can someone explain the worst case IOPS for a read? No key cache, no
 row cache, sampling rate say 512.
 
 1)  Bloom filter will be checked to see existence of the key (in RAM)
 2)  Index file sample (in RAM) will be checked to find the approx. location
 in the index file on disk
 3)  1 IOPS to read the actual index file on disk (DISK)
 4)  1 IOPS to get the data from the location in the sstable (DISK)

 Is this correct?
 
 As you were asking for the worst case, I would still add one step that would
 be a seek inside an SSTable from the row start to the queried columns using
 the column index.

 However, this applies only if you are querying a subset of columns in the row
 (not all) and the total row size exceeds column_index_size_in_kb (defaults to
 64kB).

 So, as far as I have understood, the worst case steps (without any caches)
 are:

 1. Check the SSTable bloom filters (in memory)
 2. Use index samples to find the approx. correct place in the key index file
 (in memory)
 3. Read the key index file until the correct key is found (1st disk seek & read)
 4. Seek to the start of the row in the SSTable file and read the row headers
 (possibly including the column index) (2nd seek & read)
 5. Using the column index, seek to the correct place inside the SSTable file
 to actually read the columns (3rd seek & read)

 If the row is very wide and you are asking for a random bunch of columns from
 here and there, step 5 might even be needed multiple times. Also, if your
 row has spread over many SSTables, each of them needs to be accessed (at
 least steps 1-4) to get the complete results for the query.

 All this in mind, if your node has any reasonable amount of reads, I'd say
 that in practice the key index files will be page cached by the OS very quickly
 and thus a normal read would end up being either one seek (for small rows
 without the column index) or two (for wider rows). Of course, as Peter
 already pointed out, the more columns you ask for, the more the disk needs to
 read. For a continuous set of columns the read should be linear, however.
 
 -Jouni



Re: SSTable Num

2013-02-22 Thread aaron morton
 Ok. So for 10 TB, I could have at least 4 SSTable files each of 2.5 TB?
You will have many sstables, in your case 32.
Each bucket of files (files that are within 50% of the average size of the files
in a bucket) will contain 3 or fewer files.

This article provides some background, but it's working correctly as you have
described it:
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 6:39 AM, Kanwar Sangha kan...@mavenir.com wrote:

 No. 
 The default size tiered strategy compacts files that are roughly the same
 size, and only when there are more than 4 (default) of them.

 Ok. So for 10 TB, I could have at least 4 SSTable files each of 2.5 TB?
  
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: 21 February 2013 11:01
 To: user@cassandra.apache.org
 Subject: Re: SSTable Num
  
 Hi – I have around 6TB of data on 1 node
 Unless you have SSD and 10GbE you probably have too much data on there. 
 Remember you need to run repair and that can take a long time with a lot of 
 data. Also you may need to replace a node one day and moving 6TB will take a 
 while.
  
 Or will the sstable compaction continue and eventually we will have 1 file?
 No. 
 The default size tiered strategy compacts files that are roughly the same
 size, and only when there are more than 4 (default) of them.
  
 Cheers
   
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
  
 @aaronmorton
 http://www.thelastpickle.com
  
 On 21/02/2013, at 3:47 AM, Kanwar Sangha kan...@mavenir.com wrote:
 
 
 Hi – I have around 6TB of data on 1 node and the cfstats show 32 sstables.
 There is no compaction job running in the background. Is there a limit on the
 size per sstable? Or will the sstable compaction continue and eventually we
 will have 1 file?
  
 Thanks,
 Kanwar
  



Re: Heap is N.N full. Immediately on startup

2013-02-22 Thread aaron morton
To get a good idea of how GC is performing, turn on the GC logging in
cassandra-env.sh.

After a full CMS GC event, see how big the tenured heap is. If it's not
reducing enough then GC will never get far enough ahead.
Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 8:37 AM, Andras Szerdahelyi 
andras.szerdahe...@ignitionone.com wrote:

 Thank you- indeed my index interval is 64 with a CF of 300M rows, and the
 bloom filter false positive chance was at the default.
 Raising the index interval to 512 didn't fix this alone, so I guess I'll
 have to set the bloom filter to some reasonable value and scrub.
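 A sketch of the knobs mentioned (the index_interval value is the one from this
 thread; paths and the keyspace/CF names are placeholders, and the bloom filter
 change only applies to sstables rewritten by scrub or compaction):

 # cassandra.yaml
 index_interval: 512

 [default@unknown] update column family mycf with bloom_filter_fp_chance=0.01;

 root@node007:~# nodetool scrub mykeyspace mycf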
 
 From: aaron morton aa...@thelastpickle.com
 Reply-To: user@cassandra.apache.org
 Date: Thursday 21 February 2013 17:58
 To: user@cassandra.apache.org
 Subject: Re: Heap is N.N full. Immediately on startup
 
 My first guess would be the bloom filter and index sampling from lots-o-rows 
 
 Check the row count in cfstats
 Check the bloom filter size in cfstats. 
 
 Background on memory requirements 
 http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 20/02/2013, at 11:27 PM, Andras Szerdahelyi 
 andras.szerdahe...@ignitionone.com wrote:
 
 Hey list,
 
 Any ideas (before I take a heap dump) what might be consuming my 8GB JVM
 heap at startup in Cassandra 1.1.6 besides:
 - row cache: not persisted and is at 0 keys when this warning is produced
 - Memtables: no write traffic at startup, my app's column families are
   durable_writes:false
 - Pending tasks: no pending tasks, except for 928 compactions (not sure
   where those are coming from)
 I drew these conclusions from the StatusLogger output below:
 
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,198 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 14959 ms for 2 collections, 7017934560 used; max is 8375238656
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,198 StatusLogger.java (line 57) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,199 StatusLogger.java (line 72) ReadStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,200 StatusLogger.java (line 72) RequestResponseStage              0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,200 StatusLogger.java (line 72) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,200 StatusLogger.java (line 72) MutationStage                     0        -1         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,201 StatusLogger.java (line 72) StreamStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,202 StatusLogger.java (line 72) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,202 StatusLogger.java (line 72) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,202 StatusLogger.java (line 72) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,202 StatusLogger.java (line 72) commitlog_archiver                0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,203 StatusLogger.java (line 72) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,212 StatusLogger.java (line 77) CompactionManager                 0       928
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,212 StatusLogger.java (line 89) MessagingService                n/a       0,0
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,212 StatusLogger.java (line 99) Cache Type       Size   Capacity   KeysToSave   Provider
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,212 StatusLogger.java (line 100) KeyCache          25         25          all
 INFO [ScheduledTasks:1] 2013-02-20 05:13:25,213 StatusLogger.java (line 106) RowCache           0          0

Re: operations progress on DBA operations?

2013-02-22 Thread Hiller, Dean
Finally found it… nodetool compactionstats shows the percentage complete.

Dean

On 2/22/13 7:44 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

I am used to systems running a first phase calculating how many files it
will need to go through and then logging out the percent done, or X files
out of total files done.  I ran this command and it is logging nothing:

nodetool upgradesstables databus5 nreldata;

I have 130 GB of data on my node and not all of it is in that one column
family above.  How can I tell how far along it is in its process?  It has been
running for about 10 minutes already.  I don't see anything in the log
files either.

Thanks,
Dean



Re: Mutation dropped

2013-02-22 Thread aaron morton
If you are running repair, using QUORUM, and there are no dropped writes, you
should not be getting DigestMismatch during reads.

If everything else looks good, but the request latency is higher than the CF
latency, I would check that client load is evenly distributed. Then start
looking to see if the request throughput is at its maximum for the cluster.

Cheers
  
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 8:15 PM, Wei Zhu wz1...@yahoo.com wrote:

 Thanks Aaron for the great information as always. I just checked cfhistograms
 and only a handful of read latencies are bigger than 100ms, but for
 proxyhistograms there are 10 times more greater than 100ms. We are using
 QUORUM for reading with RF=3, and I understand the coordinator needs to get the
 digest from the other nodes and read repair on a mismatch etc. But is it
 normal to see the latency from proxyhistograms go beyond 100ms? Is there
 any way to improve that?
 We are tracking the metrics from the client side and we see the 95th percentile
 response time averaging at 40ms, which is a bit high. Our 50th percentile was
 great, under 3ms.
 
 Any suggestion is very much appreciated.
 
 Thanks.
 -Wei
 
 - Original Message -
 From: aaron morton aa...@thelastpickle.com
 To: Cassandra User user@cassandra.apache.org
 Sent: Thursday, February 21, 2013 9:20:49 AM
 Subject: Re: Mutation dropped
 
 What does rpc_timeout control? Only the reads/writes? 
 Yes. 
 
 like data stream,
 streaming_socket_timeout_in_ms in the yaml
 
 merkle tree request? 
 Either no time out or a number of days, cannot remember which right now. 
 
 What is the side effect if it's set to a really small number, say 20ms?
 You will probably get a lot more requests that fail with a TimedOutException. 
 
 rpc_timeout needs to be longer than the time it takes a node to process the
 message, and the time it takes the coordinator to do its thing. You can look
 at cfhistograms and proxyhistograms to get a better idea of how long a
 request takes in your system.
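 For reference, the two commands look like this (keyspace/CF names are
 placeholders):

 root@node007:~# nodetool cfhistograms mykeyspace mycf
 root@node007:~# nodetool proxyhistograms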
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 21/02/2013, at 6:56 AM, Wei Zhu wz1...@yahoo.com wrote:
 
 What does rpc_timeout control? Only the reads/writes? How about other
 inter-node communication, like data streams and merkle tree requests?  What is
 a reasonable value for rpc_timeout? The default value of 10 seconds is
 way too long. What is the side effect if it's set to a really small number,
 say 20ms?
 
 Thanks.
 -Wei
 
 From: aaron morton aa...@thelastpickle.com
 To: user@cassandra.apache.org 
 Sent: Tuesday, February 19, 2013 7:32 PM
 Subject: Re: Mutation dropped
 
 Does the rpc_timeout not control the client timeout ?
 No it is how long a node will wait for a response from other nodes before 
 raising a TimedOutException if less than CL nodes have responded. 
 Set the client side socket timeout using your preferred client. 
 
 Is there any param which is configurable to control the replication timeout 
 between nodes ?
 There is no such thing.
 rpc_timeout is roughly like that, but it's not right to think about it that 
 way. 
 i.e. if a message to a replica times out and CL nodes have already responded 
 then we are happy to call the request complete. 
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 19/02/2013, at 1:48 AM, Kanwar Sangha kan...@mavenir.com wrote:
 
 Thanks Aaron.
 
 Does the rpc_timeout not control the client timeout? Is there any param
 which is configurable to control the replication timeout between nodes? Or
 is the same param used to control that, since the other node is also like a
 client?
 
 
 
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: 17 February 2013 11:26
 To: user@cassandra.apache.org
 Subject: Re: Mutation dropped
 
 You are hitting the maximum throughput on the cluster. 
 
 The messages are dropped because the node fails to start processing them 
 before rpc_timeout. 
 
 However the request is still a success because the client requested CL was 
 achieved. 
 
 Testing with RF 2 and CL 1 really just tests the disks on one local
 machine. Both nodes replicate each row, and writes are sent to each
 replica, so the only thing the client is waiting on is the local node to
 write to its commit log.
 
 Testing with (and running in prod) RF 3 and CL QUORUM is a more real world
 scenario.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 15/02/2013, at 9:42 AM, Kanwar Sangha kan...@mavenir.com wrote:
 
 
 Hi – Is there a parameter which can be tuned to prevent the mutations from
 being dropped? Is this logic correct?

 Node A and B with RF=2, CL=1. Load balanced between the two.
 

is there a way to drain a node (and prevent reads) and upgrade sstables offline?

2013-02-22 Thread Hiller, Dean
We would like to take a node out of the ring and run upgradesstables while it is
not doing any writes or reads with the ring.  Is this possible?

I am thinking from the documentation

 1.  nodetool drain
 2.  ANYTHING to stop reads here
 3.  Modify cassandra.yaml with compaction_throughput_mb_per_sec = 0 and 
multithreaded_compaction = true temporarily
 4.  Restart cassandra and run nodetool upgradesstables keyspace CF
 5.  Modify cassandra.yaml to revert changes
 6.  Restart cassandra to join the cluster again.

Is this how it should be done?

Thanks,
Dean


Re: is there a way to drain a node (and prevent reads) and upgrade sstables offline?

2013-02-22 Thread Michael Kjellman
Couldn't you just disable thrift and leave gossip active?

On 2/22/13 9:01 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

We would like to take a node out of the ring and run upgradesstables while it
is not doing any writes or reads with the ring.  Is this possible?

I am thinking from the documentation

 1.  nodetool drain
 2.  ANYTHING to stop reads here
 3.  Modify cassandra.yaml with compaction_throughput_mb_per_sec = 0 and
multithreaded_compaction = true temporarily
 4.  Restart cassandra and run nodetool upgradesstables keyspace CF
 5.  Modify cassandra.yaml to revert changes
 6.  Restart cassandra to join the cluster again.

Is this how it should be done?

Thanks,
Dean




Re: Adding new nodes in a cluster with virtual nodes

2013-02-22 Thread aaron morton
 So, it looks like the repair is required if we want to add new nodes in our
 platform, but I don't understand why.
Bootstrapping should take care of it. But new seed nodes do not bootstrap.
Check the logs on the nodes you added to see what messages have "bootstrap" in
them.

Anytime you are worried about things like this, throw in a nodetool repair. If
you are using QUORUM for reads and writes you will still be getting consistent
data, so long as you have only added one node, or one node every RF'th node.
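For example (a sketch; host, prompt, and log path are illustrative of a default
packaged install):

root@node007:~# grep -i bootstrap /var/log/cassandra/system.log
root@node007:~# nodetool repair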

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 9:55 PM, Jean-Armel Luce jaluc...@gmail.com wrote:

 Hi Aaron,
 
 Thanks for your answer.
 
 
 I apologize, I made a mistake in my 1st mail. The cluster has only 12 nodes
 instead of 16 (it is a test cluster).
 There are 2 datacenters b1 and s1.
 
 Here is the result of nodetool status after adding a new node in the 1st 
 datacenter (dc s1):
 root@node007:~# nodetool status
 Datacenter: b1
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
 UN  10.234.72.135  10.71 GB  256     44.6%             2fc583b2-822f-4347-9fab-5e9d10d548c9  c01
 UN  10.234.72.134  16.74 GB  256     63.7%             f209a8c5-7e1b-45b5-aa80-ed679bbbdbd1  e01
 UN  10.234.72.139  17.09 GB  256     62.0%             95661392-ccd8-4592-a76f-1c99f7cdf23a  e07
 UN  10.234.72.138  10.96 GB  256     42.9%             0d6725f0-1357-423d-85c1-153fb94257d5  e03
 UN  10.234.72.137  11.09 GB  256     45.7%             492190d7-3055-4167-8699-9c6560e28164  e03
 UN  10.234.72.136  11.91 GB  256     41.1%             3872f26c-5f2d-4fb3-9f5c-08b4c7762466  c01
 Datacenter: s1
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.98.255.139  16.94 GB   256     43.8%             3523e80c-8468-4502-b334-79eabc3357f0  g10
 UN  10.98.255.138  12.62 GB   256     42.4%             a2bcddf1-393e-453b-9d4f-9f7111c01d7f  i02
 UN  10.98.255.137  10.59 GB   256     38.4%             f851b6ee-f1e4-431b-8beb-e7b173a77342  i02
 UN  10.98.255.136  11.89 GB   256     42.9%             36fe902f-3fb1-4b6d-9e2c-71e601fa0f2e  a09
 UN  10.98.255.135  10.29 GB   256     40.4%             e2d020a5-97a9-48d4-870c-d10b59858763  a09
 UN  10.98.255.134  16.19 GB   256     52.3%             73e3376a-5a9f-4b8a-a119-c87ae1fafdcb  h06
 UN  10.98.255.140  127.84 KB  256     39.9%             3d5c33e6-35d0-40a0-b60d-2696fd5cbf72  g10
 
 We can see that the new node (10.98.255.140) contains only 127.84 KB.
 We also saw that there was no network traffic between the nodes.
 
 Then we added a new node in the 2nd datacenter (dc b1)
 
 
 
 root@node007:~# nodetool status
 Datacenter: b1
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.234.72.135  12.95 GB   256     42.0%             2fc583b2-822f-4347-9fab-5e9d10d548c9  c01
 UN  10.234.72.134  20.11 GB   256     53.1%             f209a8c5-7e1b-45b5-aa80-ed679bbbdbd1  e01
 UN  10.234.72.140  122.25 KB  256     41.9%             501ea498-8fed-4cc8-a23a-c99492bc4f26  e07
 UN  10.234.72.139  20.46 GB   256     40.2%             95661392-ccd8-4592-a76f-1c99f7cdf23a  e07
 UN  10.234.72.138  13.21 GB   256     40.9%             0d6725f0-1357-423d-85c1-153fb94257d5  e03
 UN  10.234.72.137  13.34 GB   256     42.9%             492190d7-3055-4167-8699-9c6560e28164  e03
 UN  10.234.72.136  14.16 GB   256     39.0%             3872f26c-5f2d-4fb3-9f5c-08b4c7762466  c01
 Datacenter: s1
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address        Load      Tokens  Owns (effective)  Host ID                               Rack
 UN  10.98.255.139  19.19 GB  256     43.8%             3523e80c-8468-4502-b334-79eabc3357f0  g10
 UN  10.98.255.138  14.9 GB   256     42.4%             a2bcddf1-393e-453b-9d4f-9f7111c01d7f  i02
 UN  10.98.255.137  12.49 GB  256     38.4%             f851b6ee-f1e4-431b-8beb-e7b173a77342  i02
 UN  10.98.255.136  14.13 GB  256     42.9%             36fe902f-3fb1-4b6d-9e2c-71e601fa0f2e  a09
 UN  10.98.255.135  12.16 GB  256     40.4%             e2d020a5-97a9-48d4-870c-d10b59858763  a09
 UN  10.98.255.134  18.85 GB  256     52.3%             73e3376a-5a9f-4b8a-a119-c87ae1fafdcb  h06
 UN  10.98.255.140  2.24 GB   256     39.9%             3d5c33e6-35d0-40a0-b60d-2696fd5cbf72  g10
 
 
 We can see that the 2nd new node (10.234.72.140) contains only 122.25 KB.
 The new node in the 1st datacenter now contains 2.24 GB because we 

Re: operations progress on DBA operations?

2013-02-22 Thread aaron morton
nodetool compactionstats 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 3:44 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 I am used to systems running a first phase calculating how many files it will
 need to go through and then logging out the percent done, or X files out of
 total files done.  I ran this command and it is logging nothing:

 nodetool upgradesstables databus5 nreldata;

 I have 130 GB of data on my node and not all of it is in that one column family
 above.  How can I tell how far along it is in its process?  It has been running
 for about 10 minutes already.  I don't see anything in the log files either.
 
 Thanks,
 Dean



Re: operations progress on DBA operations?

2013-02-22 Thread Michael Kjellman
Just to add though- compactionstats on an upgradesstables will only show the
sstable currently being upgraded. Overall progress on an upgradesstables
isn't exposed anywhere yet, but you can figure out how much there is to go
through the log lines.

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Friday, February 22, 2013 9:09 AM
To: user@cassandra.apache.org
Subject: Re: operations progress on DBA operations?
Subject: Re: operations progress on DBA operations?

nodetool compactionstats

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 3:44 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

I am used to systems running a first phase calculating how many files it will
need to go through and then logging out the percent done, or X files out of
total files done.  I ran this command and it is logging nothing:

nodetool upgradesstables databus5 nreldata;

I have 130 GB of data on my node and not all of it is in that one column family
above.  How can I tell how far along it is in its process?  It has been running for
about 10 minutes already.  I don't see anything in the log files either.

Thanks,
Dean




Re: ReverseIndexExample

2013-02-22 Thread aaron morton
We are trying to answer client-library-specific questions on the client-dev
list; see the link at the bottom of http://cassandra.apache.org/

If you can ask a more specific question I'll answer it there. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 3:44 AM, Everton Lima peitin.inu...@gmail.com wrote:

 Hello, 
 
 Has anyone already used ReverseIndexQuery from Astyanax? I was trying to
 understand it, but I executed the example from the Astyanax site and could not
 understand it.
 Can someone help me, please?
 
 Thanks;
 
 -- 
 Everton Lima Aleixo
 Master's student in Computer Science at UFG
 Programmer at LUPA
 



Re: disabling bloomfilter not working? or did I do this wrong?

2013-02-22 Thread aaron morton
 Bloom Filter Space Used: 2318392048
Just to be sane do a quick check of the -Filter.db files on disk for this CF. 
If they are very small try a restart on the node. 
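For example (a sketch, assuming the default data directory; adjust the path to
whatever your cassandra.yaml points at):

root@node007:~# ls -lh /var/lib/cassandra/data/databus5/nreldata/*-Filter.db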

 Number of Keys (estimate): 1249133696
Hey a billion rows on a node, what an age we live in :)

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 4:35 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 So in the cli, I ran
 
 update column family nreldata with bloom_filter_fp_chance=1.0;
 
 Then I ran
 
 nodetool upgradesstables databus5 nreldata;
 
 But my bloom filter size is still around 2 GB (and I want to free up this
 heap) according to the nodetool cfstats command…
 
 Column Family: nreldata
 SSTable count: 10
 Space used (live): 96841497731
 Space used (total): 96841497731
 Number of Keys (estimate): 1249133696
 Memtable Columns Count: 7066
 Memtable Data Size: 4286174
 Memtable Switch Count: 924
 Read Count: 19087150
 Read Latency: 0.595 ms.
 Write Count: 21281994
 Write Latency: 0.013 ms.
 Pending Tasks: 0
 Bloom Filter False Positives: 974393
 Bloom Filter False Ratio: 0.8
 Bloom Filter Space Used: 2318392048
 Compacted row minimum size: 73
 Compacted row maximum size: 446
 Compacted row mean size: 143
 
 



Re: How wide rows are structured in CQL3

2013-02-22 Thread aaron morton
 Does this effectively create the same storage structure?
Yes. 

 SELECT Value FROM X WHERE RowKey = 'RowKey1' AND TimeStamp BETWEEN 100 AND 
 1000;
select value from X where RowKey = 'foo' and timestamp >= 100 and timestamp <=
1000;
 
 I also don't understand some of the things like WITH COMPACT STORAGE and 
 CLUSTERING.
Some info here, though it does not cover compact storage:
http://thelastpickle.com/2013/01/11/primary-keys-in-cql/
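A sketch of the two clauses in question (COMPACT STORAGE maps the table onto the
old Thrift wide-row layout; CLUSTERING ORDER controls the on-disk sort order of
the clustering column):

CREATE TABLE X (
    RowKey text,
    TimeStamp int,
    Value text,
    PRIMARY KEY (RowKey, TimeStamp)
) WITH COMPACT STORAGE
  AND CLUSTERING ORDER BY (TimeStamp DESC);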

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 4:36 AM, Boris Solovyov boris.solov...@gmail.com wrote:

 Hi,
 
 My impression from reading docs is that in old versions of Cassandra, you 
 could create very wide rows, say with timestamps as column names for time 
 series data, and read an ordered slice of the row.  So,
 
 RowKey   Columns
 =======  =======
 RowKey1  1:val1 2:val2 3:val3 ... N:valN
 
 With this data I think you could say "get RowKey1, cols 100 to 1000" and get
 a slice of values. (I have no experience with this, just from reading about
 it.)
 
 In CQL3 it looks like this is kind of normalized so I would have
 
 CREATE TABLE X (
 RowKey text,
 TimeStamp int,
 Value text,
 PRIMARY KEY(RowKey, TimeStamp)
 );
 
 Does this effectively create the same storage structure?
 
 Now, in CQL3, it looks like I should access it like this,
 
 SELECT Value FROM X WHERE RowKey = 'RowKey1' AND TimeStamp BETWEEN 100 AND 
 1000;
 
 Does this do the same thing?
 
 I also don't understand some of the things like WITH COMPACT STORAGE and 
 CLUSTERING. I'm having a hard time figuring out how this maps to the 
 underlying storage. It is a little more abstract. I feel like the new CQL 
 stuff isn't really explained clearly to me -- is it just a query language 
 that accesses the same underlying structures, or is Cassandra's storage and 
 access model fundamentally different now?



Re: Q on schema migrations

2013-02-22 Thread aaron morton
 dropped this secondary index after a while.
I assume you use UPDATE COLUMN FAMILY in the CLI. 

 How can I avoid this secondary index building on node join?
Check the schema using show schema in the cli.

Check that all nodes in the cluster have the same schema, using describe 
cluster in the cli.
If they are in disagreement see this 
http://wiki.apache.org/cassandra/FAQ#schema_disagreement
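For example, in the cassandra-cli:

[default@unknown] describe cluster;
[default@unknown] show schema;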

Cheers

 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 5:17 AM, Igor i...@4friends.od.ua wrote:

 Hello
 
 Cassandra 1.0.7
 
 Some time ago we used a secondary index on one of our CFs. Due to performance
 reasons we dropped this secondary index after a while. But now, each time I add
 and bootstrap a new node I see cassandra build this secondary index again on
 this node (which takes a huge amount of time), and when the index is built it
 is not used anymore, so I can safely delete the files from disk.
 
 How can I avoid this secondary index building on node join?
 
 Thanks for your answers!



Re: is there a way to drain a node (and prevent reads) and upgrade sstables offline?

2013-02-22 Thread aaron morton
To stop all writes and reads, disable thrift and gossip via nodetool.
This will not stop any in-progress repair sessions nor disconnect fat clients
if you have them.

There are also the cmd line args cassandra.start_rpc and cassandra.join_ring
which do the same thing.

You can also change the compaction throughput using nodetool.
 multithreaded_compaction = true temporarily
Unless you have SSD leave this guy alone. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 6:04 AM, Michael Kjellman mkjell...@barracuda.com wrote:

 Couldn't you just disable thrift and leave gossip active?
 
 On 2/22/13 9:01 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
 
 We would like to take a node out of the ring and run upgradesstables while it
 is not doing any writes or reads with the ring.  Is this possible?
 
 I am thinking from the documentation
 
 1.  nodetool drain
 2.  ANYTHING to stop reads here
 3.  Modify cassandra.yaml with compaction_throughput_mb_per_sec = 0 and
 multithreaded_compaction = true temporarily
 4.  Restart cassandra and run nodetool upgradesstables keyspace CF
 5.  Modify cassandra.yaml to revert changes
 6.  Restart cassandra to join the cluster again.
 
 Is this how it should be done?
 
 Thanks,
 Dean
 
 



Re: Size Tiered -> Leveled Compaction

2013-02-22 Thread Mike

Hello,

Still doing research before we potentially move one of our column
families from Size Tiered -> Leveled compaction this weekend.  I was doing
some research around some of the bugs that were filed against leveled
compaction in Cassandra and I found this:


https://issues.apache.org/jira/browse/CASSANDRA-4644

The bug mentions:

You need to run the offline scrub (bin/sstablescrub) to fix the sstable 
overlapping problem from early 1.1 releases. (Running with -m to just 
check for overlaps between sstables should be fine, since you already 
scrubbed online which will catch out-of-order within an sstable.)


We recently upgraded from 1.1.2 to 1.1.9.

Does anyone know if an offline scrub is recommended to be performed when
switching from STCS -> LCS after upgrading from 1.1.2?


Any insight would be appreciated,
Thanks,
-Mike

On 2/17/2013 8:57 PM, Wei Zhu wrote:

We doubled the SSTable size to 10M. It still generates a lot of SSTables and we
don't see much difference in the read latency.  We are able to finish the
compactions after repair within several hours. We will increase the SSTable
size again if we feel the number of SSTables hurts the performance.

- Original Message -
From: Mike mthero...@yahoo.com
To: user@cassandra.apache.org
Sent: Sunday, February 17, 2013 4:50:40 AM
Subject: Re: Size Tiered -> Leveled Compaction


Hello Wei,

First thanks for this response.

Out of curiosity, what SSTable size did you choose for your use case, and what
made you decide on that number?

Thanks,
-Mike

On 2/14/2013 3:51 PM, Wei Zhu wrote:




I haven't tried to switch compaction strategy. We started with LCS.


For us, after massive data imports (5000 w/second for 6 days), the first
repair is painful since there is quite some data inconsistency. For 150G nodes,
repair brought in about 30G and created thousands of pending compactions. It
took almost a day to clear those. Just be prepared: LCS is really slow in 1.1.X.
System performance degrades during that time since reads could go to more
SSTables; we saw 20 SSTable lookups for one read. (We tried everything we could
and couldn't speed it up. I think it's single threaded, and it's not
recommended to turn on multithreaded compaction. We even tried that; it didn't
help.) There is parallel LCS in 1.2 which is supposed to alleviate the pain.
Haven't upgraded yet, hope it works :)


http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2





Since our cluster is not write intensive, only 100 w/second, I don't see any
pending compactions during regular operation.


One thing worth mentioning is the size of the SSTable; the default is 5M, which
is kind of small for a 200G (all in one CF) data set, and we are on SSD. That is
more than 150K files in one directory (200G/5M = 40K SSTables, and each SSTable
creates 4 files on disk). You might want to watch that and decide the SSTable
size.


By the way, there is no concept of Major compaction for LCS. Just for fun, you 
can look at a file called $CFName.json in your data directory and it tells you 
the SSTable distribution among different levels.
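For example (a sketch assuming the 1.1 per-CF directory layout; the path and
names are placeholders):

root@node007:~# cat /var/lib/cassandra/data/mykeyspace/mycf/mycf.json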


-Wei





From: Charles Brophy cbro...@zulily.com
To: user@cassandra.apache.org
Sent: Thursday, February 14, 2013 8:29 AM
Subject: Re: Size Tiered -> Leveled Compaction


I second these questions: we've been looking into changing some of our CFs to
use leveled compaction as well. If anybody here has the wisdom to answer them,
it would be a wonderful help.


Thanks
Charles


On Wed, Feb 13, 2013 at 7:50 AM, Mike  mthero...@yahoo.com  wrote:


Hello,

I'm investigating the transition of some of our column families from Size Tiered
-> Leveled Compaction. I believe we have some high-read-load column families
that would benefit tremendously.

I've stood up a test DB node to investigate the transition. I successfully
altered the column family, and I immediately noticed a large number (1000+) of
pending compaction tasks become available, but no compactions get executed.

I tried running nodetool sstableupgrade on the column family, and the 
compaction tasks don't move.

I also notice no changes to the size and distribution of the existing SSTables.

I then run a major compaction on the column family. All pending compaction 
tasks get run, and the SSTables have a distribution that I would expect from 
LeveledCompaction (lots and lots of 10MB files).

Couple of questions:

1) Is a major compaction required to transition from size-tiered to leveled 
compaction?
2) Are major compactions as much of a concern for LeveledCompaction as they
are for Size Tiered?

All the documentation I found concerning transitioning from Size Tiered to
Leveled compaction discusses the alter table cql command, but I haven't found
too much on what else needs to be done after the schema change.

I did these tests with Cassandra 1.1.9.

Thanks,
-Mike









Re: disabling bloomfilter not working? or did I do this wrong?

2013-02-22 Thread Hiller, Dean
Thanks, but I found out it is still running.  It looks like I have about a 5
hour wait left for my upgradesstables (waited 4 hours already).  I will check
the bloom filter after that.

Out of curiosity, if I had much wider rows (i.e. > 900k per row), would
compaction (i.e. upgradesstables) run any faster or would it basically run
at the same speed?

I guess what I am wondering is: is 9 hours a normal compaction time for 130 GB
of data?

Thanks,
Dean

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Friday, February 22, 2013 10:29 AM
To: user@cassandra.apache.org
Subject: Re: disabling bloomfilter not working? or did I do this wrong?
Subject: Re: disabling bloomfilter not working? or did I do this wrong?

Bloom Filter Space Used: 2318392048
Just to be sane do a quick check of the -Filter.db files on disk for this CF.
If they are very small try a restart on the node.

Number of Keys (estimate): 1249133696
Hey a billion rows on a node, what an age we live in :)

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/02/2013, at 4:35 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

So in the cli, I ran

update column family nreldata with bloom_filter_fp_chance=1.0;

Then I ran

nodetool upgradesstables databus5 nreldata;

But my bloom filter size is still around 2 GB (and I want to free up this
heap) according to the nodetool cfstats command…

Column Family: nreldata
SSTable count: 10
Space used (live): 96841497731
Space used (total): 96841497731
Number of Keys (estimate): 1249133696
Memtable Columns Count: 7066
Memtable Data Size: 4286174
Memtable Switch Count: 924
Read Count: 19087150
Read Latency: 0.595 ms.
Write Count: 21281994
Write Latency: 0.013 ms.
Pending Tasks: 0
Bloom Filter False Positives: 974393
Bloom Filter False Ratio: 0.8
Bloom Filter Space Used: 2318392048
Compacted row minimum size: 73
Compacted row maximum size: 446
Compacted row mean size: 143





found bottleneck but can we do these steps?

2013-02-22 Thread Hiller, Dean
So, it turns out we don't have enough I/O going on for our upgradesstables, but
it is really hitting the upper bounds of memory (8G) and our CPU is pretty low
as well.

At any rate, we are trying to remove a 2 GB bloom filter on a column family.
Can we do the following:

 1.  Disable thrift/gossip (per previous emails)
 2.  Restart the node?  (any way to restart it without reading in that
bloomfilter to lessen the memory……should I temporarily turn up the node without
the key cache maybe)
 3.  Run nodetool upgradesstables databus5 nreldata;

1. When I restart the node, will gossip/thrift stay off? Or do I change the
seeds, change 9160 to , and I don't see where I can change 7199 to
something?  (how to do this safely)
2. Hmm, is there any way to run upgradesstables when cassandra is not running
AND crank up the memory of nodetool to 8G, or does nodetool always just tell
cassandra to do it?

  I feel like I have a chicken and egg problem here.  I want to clean up this
bloom filter, which requires upgradesstables (from what I read), but I need the
bloom filter to not be there so I am not bottlenecked by the memory.  At this
rate, I will have to do each node each day for 6 days before I can recover (and
I would prefer to speed it up just a little).

Thanks,
Dean