nodetool repair with vnodes

2013-02-17 Thread Marco Matarazzo
Greetings. 

I'm trying to run nodetool repair on a Cassandra 1.2.1 cluster of 3 nodes 
with 256 vnodes each.

On a pre-1.2 cluster I used to launch a nodetool repair on every node every 
24hrs. Now I'm getting different behavior, and I'm sure I'm missing something.

What I see on the command line is: 

[2013-02-17 10:20:15,186] Starting repair command #1, repairing 768 ranges for 
keyspace goh_master
[2013-02-17 10:48:13,401] Repair session 3d140e10-78e3-11e2-af53-d344dbdd69f5 
for range (6556914650761469337,6580337080281832001] finished
(…repeat the last line 767 times)

…so it seems to me that it is running on all vnode ranges.
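
For reference, the 768 figure is exactly what you would expect if every vnode range in the cluster is being repaired — a quick sketch of the arithmetic, assuming 3 nodes with the default 256 tokens each:

```python
# Each node owns num_tokens vnode ranges; in 1.2, "nodetool repair"
# (without -pr) repairs every range the node holds a replica for.
nodes = 3
vnodes_per_node = 256  # num_tokens in cassandra.yaml

total_ranges = nodes * vnodes_per_node
print(total_ranges)  # 768, matching "repairing 768 ranges"
```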

Also, whichever node I launch the command on, only one node's log shows 
activity, and it is always the same node. 

So, to me, it's like the nodetool repair command is running always on the 
same single node and repairing everything.

I'm sure I'm making some mistake, but I can't find any clue in the 
documentation about what's wrong with my nodetool usage (if anything is wrong 
at all). Is there anything I'm missing?

--
Marco Matarazzo




Re: Size Tiered - Leveled Compaction

2013-02-17 Thread Mike

Hello Wei,

First, thanks for this response.

Out of curiosity, what SSTable size did you choose for your use case, and 
what made you decide on that number?


Thanks,
-Mike

On 2/14/2013 3:51 PM, Wei Zhu wrote:

I haven't tried to switch compaction strategy. We started with LCS.

For us, after massive data imports (5000 writes/second for 6 days), the 
first repair is painful since there is quite a lot of data inconsistency. 
For 150G nodes, repair brought in about 30G and created thousands of 
pending compactions. It took almost a day to clear those. Just be 
prepared: LCS is really slow in 1.1.X. System performance degrades 
during that time since reads can hit more SSTables; we saw 20 
SSTable lookups for one read. (We tried everything we could and couldn't 
speed it up. I think it's single-threaded, and it's not recommended 
to turn on multithreaded compaction. We even tried that; it didn't help.) 
There is parallel LCS in 1.2 which is supposed to alleviate the pain. 
Haven't upgraded yet, hope it works :)


http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2


Since our cluster is not write intensive (only 100 writes/second), I don't 
see any pending compactions during regular operation.


One thing worth mentioning is the size of the SSTable: the default is 5M, 
which is kind of small for a 200G (all in one CF) data set, and we are 
on SSD.  It results in more than 150K files in one directory (200G/5M = 40K 
SSTables, and each SSTable creates 4 files on disk).  You might want to 
watch that and choose the SSTable size accordingly.
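
Wei's file-count warning can be sanity-checked with a little arithmetic (a sketch; the 4-files-per-SSTable figure is the on-disk component count he quotes):

```python
# Estimate on-disk file count for LCS with a given sstable_size_in_mb.
data_size_mb = 200 * 1024   # ~200 GB in one column family
sstable_size_mb = 5         # LCS default in 1.1
files_per_sstable = 4       # e.g. Data, Index, Filter, Statistics components

sstables = data_size_mb // sstable_size_mb
files = sstables * files_per_sstable
print(sstables, files)  # 40960 SSTables -> 163840 files in one directory
```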


By the way, there is no concept of major compaction for LCS. Just for 
fun, you can look at a file called $CFName.json in your data directory; 
it tells you the SSTable distribution among the different levels.


-Wei


*From:* Charles Brophy cbro...@zulily.com
*To:* user@cassandra.apache.org
*Sent:* Thursday, February 14, 2013 8:29 AM
*Subject:* Re: Size Tiered - Leveled Compaction

I second these questions: we've been looking into changing some of our 
CFs to use leveled compaction as well. If anybody here has the wisdom 
to answer them it would be of wonderful help.


Thanks
Charles

On Wed, Feb 13, 2013 at 7:50 AM, Mike mthero...@yahoo.com wrote:


Hello,

I'm investigating transitioning some of our column families
from Size Tiered to Leveled Compaction.  I believe we have some
high-read-load column families that would benefit tremendously.

I've stood up a test DB node to investigate the transition.  I
successfully altered the column family, and I immediately noticed a
large number (1000+) of pending compaction tasks appear,
but no compactions get executed.

I tried running nodetool upgradesstables on the column family,
but the pending compaction tasks don't move.

I also notice no changes to the size and distribution of the
existing SSTables.

I then ran a major compaction on the column family.  All pending
compaction tasks ran, and the SSTables ended up with the distribution
I would expect from LeveledCompaction (lots and lots of 10MB
files).

Couple of questions:

1) Is a major compaction required to transition from size-tiered
to leveled compaction?
2) Are major compactions as much of a concern for
LeveledCompaction as they are for Size Tiered?

All the documentation I found concerning transitioning from Size
Tiered to Leveled compaction discusses the ALTER TABLE CQL command,
but I haven't found much on what else needs to be done after
the schema change.

I did these tests with Cassandra 1.1.9.

Thanks,
-Mike








Re: virtual nodes + map reduce = too many mappers

2013-02-17 Thread cem
Thanks Eric for the appreciation :)

Default split size is 64K rows. ColumnFamilyInputFormat first collects all
tokens and creates a split for each. If you have 256 vnodes per node,
it creates 256 splits even if you have no data at all. The current split
size only matters if you have a vnode with more than 64K rows.

A possible solution that came to my mind: we could simply
extend ColumnFamilySplit to hold a list of token ranges instead of one.
Then there would be no need to create a mapper for each token; each mapper
could do multiple range queries. But I don't know how to combine the range
queries, because a typical range query needs a start and end token, and
with virtual nodes I realized the tokens are not contiguous.
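
The combining idea cem describes can be sketched in a few lines — hypothetical names, not the actual Hadoop/Cassandra API — by grouping vnode ranges by the node that owns them, so each mapper issues several range queries instead of one:

```python
from collections import defaultdict

def combine_splits(range_owners):
    """Group (start_token, end_token) ranges by owning node so one
    mapper handles all of a node's vnode ranges (they need not be
    contiguous, hence a list of ranges rather than one merged range)."""
    by_node = defaultdict(list)
    for token_range, node in range_owners:
        by_node[node].append(token_range)
    return dict(by_node)  # one combined split (list of ranges) per node

owners = [((0, 10), "node1"), ((40, 50), "node2"), ((20, 30), "node1")]
splits = combine_splits(owners)
print(len(splits))      # 2 combined splits instead of 3
print(splits["node1"])  # [(0, 10), (20, 30)]
```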

Best Regards,
Cem

On Sun, Feb 17, 2013 at 2:47 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Split size does not have to equal block size.


 http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html

 An abstract InputFormat that returns CombineFileSplit's in
 InputFormat.getSplits(JobConf, int) method. Splits are constructed
 from the files under the input paths. A split cannot have files from
 different pools. Each split returned may contain blocks from different
 files. If a maxSplitSize is specified, then blocks on the same node
 are combined to form a single split. Blocks that are left over are
 then combined with other blocks in the same rack. If maxSplitSize is
 not specified, then blocks from the same rack are combined in a single
 split; no attempt is made to create node-local splits. If the
 maxSplitSize is equal to the block size, then this class is similar to
 the default splitting behaviour in Hadoop: each block is a locally
 processed split. Subclasses implement
 InputFormat.getRecordReader(InputSplit, JobConf, Reporter) to
 construct RecordReader's for CombineFileSplit's.

 Hive offers a CombinedHiveInputFormat

 https://issues.apache.org/jira/browse/HIVE-74

 Essentially, combined input formats rock hard. If you have a directory
 with, say, 2000 files, you do not want 2000 splits and the
 overhead of starting and stopping 2000 mappers.

 If you enable CombineFileInputFormat you can tune mapred.split.size, and
 the number of mappers is based (mostly) on the input size. This gives
 jobs that would otherwise create too many map tasks far more throughput,
 and stops them from monopolizing the map slots on the cluster.

 It would seem like all the extra splits from the vnode change could be
 combined back together.

 On Sat, Feb 16, 2013 at 8:21 PM, Jonathan Ellis jbel...@gmail.com wrote:
  Wouldn't you have more than 256 splits anyway, given a normal amount of
 data?
 
  (Default split size is 64k rows.)
 
  On Fri, Feb 15, 2013 at 7:01 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
  Seems like the Hadoop input format should combine the splits that are
  on the same node into the same map task, like Hadoop's
  CombineFileInputFormat can. I am not sure who recommends vnodes as the
  default, because this is now the second problem (that I know of) of
  this class where vnodes have extra overhead:
  https://issues.apache.org/jira/browse/CASSANDRA-5161
 
  This seems to be the standard operating practice in c* now, enable
  things in the default configuration like new partitioners and newer
  features like vnodes, even though they are not heavily tested in the
  wild or well understood, then deal with fallout.
 
 
  On Fri, Feb 15, 2013 at 11:52 AM, cem cayiro...@gmail.com wrote:
  Hi All,
 
  I have just started to use virtual nodes. I set the number of tokens to
  256 as recommended.
 
  The problem that I have is that when I run a mapreduce job, it creates
  nodes * 256 mappers, because it creates nodes * 256 splits. This affects
  performance, since the range queries have a lot of overhead.
 
  Any suggestions to improve the performance? It seems like I need to
  lower the number of virtual nodes.
 
  Best Regards,
  Cem
 
 
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced



Re: NPE in running ClientOnlyExample

2013-02-17 Thread Edward Capriolo
This is a bad example to follow. This is the internal client that
Cassandra nodes use to talk to each other (the fat client); you usually do
not use this unless you want to write some embedded code on the
Cassandra server.

Typically clients use thrift/native transport. But you are likely
getting the error you are seeing because the keyspace or column family
is not created yet.

On Sat, Feb 16, 2013 at 11:41 PM, Jain Rahul ja...@ivycomptech.com wrote:
 Hi All,



 I am a newbie to Cassandra and am trying to run an example program,
 “ClientOnlyExample”, taken from
 https://raw.github.com/apache/cassandra/cassandra-1.2/examples/client_only/src/ClientOnlyExample.java.
 While executing the program, it gives me a null pointer exception. Can
 you guys please help me figure out what I am missing?



 I am using Cassandra 1.2.1 version. I have pasted the logs at
 http://pastebin.com/pmADWCYe



 Exception in thread main java.lang.NullPointerException

   at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:71)

   at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:66)

   at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:61)

   at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:56)

   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:183)

   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:204)

   at ClientOnlyExample.testWriting(ClientOnlyExample.java:78)

   at ClientOnlyExample.main(ClientOnlyExample.java:135)



 Regards,

 Rahul

 This email and any attachments are confidential, and may be legally
 privileged and protected by copyright. If you are not the intended recipient
 dissemination or copying of this email is prohibited. If you have received
 this in error, please notify the sender by replying by email and then delete
 the email completely from your system. Any views or opinions are solely
 those of the sender. This communication is not intended to form a binding
 contract unless expressly indicated to the contrary and properly authorised.
 Any actions taken on the basis of this email are at the recipient's own
 risk.


Re: Deleting old items during compaction (WAS: Deleting old items)

2013-02-17 Thread aaron morton
That's what the TTL does. 

Manually delete all the older data now, then start using TTL. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/02/2013, at 11:08 PM, Ilya Grebnov i...@metricshub.com wrote:

 Hi,
  
 We are looking for a solution to the same problem. We have a wide column family 
 with counters, and we want to delete old data, e.g. data that is a month old. 
 One potential idea was to implement a hook in the compaction code and drop 
 columns we don't need. Is this a viable option?
  
 Thanks,
 Ilya
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Tuesday, February 12, 2013 9:01 AM
 To: user@cassandra.apache.org
 Subject: Re: Deleting old items
  
 So is it possible to delete all the data inserted in some CF between 2 dates 
 or data older than 1 month ?
 No. 
  
 You need to issue row level deletes. If you don't know the row key you'll 
 need to do range scans to locate them. 
  
 If you are deleting parts of wide rows consider reducing the 
 min_compaction_level_threshold on the CF to 2
  
 Cheers
  
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
  
 @aaronmorton
 http://www.thelastpickle.com
  
 On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 
 
 Hi,
  
 I would like to know if there is a way to delete old/unused data easily ?
  
 I know about TTL but there are 2 limitations of TTL:
  
 - AFAIK, there is no TTL on counter columns
 - TTL need to be defined at write time, so it's too late for data already 
 inserted.
  
 I also could use a standard delete, but it seems inappropriate for such a 
 massive operation.
  
 In some cases, I don't know the row key and would like to delete all the rows 
 starting with, let's say, 1050#... 
  
 Even better, I understood that columns are always inserted in C* with (name, 
 value, timestamp). So is it possible to delete all the data inserted in some 
 CF between 2 dates or data older than 1 month ?
  
 Alain
  



Re: Mutation dropped

2013-02-17 Thread aaron morton
You are hitting the maximum throughput on the cluster. 

The messages are dropped because the node fails to start processing them before 
rpc_timeout. 

However the request is still a success because the client requested CL was 
achieved. 

Testing with RF 2 and CL 1 really just tests the disks on one local machine. 
Both nodes replicate each row, and writes are sent to each replica, so the only 
thing the client is waiting on is the local node writing to its commit log. 

Testing with (and running in prod) RF 3 and CL QUORUM is a more real-world 
scenario. 
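
The point about what the client actually waits on can be made concrete (a sketch, assuming the standard definitions of the ONE and QUORUM levels):

```python
def acks_required(consistency_level, replication_factor):
    """Replica acknowledgements the coordinator waits for before
    reporting success to the client."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    raise ValueError(consistency_level)

print(acks_required("ONE", 2))     # 1: only one commit-log write is awaited
print(acks_required("QUORUM", 3))  # 2: the more real-world configuration
```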

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/02/2013, at 9:42 AM, Kanwar Sangha kan...@mavenir.com wrote:

 Hi – Is there a parameter which can be tuned to prevent the mutations from 
 being dropped ? Is this logic correct ?
  
 Node A and B with RF=2, CL =1. Load balanced between the two.
  
 --  Address   Load   Tokens  Owns (effective)  Host ID
Rack
 UN  10.x.x.x   746.78 GB  256 100.0%
 dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
 UN  10.x.x.x   880.77 GB  256 100.0%
 95d59054-be99-455f-90d1-f43981d3d778  rack1
  
 Once we hit a very high TPS (around 50k inserts/sec), the nodes start 
 falling behind and we see the mutation dropped messages. But there are no 
 failures on the client. Does that mean the other node is not able to persist 
 the replicated data? Is there some timeout associated with replicated-data 
 persistence?
  
 Thanks,
 Kanwar
  
  
  
  
  
  
  
 From: Kanwar Sangha [mailto:kan...@mavenir.com] 
 Sent: 14 February 2013 09:08
 To: user@cassandra.apache.org
 Subject: Mutation dropped
  
 Hi – I am doing a load test using YCSB across 2 nodes in a cluster and seeing 
 a lot of mutation dropped messages.  I understand that this is due to the 
 replica not being written to the
 other node ? RF = 2, CL =1.
  
 From the wiki -
 For MUTATION messages this means that the mutation was not applied to all 
 replicas it was sent to. The inconsistency will be repaired by Read Repair or 
 Anti Entropy Repair
  
 Thanks,
 Kanwar
  



Re: [nodetool] repair with vNodes

2013-02-17 Thread aaron morton
I'm a bit late, but for reference. 

Repair runs in two stages. First, differences are detected; you can monitor the 
validation compaction with nodetool compactionstats. 

Then the differences are streamed between the nodes; you can monitor that with 
nodetool netstats. 

 Nodetool repair command has been running for almost 24hours and I can’t see 
 any activity from the logs or JMX.
Grep the logs for "session completed".

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/02/2013, at 11:38 PM, Haithem Jarraya haithem.jarr...@struq.com wrote:

 Hi,
  
 I am new to Cassandra and I would like to hear your thoughts on this.
 We are running our tests with Cassandra 1.2.1, on a relatively small dataset 
 (~60GB).
 The nodetool repair command has been running for almost 24 hours and I can’t 
 see any activity in the logs or JMX.
 What am I missing? Or is there a problem with nodetool repair?
 What other commands can I run to do a sanity check on the cluster?
 Can I run nodetool repair on different nodes at the same time?
  
  
 Here is the current test deployment of Cassandra
 $ nodetool status
 Datacenter: ams01 (Replication Factor 2)
 =
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns   Host ID   
 Rack
 UN  10.70.48.23   38.38 GB   256 19.0%  
 7c5fdfad-63c6-4f37-bb9f-a66271aa3423  RAC1
 UN  10.70.6.7858.13 GB   256 18.3%  
 94e7f48f-d902-4d4a-9b87-81ccd6aa9e65  RAC1
 UN  10.70.47.126  53.89 GB   256 19.4%  
 f36f1f8c-1956-4850-8040-b58273277d83  RAC1
 Datacenter: wdc01 (Replication Factor 1)
 =
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns   Host ID   
 Rack
 UN  10.24.116.66  65.81 GB   256 22.1%  
 f9dba004-8c3d-4670-94a0-d301a9b775a8  RAC1
 Datacenter: sjc01 (Replication Factor 1)
 =
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns   Host ID   
 Rack
 UN  10.55.104.90  63.31 GB   256 21.2%  
 4746f1bd-85e1-4071-ae5e-9c5baac79469  RAC1
  
  
 Many Thanks,
  
 Haithem
  



Re: Question on Cassandra Snapshot

2013-02-17 Thread aaron morton
 With incremental_backup turned OFF in Cassandra.yaml - Are all SSTables are 
 under /data/TestKeySpace/ColumnFamily at all times?
No. 
They are deleted when they are compacted and no internal operations are 
referencing them. 

 With incremental_backup turned ON in cassandra.yaml - Are current SSTables 
 under /data/TestKeySpace/ColumnFamily/ with a hardlink to 
 /data/TestKeySpace/ColumnFamily/backups? 
Yes, sort of. 
*All* SSTables ever created are in the backups directory. 
Not just the ones currently live.

 Lets say I have taken snapshot and moved the 
 /data/TestKeySpace/ColumnFamily/snapshots/snapshot-name/*.db to tape, at 
 what point should I be backing up *.db files from 
 /data/TestKeySpace/ColumnFamily/backups directory. Also, should I be deleting 
 the *.db files whose inode matches with the files in the snapshot? Is that a 
 correct approach? 
Back up all files in the snapshot. There may be files with non-.db extensions 
if you use levelled compaction.
When you are finished with the snapshot, delete it. If an inode is no longer 
referenced from the live data dir, the underlying data will be deleted. 
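
The inode behaviour described here is ordinary hard-link semantics, which a small sketch (with made-up file names) demonstrates:

```python
import os
import tempfile

# Snapshots/backups are hard links to live SSTables: the data survives
# until the last link is removed, so deleting a snapshot only frees
# space for SSTables no longer referenced from the live data dir.
d = tempfile.mkdtemp()
live = os.path.join(d, "ColumnFamily-1-Data.db")   # hypothetical names
snap = os.path.join(d, "snapshots", "s1-Data.db")
os.makedirs(os.path.dirname(snap))

with open(live, "w") as f:
    f.write("sstable bytes")
os.link(live, snap)                                # what snapshotting does

assert os.stat(live).st_ino == os.stat(snap).st_ino  # same inode
os.remove(live)                                    # SSTable compacted away
print(open(snap).read())                           # snapshot still readable
```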

 I noticed /data/TestKeySpace/ColumnFamily/snapshots/timestamp-ColumnFamily/ 
 what are these timestamp directories?
Probably automatic snapshots from dropping keyspaces or CFs.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 4:41 AM, S C as...@outlook.com wrote:

 I appreciate any advise or pointers on this.
 
 Thanks in advance.
 
 From: as...@outlook.com
 To: user@cassandra.apache.org
 Subject: Question on Cassandra Snapshot
 Date: Thu, 14 Feb 2013 20:47:14 -0600
 
 I have been looking at incremental backups and snapshots. I have done some 
 experimentation but could not come to a conclusion. Can somebody please help 
 me understanding it right?
 
 /data is my data partition
 
 With incremental_backup turned OFF in Cassandra.yaml - Are all SSTables are 
 under /data/TestKeySpace/ColumnFamily at all times?
 With incremental_backup turned ON in cassandra.yaml - Are current SSTables 
 under /data/TestKeySpace/ColumnFamily/ with a hardlink to 
 /data/TestKeySpace/ColumnFamily/backups? 
 Lets say I have taken snapshot and moved the 
 /data/TestKeySpace/ColumnFamily/snapshots/snapshot-name/*.db to tape, at 
 what point should I be backing up *.db files from 
 /data/TestKeySpace/ColumnFamily/backups directory. Also, should I be deleting 
 the *.db files whose inode matches with the files in the snapshot? Is that a 
 correct approach? 
 I noticed /data/TestKeySpace/ColumnFamily/snapshots/timestamp-ColumnFamily/ 
 what are these timestamp directories?
 
 Thanks in advance. 
 SC



unsubscribe

2013-02-17 Thread puneet loya
unsubscribe me please.

Thank you


RE: NPE in running ClientOnlyExample

2013-02-17 Thread Jain Rahul
Thanks Edward,

My bad. I was confused, as it does seem to create the keyspace too, as I 
understand it (although I'm not sure):

   List<CfDef> cfDefList = new ArrayList<CfDef>();
CfDef columnFamily = new CfDef(KEYSPACE, COLUMN_FAMILY);
cfDefList.add(columnFamily);
try
{
client.system_add_keyspace(new KsDef(KEYSPACE, 
"org.apache.cassandra.locator.SimpleStrategy", 1, cfDefList));
int magnitude = client.describe_ring(KEYSPACE).size();

Can I ask you to point me to some examples I can start with? I tried to look 
at some examples from Hector, but they seem to target Cassandra's 1.1 
version.

Regards,
Rahul


-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: 17 February 2013 21:49
To: user@cassandra.apache.org
Subject: Re: NPE in running ClientOnlyExample

This is a bad example to follow. This is the internal client the Cassandra 
nodes use to talk to each other (fat client) usually you do not use this unless 
you want to write some embedded code on the Cassandra server.

Typically clients use thrift/native transport. But you are likely getting the 
error you are seeing because the keyspace or column family is not created yet.

On Sat, Feb 16, 2013 at 11:41 PM, Jain Rahul ja...@ivycomptech.com wrote:
 Hi All,



 I am newbie to Cassandra and trying to run an example program
 ClientOnlyExample  taken from
 https://raw.github.com/apache/cassandra/cassandra-1.2/examples/client_only/src/ClientOnlyExample.java.
 But while executing  the program it gives me a null pointer exception.
 Can you guys please help me out what I am missing.



 I am using Cassandra 1.2.1 version. I have pasted the logs at
 http://pastebin.com/pmADWCYe



 Exception in thread main java.lang.NullPointerException

   at
 org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:71)

   at
 org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:66)

   at
 org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:61)

   at
 org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:56)

   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:183)

   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:204)

   at ClientOnlyExample.testWriting(ClientOnlyExample.java:78)

   at ClientOnlyExample.main(ClientOnlyExample.java:135)



 Regards,

 Rahul

 This email and any attachments are confidential, and may be legally
 privileged and protected by copyright. If you are not the intended
 recipient dissemination or copying of this email is prohibited. If you
 have received this in error, please notify the sender by replying by
 email and then delete the email completely from your system. Any views
 or opinions are solely those of the sender. This communication is not
 intended to form a binding contract unless expressly indicated to the 
 contrary and properly authorised.
 Any actions taken on the basis of this email are at the recipient's
 own risk.


Re: unsubscribe

2013-02-17 Thread Dave Brosius

On 02/17/2013 01:26 PM, puneet loya wrote:

unsubscribe me please.

Thank you


if only directions were followed:

http://hadonejob.com/images/full/102.jpg


send to

user-unsubscr...@cassandra.apache.org




Re: odd production issue today 1.1.4

2013-02-17 Thread aaron morton
There is always this old chestnut 
http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs

A
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 8:22 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 With hyperthreading, a core can show up as two or maybe even four
 logical processors; this is something the kernel does.
 
 On Fri, Feb 15, 2013 at 11:41 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
 We ran into an issue today where the website became around 10 times slower.  We 
 found that node 5 out of our 6 nodes was hitting 2100% CPU (cat /proc/cpuinfo 
 reveals a 16-processor machine).  I am really not sure how we hit 2100% 
 unless we had 21 processors.  It bounces between 300% and 2100%, so I tried 
 to do a thread dump and had to use –F, at which point HotSpot hit a 
 NullPointerException :(.
 
 I copied off all my logs after restarting(should have done it before 
 restarting it).  Any ideas what I could even look for as to what went wrong 
 with this node?
 
 Also, we know our Astyanax is for some reason not set up properly yet, so we 
 probably would not have seen an issue had we had all nodes in the seed 
 list (which we changed today). Astyanax is supposed to measure time 
 per request and change which nodes it hits, but we know it only hits nodes 
 in our seed list right now, as we have not fixed that yet.  Our Astyanax was 
 hitting 3,4,5,6 and did not have 1 and 2 in the seed list (we roll out a new 
 version next Wed. with the new seed list including the last two, delaying the 
 dynamic-discovery config we need to look at).
 
 Thanks,
 Dean
 
 Commands I ran with jstack that didn't work out too well….
 
 [cassandra@a5 ~]$ jstack -l 20907  threads.txt
 20907: Unable to open socket file: target process not responding or HotSpot 
 VM not loaded
 The -F option can be used when the target process is not responding
 [cassandra@a5 ~]$ jstack -l -F  20907  threads.txt
 Attaching to process ID 20907, please wait...
 Debugger attached successfully.
 Server compiler detected.
 JVM version is 20.7-b02
 java.lang.NullPointerException
 at 
 sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:426)
 at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137)
 at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100)
 at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93)
 at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
 at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
 at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
 at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
 at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
 at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
 at sun.tools.jstack.JStack.main(JStack.java:84)
 [cassandra@a5 ~]$ java -version
 java version 1.6.0_32



Re: cassandra vs. mongodb quick question

2013-02-17 Thread aaron morton
If you have spinning disk and 1G networking and no virtual nodes, I would still 
say 300G to 500G is a soft limit. 

If you are using virtual nodes, SSD, JBOD disk configuration or faster 
networking you may go higher. 

The limiting factors are the time it takes to repair, the time it takes to 
replace a node, and the memory considerations for 100s of millions of rows. If 
the performance of those operations is acceptable to you, then go crazy. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 9:05 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 So I found out MongoDB varies their node size from 1T to 42T per node 
 depending on the profile.  So if I was going to be writing a lot but rarely 
 changing rows, could I also use Cassandra with a per-node size of 20T+, or is 
 that not advisable?
 
 Thanks,
 Dean



Re: can we pull rows out compressed from cassandra(lots of rows)?

2013-02-17 Thread aaron morton
No. 
The rows are uncompressed deep down in the IO stack. 

There is compression in the binary protocol 
http://www.datastax.com/dev/blog/binary-protocol 
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol.spec;hb=refs/heads/cassandra-1.2

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 9:35 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 
 Thanks,
 Dean



unsubscribe

2013-02-17 Thread James Wong
On Feb 17, 2013 10:27 AM, puneet loya puneetl...@gmail.com wrote:

 unsubscribe me please.

 Thank you


Re: unsubscribe

2013-02-17 Thread Michael Kjellman
Please see the Mailing Lists section of the home page.

http://cassandra.apache.org

user-unsubscr...@cassandra.apache.org



From: James Wong jwong...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Sunday, February 17, 2013 12:06 PM
To: user@cassandra.apache.org
Subject: unsubscribe
Subject: unsubscribe


On Feb 17, 2013 10:27 AM, puneet loya puneetl...@gmail.com wrote:

 unsubscribe me please.

 Thank you


Re: Deleting old items

2013-02-17 Thread aaron morton
I'll email the docs people. 

I believe they are saying "use compaction throttling rather than this", not 
"this does nothing".

Although I used this in the last month on a machine with very little ram to 
limit compaction memory use.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/02/2013, at 7:05 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Can you point to the docs.
 
 http://www.datastax.com/docs/1.1/configuration/storage_configuration#max-compaction-threshold
 
 And thanks about the rest of your answers, once again ;-).
 
 Alain
 
 
 2013/2/16 aaron morton aa...@thelastpickle.com
  Is that a feature that could possibly be developed one day ?
 No. 
 Timestamps are essentially internal implementation used to resolve different 
 values for the same column. 
 
 With min_compaction_level_threshold did you mean 
 min_compaction_threshold  ? If so, why should I do that, what are the 
 advantage/inconvenient of reducing this value ?
 
 Yes, min_compaction_threshold, my bad. 
 If you have a wide row and delete a lot of values you will end up with a lot 
 of tombstones. These may dramatically reduce the read performance until they 
 are purged. Reducing the compaction threshold makes compaction happen more 
 frequently. 
 
 Looking at the doc I saw that: max_compaction_threshold: Ignored in 
 Cassandra 1.1 and later.. How to ensure that I'll always keep a small 
 amount of SSTables then ?
 AFAIK it's not. 
 There may be some confusion about the location of the settings in CLI vs CQL. 
 Can you point to the docs. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 13/02/2013, at 10:14 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 
 Hi Aaron, once again thanks for this answer.
 So is it possible to delete all the data inserted in some CF between 2 
 dates or data older than 1 month ?
 No. 
 
 Why is there no way of deleting or getting data using the internal timestamp 
 stored alongside any inserted column (as described here: 
 http://www.datastax.com/docs/1.1/ddl/column_family#standard-columns)? Is 
 that a feature that could possibly be developed one day? It could be useful 
 for deleting old data, or for bringing just the last week of data to a dev 
 cluster, for example.
 
 With min_compaction_level_threshold did you mean 
 min_compaction_threshold? If so, why should I do that, and what are the 
 advantages/disadvantages of reducing this value?
 
 Looking at the doc I saw: max_compaction_threshold: Ignored in 
 Cassandra 1.1 and later. How can I ensure that I'll always keep a small 
 number of SSTables then? Why is this deprecated?
 
 Alain
 
 
 2013/2/12 aaron morton aa...@thelastpickle.com
 So is it possible to delete all the data inserted in some CF between 2 
 dates or data older than 1 month ?
 No. 
 
 You need to issue row level deletes. If you don't know the row key you'll 
 need to do range scans to locate them. 
 
 If you are deleting parts of wide rows consider reducing the 
 min_compaction_level_threshold on the CF to 2
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 
 Hi,
 
 I would like to know if there is a way to delete old/unused data easily ?
 
 I know about TTL but there are 2 limitations of TTL:
 
 - AFAIK, there is no TTL on counter columns
 - TTL need to be defined at write time, so it's too late for data already 
 inserted.
 
 I also could use a standard delete, but it seems inappropriate for such a 
 massive deletion.
 
 In some cases, I don't know the row key and would like to delete all the 
 rows starting by, let's say, 1050#... 
 
 Even better, I understood that columns are always inserted in C* with 
 (name, value, timestamp). So is it possible to delete all the data inserted 
 in some CF between 2 dates or data older than 1 month ?
 
 Alain
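Aaron's answer above boils down to: there is no server-side "delete where timestamp < X", so the client must scan rows, inspect write timestamps, and issue explicit deletes itself. A minimal sketch of that pattern, using an invented in-memory stand-in for the column family (all row keys, column names, and data are made up for illustration):

```python
import time

# Toy model of "delete data older than 1 month" done client-side.
MONTH_US = 30 * 24 * 3600 * 1_000_000  # column timestamps are in microseconds

def expire_old_columns(rows, now_us, max_age_us=MONTH_US):
    """rows: {row_key: {column_name: (value, write_timestamp_us)}}.
    Returns the (row_key, column_name) pairs a client would delete."""
    doomed = []
    for row_key, columns in rows.items():
        for name, (_value, ts) in columns.items():
            if now_us - ts > max_age_us:
                doomed.append((row_key, name))
    return doomed

now = int(time.time() * 1_000_000)
rows = {
    "1050#a": {"col1": ("old value", now - 2 * MONTH_US)},
    "1050#b": {"col1": ("new value", now)},
}
print(expire_old_columns(rows, now))  # [('1050#a', 'col1')]
```

In a real cluster the scan would be a range query and each doomed pair an explicit delete, which is exactly the tombstone-generating pattern that motivates Aaron's min_compaction_threshold advice.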
 
 
 
 



Re: Is there any consolidated literature about Read/Write and Data Consistency in Cassandra ?

2013-02-17 Thread aaron morton
If you want the underlying ideas, try the Dynamo paper, the Bigtable paper, and 
the original Cassandra paper from Facebook. 

Start here http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/02/2013, at 7:40 AM, mateus mat...@tripleoxygen.net wrote:

 I mean articles with tests and conclusions about it, and such, not like the 
 documentation from DataStax or the Cassandra books.
 
 Thank you.
 



Re: nodetool repair with vnodes

2013-02-17 Thread aaron morton
 …so it seems to me that it is running on all vnodes ranges.
Yes.

 Also, whatever the node which I launch the command on is, only one node log 
 is moving and is always the same node. 
Not sure what you mean here. 

 So, to me, it's like the nodetool repair command is running always on the 
 same single node and repairing everything.
If you use nodetool repair without the -pr flag in your setup (3 nodes and I 
assume RF 3) it will repair all token ranges in the cluster. 

 Is there anything I'm missing ?
Look for messages containing "session completed" in the log, from the 
AntiEntropyService.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 12:51 AM, Marco Matarazzo marco.matara...@hexkeep.com wrote:

 Greetings. 
 
 I'm trying to run nodetool repair on a Cassandra 1.2.1 cluster of 3 nodes 
 with 256 vnodes each.
 
 On a pre-1.2 cluster I used to launch a nodetool repair on every node every 
 24hrs. Now I'm getting a different behavior, and I'm sure I'm missing 
 something.
 
 What I see on the command line is: 
 
 [2013-02-17 10:20:15,186] Starting repair command #1, repairing 768 ranges 
 for keyspace goh_master
 [2013-02-17 10:48:13,401] Repair session 3d140e10-78e3-11e2-af53-d344dbdd69f5 
 for range (6556914650761469337,6580337080281832001] finished
 (…repeat the last line 767 times)
 
 …so it seems to me that it is running on all vnodes ranges.
 
 Also, whatever the node which I launch the command on is, only one node log 
 is moving and is always the same node. 
 
 So, to me, it's like the nodetool repair command is running always on the 
 same single node and repairing everything.
 
 I'm sure I'm making some mistakes, and I just can't find any clue of what's 
 wrong with my nodetool usage on the documentation (if anything is wrong, 
 btw). Is there anything I'm missing ?
 
 --
 Marco Matarazzo
 
 



Re: Nodetool doesn't shows two nodes

2013-02-17 Thread Boris Solovyov
Hi,

I've checked all the things Alain suggested and set up a fresh 2-node cluster,
and I still get the same result: each node lists only itself.

This time I made the following changes:

   - I set listen_address to the public DNS name. Internally, AWS's DNS
    will map this to the 10.x IP, so this should work correctly if I
    understand right. These are new EC2 instances, and I did not trust the
    configured hostnames and so on.
   - I opened all ports between nodes in the security group.
   - I kept the snitch at Ec2MultiRegionSnitch. This cluster is small now
    but it will be very large and nationwide if I succeed and choose Cassandra
    for this purpose. Do I understand right that it is not possible to change
    this later, or at least not easy?
   - I followed all of Alain's other suggestions, for example making sure
    cluster_name is the same on all nodes.
   - I set the seed list to the public DNS name of the first node. This is
    identical on both nodes.
   - I checked Alain's suggestion about auto_bootstrap. The docs say it does
    not need to be set. Are the docs wrong? (I looked at the DataStax 1.2 PDF
    docs.)

Here is some more debugging evidence. On node 1, the seed,

[root@ip-10-113-19-24 ~]# ifconfig | grep inet.addr
  inet addr:10.113.19.24  Bcast:10.113.19.255  Mask:255.255.254.0
[root@ip-10-113-19-24 ~]# nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns    Host ID                               Rack
UN  23.22.204.201  20.97 KB  256     100.0%  4fadd4fd-c57c-4172-95aa-092368ba5743  1a
[root@ip-10-113-19-24 ~]# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address          Foreign Address        State        PID/Program name
tcp        0      0 0.0.0.0:7199           0.0.0.0:*              LISTEN       1910/java
tcp        0      0 0.0.0.0:47298          0.0.0.0:*              LISTEN       1910/java
tcp        0      0 0.0.0.0:57030          0.0.0.0:*              LISTEN       1910/java
tcp        0      0 0.0.0.0:9160           0.0.0.0:*              LISTEN       1910/java
tcp        0      0 0.0.0.0:9042           0.0.0.0:*              LISTEN       1910/java
tcp        0      0 0.0.0.0:22             0.0.0.0:*              LISTEN       1231/sshd
tcp        0      0 10.113.19.24:7000      0.0.0.0:*              LISTEN       1910/java
tcp        0      1 10.113.19.24:38948     54.234.147.60:7000     SYN_SENT     1910/java
tcp        0      0 10.113.19.24:7000      10.113.19.24:45328     ESTABLISHED  1910/java
tcp        0      0 10.113.19.24:7000      10.114.205.157:47713   ESTABLISHED  1910/java
tcp        0      1 10.113.19.24:45597     23.22.204.201:7000     SYN_SENT     1910/java
tcp        0      0 10.113.19.24:45328     10.113.19.24:7000      ESTABLISHED  1910/java

And in the log,

 INFO 20:58:12,472 Node /23.22.204.201 state jump to normal
 INFO 20:58:12,482 Startup completed! Now serving reads.

Now, this looks similar to the problem before, with the private IP addresses
being used sometimes and the public ones other times. By the way, the other
node, whose internal IP address is 10.114.205.157, is connected to this seed
node, as you can see.

I think I could understand this problem if I understand which types of
network connections I should expect to see in the netstat, and what output
I should expect to see in the log. Can someone with more experience tell me
what is wrong/unexpected above? And am I working against Amazon's
architecture by using IPs the way I do?

While I wait for an answer, I will shut down, delete all data, and reconfigure
with public IP addresses explicitly, not DNS names :-) I have a feeling this
is the problem. From within an Amazon EC2 server, requesting DNS for a public
DNS name returns the private IP address. (However, I still feel unsure about
what is the right way to do this, because I do not know if Cassandra will
resolve the DNS name and end up trying to connect to a private IP that
Cassandra is not listening on.)

Thanks,
- Boris



On Wed, Feb 13, 2013 at 10:37 AM, Boris Solovyov
boris.solov...@gmail.com wrote:

 Thank you Alain. I will check the things you suggest and report my results.

 - Boris


 On Wed, Feb 13, 2013 at 7:54 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi Boris.

 I feel like I have made a beginner's mistake
 That's a horrible feeling :D. I'll try to help ;)

 cluster_name: 'TS'
 Are you sure you used the same name for both node ?

 I can connect to port 7000
 You can check all the ports needed there
 http://www.datastax.com/docs/1.2/install/install_ami and open them in
 security group once and for all so you won't be wondering this anymore.

 listen_address: 10.145.232.190
 INFO 19:36:32,710 Node /107.22.114.19 state jump to normal
 There is 10.145.232.190 defined as the listen address, yet your logs say
 that 107.22.114.19 joined the ring, and your second IP seems to be
 23.21.11.193... When you stop an EC2 server, its internal IP may change.
 So I recommend you not to do so, but to restart instead. Anyway you
 should 
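The SYN_SENT lines in Boris's netstat earlier in this thread can be modeled as a security-group check. This is a toy sketch: the IPs are the ones pasted in the thread, and the two rule sets are invented for illustration.

```python
# A node gossips to its peers' *public* IPs, so the public source address is
# what the peer's EC2 security group evaluates; a rule set listing only
# private IPs therefore blocks gossip.
PUBLIC_OF = {"10.113.19.24": "23.22.204.201",    # node 1 (seed)
             "10.114.205.157": "54.234.147.60"}  # node 2

def sg_admits(src_private_ip, allowed_sources):
    """True if a peer's security group admits gossip from this node."""
    return PUBLIC_OF[src_private_ip] in allowed_sources

broken = set(PUBLIC_OF)                   # rule lists the private IPs only
fixed = broken | set(PUBLIC_OF.values())  # also allow the public IPs

print(sg_admits("10.113.19.24", broken))  # False -> connection hangs in SYN_SENT
print(sg_admits("10.113.19.24", fixed))   # True  -> gossip connects
```

This matches Jared's advice later in the digest: add the public IPs of the Cassandra servers to the security group rather than opening port 7000 to the world.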

Re: Is C* common nickname for Cassandra?

2013-02-17 Thread Michael Kjellman
Why do you feel that link is unprofessional? Just wondering. I actually quite 
like the abbreviation personally.

On Feb 17, 2013, at 1:37 PM, Boris Solovyov boris.solov...@gmail.com wrote:

Thanks. I don't know if anyone cares about my opinion, but as a newcomer to the 
community, my feedback is that it is not needed. At best it confuses a newbie 
and makes him feel like an outsider. At worst it just looks totally 
unprofessional, like here: 
http://www.planetcassandra.org/blog/post/calling-all-apache-cassandra-speakers. 
It is hard to form a good opinion of the Cassandra project when it is being 
discussed like that.

Hopefully this is helpful constructive criticism and not just useless flamebait 
or trollbait.

Boris


On Fri, Feb 8, 2013 at 11:51 AM, Tyler Hobbs ty...@datastax.com wrote:
Yes, C* is short for Cassandra.


On Fri, Feb 8, 2013 at 10:43 AM, Boris Solovyov boris.solov...@gmail.com wrote:
I see people refer to C* and I assume it means Cassandra, but I just wanted to 
check for sure, in case it is something else and I missed it :) Do I 
understand right?



--
Tyler Hobbs
DataStax http://datastax.com/



Re: Is C* common nickname for Cassandra?

2013-02-17 Thread Boris Solovyov
It is hard to say, really. I guess it just feels not very serious, overly
casual, which means not treating the project with respect? I believe that if
you want something treated with respect, you must demonstrate how seriously
you take it yourself. I am sure this is a personal opinion only, but perhaps
it is shared by others. An Enterprise Pointy-Haired Boss might make a purchase
decision on this criterion instead of technical merits. You know they make
decisions based on how pretty the project logo is half the time :-)

Hope this helps
Boris


On Sun, Feb 17, 2013 at 4:42 PM, Michael Kjellman
mkjell...@barracuda.com wrote:

 Why do you feel that link is unprofessional? Just wondering. I actually
 quite like the abbreviation personally.

 On Feb 17, 2013, at 1:37 PM, Boris Solovyov boris.solov...@gmail.com
 wrote:

 Thanks. I don't know if anyone cares about my opinion, but as a newcomer to
 the community, my feedback is that it is not needed. At best it confuses a
 newbie and makes him feel like an outsider. At worst it just looks totally
 unprofessional, like here:
 http://www.planetcassandra.org/blog/post/calling-all-apache-cassandra-speakers.
 It is hard to form a good opinion of the Cassandra project when it is being
 discussed like that.

 Hopefully this is helpful constructive criticism and not just useless
 flamebait or trollbait.

 Boris


 On Fri, Feb 8, 2013 at 11:51 AM, Tyler Hobbs ty...@datastax.com wrote:

 Yes, C* is short for Cassandra.


 On Fri, Feb 8, 2013 at 10:43 AM, Boris Solovyov boris.solov...@gmail.com
  wrote:

 I see people refer to C* and I assume it means Cassandra, but I just wanted
 to check for sure, in case it is something else and I missed it :) Do I
 understand right?




 --
 Tyler Hobbs
 DataStax http://datastax.com/





Re: Nodetool doesn't shows two nodes

2013-02-17 Thread Boris Solovyov
No, it doesn't work; same thing: both nodes seem to just exist solo and I
have 2 single-node clusters :-( OK, so now I am confused, and hope the list
will help me out. To understand what is wrong, I think I need to know what
happens when a node bootstraps and joins the ring. With whom does the node
communicate, and on which address? What information is exchanged? What
happens then? What does this process look like normally?

I have read all the docs, several times, and don't think I missed it, so it
might not be explained there clearly. I will look again, and look at the
source code next.

- Boris


On Sun, Feb 17, 2013 at 4:48 PM, Boris Solovyov boris.solov...@gmail.com wrote:

 Aha! I think I might have a breakthrough. I tried setting the public
 IP in listen_address (and therefore in broadcast_address, because as I
 understand it, it inherits if commented out), and in the seeds list. The
 node fails to start, because Cassandra cannot bind to the public IP address:
 it does not exist on the box. Of course! This is why I cannot see it in
 ifconfig.

 SO, my next theory,

- set listen_address to private IP
- set broadcast_address to public IP, tells other nodes how to connect
- set seeds to public IP

 I will try this next and continue to flood your inbox with my stream-of-
 consciousness trial and error ;-)
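Boris's three-setting theory above corresponds to a cassandra.yaml fragment along these lines. The IPs are the example addresses from this thread, not a recommendation for any real cluster; check the snitch documentation before relying on it:

```yaml
listen_address: 10.113.19.24        # private IP: the address the node binds to
broadcast_address: 23.22.204.201    # public IP: the address gossiped to peers
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "23.22.204.201"  # public IP(s) of the seed node(s)
```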



Re: nodetool repair with vnodes

2013-02-17 Thread Marco Matarazzo
 So, to me, it's like the nodetool repair command is running always on the 
 same single node and repairing everything.
 If you use nodetool repair without the -pr flag in your setup (3 nodes and I 
 assume RF 3) it will repair all token ranges in the cluster. 

That's correct, 3 nodes and RF 3. Sorry for not specifying it in the beginning.


So, is running it periodically on just one node enough for cluster 
maintenance? Does this depend on the fact that every vnode's data is related 
to the previous and next vnodes, and that this particular setup makes it 
enough because it covers every physical node?


Also: running it with -pr does output:

[2013-02-17 12:29:25,293] Nothing to repair for keyspace 'system'
[2013-02-17 12:29:25,301] Starting repair command #2, repairing 1 ranges for 
keyspace keyspace_test
[2013-02-17 12:29:28,028] Repair session 487d0650-78f5-11e2-a73a-2f5b109ee83c 
for range (-9177680845984855691,-9171525326632276709] finished
[2013-02-17 12:29:28,028] Repair command #2 finished

… that, as far as I can understand, works on the first vnode of the specified 
node, or so it seems from the output range. Am I right? Is there a way to run 
it for all the vnodes on a single physical node only?

Thank you!

--
Marco Matarazzo
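Marco's range counts can be illustrated with a toy ring model. This is a sketch, not Cassandra's implementation: the tokens are random stand-ins for real vnode tokens, and the node names are invented. It shows why repair without -pr walks all 768 ranges on a 3-node, 256-vnode cluster, while -pr covers only the 256 ranges whose right endpoint is a vnode of the local node.

```python
import random

random.seed(42)
NODES = ["n1", "n2", "n3"]
# 256 vnode tokens per node, placed on a sorted ring of 768 tokens total.
tokens = sorted((random.randint(-2**63, 2**63 - 1), NODES[i % 3])
                for i in range(768))

def primary_ranges(node):
    """Ranges (prev_token, token] whose right endpoint is a vnode of `node` --
    roughly what `nodetool repair -pr` repairs on that node."""
    out = []
    for i, (tok, owner) in enumerate(tokens):
        if owner == node:
            out.append((tokens[i - 1][0], tok))  # index -1 wraps the ring
    return out

print(len(tokens))                # 768: every range, as repair without -pr shows
print(len(primary_ranges("n1")))  # 256: one node's share under -pr
```

Under this model, running repair -pr on each of the three nodes covers all 768 ranges exactly once, which is the usual rationale for scheduling -pr per node instead of a full repair on one node.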


Re: Nodetool doesn't shows two nodes

2013-02-17 Thread Boris Solovyov
OK. I got it. I realized that storage_port wasn't actually open between the
nodes, because it is using the public IP. (I did find this information in
the docs, after looking more... it is in the section on types of snitches. It
explains everything I found by trial and error.)

After opening this port 7000 to all IP addresses, the cluster boots OK and
the two nodes see each other. Now I have the happy result. But my nodes are
wide open to the entire internet on port 7000. This is a serious problem.
This obviously can't be put into production.

I definitely need cross-continent deployment. Single AZ or single region
deployment is not going to be enough. How do people solve this in practice?


Re: Size Tiered - Leveled Compaction

2013-02-17 Thread Wei Zhu
We doubled the SSTable size to 10M. It still generates a lot of SSTables and we 
don't see much difference in the read latency. We are able to finish the 
compactions after repair within several hours. We will increase the SSTable 
size again if we feel the number of SSTables hurts the performance. 

- Original Message -
From: Mike mthero...@yahoo.com
To: user@cassandra.apache.org
Sent: Sunday, February 17, 2013 4:50:40 AM
Subject: Re: Size Tiered - Leveled Compaction


Hello Wei, 

First thanks for this response. 

Out of curiosity, what SSTable size did you choose for your usecase, and what 
made you decide on that number? 

Thanks, 
-Mike 

On 2/14/2013 3:51 PM, Wei Zhu wrote: 




I haven't tried to switch compaction strategy. We started with LCS. 


For us, after massive data imports (5000 w/second for 6 days), the first 
repair is painful since there is quite some data inconsistency. For 150G nodes, 
repair brought in about 30 G and created thousands of pending compactions. It 
took almost a day to clear those. Just be prepared: LCS is really slow in 1.1.X. 
System performance degrades during that time since reads could go to more 
SSTables; we saw 20 SSTable lookups for one read. (We tried everything we could 
and couldn't speed it up. I think it's single threaded, and it's not 
recommended to turn on multithreaded compaction. We even tried that; it didn't 
help.) There is parallel LCS in 1.2 which is supposed to alleviate the pain. 
Haven't upgraded yet, hope it works :) 


http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 





Since our cluster is not write intensive (only 100 w/second), I don't see any 
pending compactions during regular operation. 


One thing worth mentioning is the size of the SSTables: the default is 5M, which 
is kind of small for a 200G (all in one CF) data set, and we are on SSD. That is 
more than 150K files in one directory. (200G/5M = 40K SSTables, and each SSTable 
creates 4 files on disk.) You might want to watch that and decide the SSTable 
size. 


By the way, there is no concept of a major compaction for LCS. Just for fun, you 
can look at a file called $CFName.json in your data directory; it tells you 
the SSTable distribution among the different levels. 


-Wei 





From: Charles Brophy cbro...@zulily.com 
To: user@cassandra.apache.org 
Sent: Thursday, February 14, 2013 8:29 AM 
Subject: Re: Size Tiered - Leveled Compaction 


I second these questions: we've been looking into changing some of our CFs to 
use leveled compaction as well. If anybody here has the wisdom to answer them 
it would be of wonderful help. 


Thanks 
Charles 


On Wed, Feb 13, 2013 at 7:50 AM, Mike  mthero...@yahoo.com  wrote: 


Hello, 

I'm investigating the transition of some of our column families from Size 
Tiered - Leveled Compaction. I believe we have some high-read-load column 
families that would benefit tremendously. 

I've stood up a test DB node to investigate the transition. I successfully 
alter the column family, and I immediately notice a large number (1000+) of 
pending compaction tasks appear, but no compactions get executed. 

I tried running nodetool sstableupgrade on the column family, and the 
compaction tasks don't move. 

I also notice no changes to the size and distribution of the existing SSTables. 

I then run a major compaction on the column family. All pending compaction 
tasks get run, and the SSTables have a distribution that I would expect from 
LeveledCompaction (lots and lots of 10MB files). 

Couple of questions: 

1) Is a major compaction required to transition from size-tiered to leveled 
compaction? 
2) Are major compactions as much of a concern for LeveledCompaction as they 
are for Size Tiered? 

All the documentation I found concerning transitioning from Size Tiered to 
Leveled compaction discusses the ALTER TABLE CQL command, but I haven't found 
much on what else needs to be done after the schema change. 

I did these tests with Cassandra 1.1.9. 

Thanks, 
-Mike 
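Wei's file-count estimate above checks out as back-of-envelope arithmetic. The sketch assumes, as stated in the thread, 200 GB in one column family, the 5 MB default LCS sstable size, and 4 files per sstable on disk (1.1.x behavior):

```python
# Rough file-count estimate for LCS at the default sstable size.
data_gb, sstable_mb, files_per_sstable = 200, 5, 4
sstables = data_gb * 1024 // sstable_mb  # 40960 sstables ("40K")
files = sstables * files_per_sstable     # 163840 files ("more than 150K")
print(sstables, files)
```

Doubling the sstable size to 10 MB, as Wei's follow-up describes, halves both numbers, which is why the directory file count is one input when choosing the sstable size.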







Re: Nodetool doesn't shows two nodes

2013-02-17 Thread Jared Biel
This is something that I found while using the multi-region snitch -
it uses public IPs for communication. See the original ticket here:
https://issues.apache.org/jira/browse/CASSANDRA-2452. It'd be nice if
it used the private IPs to communicate with nodes that are in the same
region as itself, but I do not believe this is the case. Be aware that
you will be charged for external data transfer even for nodes in the
same region, because the traffic will not fall under the free (for
same-AZ) or reduced (for inter-AZ) tiers.

If you continue using this snitch in the mean time, it is not
necessary (or recommended) to have those ports open to 0.0.0.0/0.
You'll simply need to add the public IPs of your C* servers to the
correct security group(s) to allow access.

There's something else that's a little strange about the EC2 snitches:
us-east-1 is (incorrectly) represented as the datacenter us-east.
Other regions are recognized and named properly (us-west-2 for
example). This is kind-of covered in the ticket here:
https://issues.apache.org/jira/browse/CASSANDRA-4026 I wish it could
be fixed properly.

Good luck!


On 17 February 2013 16:16, Boris Solovyov boris.solov...@gmail.com wrote:
 OK. I got it. I realized that storage_port wasn't actually open between the
 nodes, because it is using the public IP. (I did find this information in
 the docs, after looking more... it is in the section on types of snitches. It
 explains everything I found by trial and error.)

 After opening this port 7000 to all IP addresses, the cluster boots OK and
 the two nodes see each other. Now I have the happy result. But my nodes are
 wide open to the entire internet on port 7000. This is a serious problem.
 This obviously can't be put into production.

 I definitely need cross-continent deployment. Single AZ or single region
 deployment is not going to be enough. How do people solve this in practice?