Re: data model question : finding out the n most recent changes items
What you described sounds like the most appropriate:

CREATE TABLE user_file (
    user_id uuid,
    modified_date timestamp,
    file_id timeuuid,
    PRIMARY KEY (user_id, modified_date)
);

If you normally need more information about the file, then either store that as additional fields or pack the data using something like JSON or Protobuf.

> my return list may still not be accurate because a single directory could have a lot of modification changes. I basically end up pulling out a series of modification timestamps for the same directory.

Not sure I understand the problem.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 6:51 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

I have an application that needs to find out the n most recently modified files for a given user id. I started out with a few tables but still couldn't get what I want; I hope someone can point me in the right direction. See my tables below.

#1 won't work, because file_id's timeuuid contains the creation time, not the modification time.

#2 won't work, because I can't order by a non-primary-key column (modified_date).

#3, #4: although I can now get a time series of modification times for each file belonging to a user, my return list may still not be accurate because a single directory could have a lot of modification changes. I basically end up pulling out a series of modification timestamps for the same directory.

Any suggestion? Thanks

#1
CREATE TABLE user_file (
    user_id uuid,
    file_id timeuuid,
    PRIMARY KEY (user_id, file_id)
);

#2
CREATE TABLE user_file (
    user_id uuid,
    file_id timeuuid,
    modified_date timestamp,
    PRIMARY KEY (user_id, file_id)
);

#3
CREATE TABLE user_file (
    user_id uuid,
    file_id timeuuid,
    modified_date timestamp,
    PRIMARY KEY (user_id, file_id, modified_date)
);

#4
CREATE TABLE user_file (
    user_id uuid,
    modified_date timestamp,
    file_id timeuuid,
    PRIMARY KEY (user_id, modified_date, file_id)
);
Re: data model question : finding out the n most recent changes items
What I mean is, I really just want the last modified date instead of a series of timestamps, and still be able to sort or order by it. (Maybe I should rephrase my question as: how do I sort or order by a last-modified column in a row?)

CREATE TABLE user_file (
    user_id uuid,
    modified_date timestamp,
    file_id timeuuid,
    PRIMARY KEY (user_id, modified_date)
);

E.g. user1 updates file A 3 times in a row, then updates file B, then updates file A again:

insert into user_file values (user1_uuid, date1, file_a_uuid);
insert into user_file values (user1_uuid, date2, file_a_uuid);
insert into user_file values (user1_uuid, date3, file_a_uuid);
insert into user_file values (user1_uuid, date4, file_b_uuid);
insert into user_file values (user1_uuid, date5, file_a_uuid);

# trying to get the top 3 most recently changed files
select * from user_file where user_id=user1_uuid limit 3

Using CQL, I will get 3 rows back (all file A):

(user1_uuid, date1, file_a_uuid)
(user1_uuid, date2, file_a_uuid)
(user1_uuid, date3, file_a_uuid)

What I want is (file A AND file B):

(user1_uuid, date1, file_a_uuid)
(user1_uuid, date4, file_b_uuid)

So how do I order by / sort by the last-modified column in a row?

thanks

On Thu, Jul 11, 2013 at 12:00 AM, aaron morton aa...@thelastpickle.com wrote:

> What you described sounds like the most appropriate:
>
> CREATE TABLE user_file ( user_id uuid, modified_date timestamp, file_id timeuuid, PRIMARY KEY (user_id, modified_date) );
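The behaviour Jimmy describes can be reproduced outside Cassandra. The sketch below (illustrative Python, not from the thread; names and timestamps are made up) simulates the partition's clustering order to show why a LIMIT over the raw modification log returns one file several times, and what client-side de-duplication looks like:

```python
# Sketch: the partition is a log of (modified_date, file_id) rows in
# clustering order. A plain LIMIT returns row-by-row, so a file that
# changed repeatedly dominates the result.

rows = [  # (modified_date, file_id) as in Jimmy's example
    (1, "file_a"), (2, "file_a"), (3, "file_a"), (4, "file_b"), (5, "file_a"),
]

# LIMIT 3 over the log: three rows, all file_a
print(rows[:3])

def latest_n_files(rows, n):
    """Scan newest-first, keeping only the first time each file appears."""
    seen, out = set(), []
    for ts, f in sorted(rows, reverse=True):
        if f not in seen:
            seen.add(f)
            out.append((ts, f))
        if len(out) == n:
            break
    return out

print(latest_n_files(rows, 3))  # [(5, 'file_a'), (4, 'file_b')]
```

The de-duplication works, but it has to read past the duplicates first, which is exactly the cost Jimmy wants to push into the storage engine.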
RE: data model question : finding out the n most recent changes items
Hi,

Do you need to store the history of updates to a file?

If this is not required, then you can make the user id and file id the row key. You then simply update the modified_date timestamp, and there will be only one row per file per user.

Thanks and Regards
M. Lohith Samaga

-----Original Message-----
From: y2k...@gmail.com on behalf of Jimmy Lin
Sent: Thu 11-Jul-13 13:09
To: user@cassandra.apache.org
Subject: Re: data model question : finding out the n most recent changes items

> What I mean is, I really just want the last modified date instead of a series of timestamps, and still be able to sort or order by it.
Re: data model question : finding out the n most recent changes items
Thanks for the suggestion.

I don't care about the history of update times to a file, BUT I do want to order by it. The reason is that without it, if I have 10k+ files belonging to a user, I have to fetch the last modified time of all 10k+ files and sort through them in my application, returning only the top N. Kind of expensive. I would like to see if it is possible to rely on Cassandra's native storage ordering to achieve this.

CREATE TABLE user_file (
    user_id uuid,
    file_id timeuuid,
    last_modified_time timestamp,
    PRIMARY KEY (user_id, file_id)
);

select * from user_file where user_id=user1_uuid order by last_modified_time limit 10

The above CQL is invalid, because last_modified_time is not part of the compound key and is not allowed to be used for ordering.

On Thu, Jul 11, 2013 at 12:51 AM, Lohith Samaga M lohith.sam...@mphasis.com wrote:

> Do you need to store the history of updates to a file? If this is not required, then you can make the user id and file id the row key. You then simply update the modified_date timestamp, and there will be only one row per file per user.
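One common workaround, offered here as a sketch rather than as the thread's conclusion, is to keep modified_date in the clustering key and, on each update, delete the file's previous row before inserting the new one (a read-before-write), so the partition holds exactly one row per file in time order. A minimal Python simulation of that pattern (names and data are illustrative, not from the thread):

```python
# Sketch: one entry per file, ordered by timestamp, maintained by a
# delete-then-insert on every modification. `partition` plays the
# role of the time-clustered partition; `lookup` plays the role of a
# per-file "current timestamp" table you would read before writing.

def touch(partition, lookup, file_id, ts):
    """Record a modification, keeping exactly one entry per file."""
    old_ts = lookup.get(file_id)              # read the previous timestamp
    if old_ts is not None:
        partition.remove((old_ts, file_id))   # DELETE the old clustering row
    partition.append((ts, file_id))           # INSERT the new clustering row
    lookup[file_id] = ts                      # update the per-file lookup

def most_recent(partition, n):
    """Simulates SELECT ... ORDER BY modified_date DESC LIMIT n."""
    return sorted(partition, reverse=True)[:n]

partition, lookup = [], {}
for ts, f in [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "a")]:
    touch(partition, lookup, f, ts)

print(most_recent(partition, 3))  # [(5, 'a'), (4, 'b')]
```

The trade-off is an extra read and delete per write, and the delete leaves a tombstone, so it is not free; it simply moves the de-duplication cost from read time to write time.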
Alternate major compaction
Hi

About a year ago we did a major compaction in our Cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, so we were condemned to repeat the major compaction process every once in a while. (We are using the SizeTieredCompaction strategy; we have not yet evaluated LeveledCompaction, because it has its downsides and we've had no time to test them all in our environment.)

I was trying to find a way out of this situation — that is, to do something like a major compaction that writes small sstables rather than the huge ones a major compaction produces — and I couldn't find it in the documentation. I tried cleanup and scrub/upgradesstables, but they don't do that (as the documentation states). Then I tried deleting all data on a node and bootstrapping it (or nodetool rebuild-ing it), hoping that this way the sstables would get cleaned of deleted records and updates. But the wiped node just copied the sstables from another node as they were, cleaning nothing.

So I tried a new approach: I switched the compaction strategy (SizeTiered to Leveled), forcing the sstables to be rewritten from scratch, and then switched it back (Leveled to SizeTiered). It took a while (but so does the major compaction process), and it worked: I have smaller sstables, and I've regained a lot of disk space.

I'm happy with the results, but it doesn't seem an orthodox way of cleaning up sstables. What do you think — is it somehow wrong or crazy? Is there a different way to achieve the same thing?

Let's take an example. Suppose you have a write-only column family (no updates and no deletes, so no need for LeveledCompaction, because SizeTiered works perfectly and requires less I/O) and you mistakenly run a major compaction on it. After a few months you need more space, so you delete half the data, and you find out that you're not freeing half the disk space, because most of those records were in the major-compacted sstables. How can you free the disk space? Waiting will do you no good, because the huge sstable won't get compacted anytime soon. You can run another major compaction, but that would just postpone the real problem. Then you can switch the compaction strategy and switch it back, as I just did. Is there any other way?

--
Tomàs Núñez
IT-Sysprod, Groupalia
www.groupalia.com
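Why a huge sstable "won't get compacted anytime soon" follows from how size-tiered compaction picks candidates: sstables are grouped into buckets of similar size, and a bucket is only compacted once it has enough members. The sketch below is a rough, illustrative simulation, not Cassandra's actual code; the bucket_low/bucket_high/min_threshold values mirror commonly cited defaults but are assumptions here:

```python
# Rough sketch of size-tiered bucketing: each sstable joins a bucket
# whose average size is within [bucket_low, bucket_high] of its own
# size, and only buckets with >= min_threshold members are compacted.
# A single major-compacted monster has no similarly sized peers, so
# its bucket never reaches the threshold.

def buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    out = []
    for size in sorted(sizes):
        for b in out:
            avg = sum(b) / len(b)
            if bucket_low * avg <= size <= bucket_high * avg:
                b.append(size)
                break
        else:
            out.append([size])  # no similar bucket: start a new one
    return out

def compaction_candidates(sizes, min_threshold=4):
    return [b for b in buckets(sizes) if len(b) >= min_threshold]

# Four small flushed sstables plus one 200 GB major-compacted monster:
sizes = [10, 11, 12, 13, 200_000]
print(compaction_candidates(sizes))  # [[10, 11, 12, 13]]
```

Under this model the four small tables compact among themselves forever while the monster sits alone, which matches the behaviour described in the post.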
listen_address and rpc_address address on different interface
Hello,

I was wondering if anyone has measured the performance improvement from having the listen address and client address bound to different interfaces. We have a 2 Gbit connection serving both at the moment, and it doesn't come close to being saturated. But being very keen on fast reads at the 99th percentile, we're interested in even the smallest improvements.

Next question: has anyone ever moved an existing node so that the listen address and client access address are bound to different addresses?

Our problem: currently our only address is a DNS entry, which we would like to keep bound to client access. If we were to take down a node, change the listen address, and re-join the ring, the other nodes would mark the node as dead when we take it down and assume we have a new node when we bring it back on a different address. Lots of wasted rebalancing and compaction would start. We use Cassandra 1.2.4 with vnodes. Not sure there is any way around this.

So, back to question one: am I wasting my time?

Thanks,
Chris
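For reference, a hypothetical cassandra.yaml fragment for the split being discussed; the interface addresses below are made up, not taken from the poster's setup:

```yaml
# Internode (gossip/streaming) traffic on a private interface:
listen_address: 10.0.1.12

# Client traffic (Thrift / native protocol) on the interface the
# public DNS entry points at:
rpc_address: 192.168.50.12
```

Whether the split is worth it is exactly the poster's open question; the fragment only shows where the two addresses are configured.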
Re: High performance hardware with lot of data per node - Global learning about configuration
Hi,

We also recently migrated to 3 hi1.4xlarge boxes (RAID 0 SSD), and the disk I/O performance is definitely better than on the earlier non-SSD servers; we are serving up to 14k reads/s with a latency of 3-3.5 ms/op.

I wanted to share our config options and ask about the data backup strategy for RAID 0. We are using C* 1.2.6 with a key_cache and row_cache of 300MB. I have not changed/modified any other parameter except for going with multithreaded GC. I will be playing around with other factors and update everyone if I find something interesting.

Also, I just wanted to share our backup strategy and see if I can learn something useful from how others are backing up their RAID 0. I am using tablesnap to upload SSTables to S3, and I have attached a separate EBS volume to every box and set up rsync to mirror Cassandra data from RAID 0 to EBS. I would really appreciate it if you could share how you are taking backups.

Thanks

On Jul 9, 2013, at 7:11 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

Hi,

Using C* 1.2.2.

We recently dropped our 18 m1.xlarge (4 CPU, 15GB RAM, 4 RAID-0 disks) servers to get 3 hi1.4xlarge (16 CPU, 60GB RAM, 2 RAID-0 SSD) servers instead, for about the same price. We tried it after reading a benchmark published by Netflix. It is awesome, and I recommend it to anyone who is using more than 18 xLarge servers or can afford these high-cost / high-performance EC2 instances. SSD gives very good throughput with awesome latency.

Yet we had about 200 GB of data per server and now have about 1 TB. To alleviate memory pressure inside the heap I had to reduce the index sampling: I changed the index_interval value from 128 to 512, with no visible impact on latency but a great improvement inside the heap, which doesn't complain about any pressure anymore.

Is there more tuning I could use, more tricks that could be useful when using big servers with a lot of data per node and relatively high throughput?

The SSDs are at 20-40% of their throughput capacity (according to OpsCenter), the CPU load almost never exceeds 5 or 6 (with 16 CPUs), and 15 GB of RAM is used out of 60 GB. At this point I have kept my previous configuration, which is almost the default one from the DataStax community AMI. Here is the relevant part of it; consider any property not listed here to be configured as default:

cassandra.yaml:

key_cache_size_in_mb: (empty, so the default of 100MB; hit rate between 88% and 92%, good enough?)
row_cache_size_in_mb: 0 (not usable in our use case: a lot of different and random reads)
flush_largest_memtables_at: 0.80
reduce_cache_sizes_at: 0.90
concurrent_reads: 32 (I am thinking of increasing this to 64 or more, since I have just a few servers handling more concurrency)
concurrent_writes: 32 (I am thinking of increasing this to 64 or more too)
memtable_total_space_in_mb: 1024 (to avoid having a full heap; should I use a bigger value, and why?)
rpc_server_type: sync (I tried hsha and got the "ERROR 12:02:18,971 Read an invalid frame size of 0. Are you using TFramedTransport on the client side?" error. No idea how to fix this, and I use 5 different clients for different purposes: Hector, Cassie, phpCassa, Astyanax, Helenus...)
multithreaded_compaction: false (should I try enabling this now that I use SSDs?)
compaction_throughput_mb_per_sec: 16 (I will definitely raise this to 32 or even more)
cross_node_timeout: true
endpoint_snitch: Ec2MultiRegionSnitch
index_interval: 512

cassandra-env.sh:

I am not sure how to tune the heap, so I mainly use the defaults:

MAX_HEAP_SIZE=8G
HEAP_NEWSIZE=400M (I tried higher values, and they produced bigger GC times: 1600 ms instead of the 200 ms I get now with 400M)
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly

Does this configuration seem coherent?

Right now performance is correct, with latency around 5 ms almost all the time. What can I do to handle more data per node and keep this performance, or improve it further?

I know this is a long message, but if you have any comment or insight, even on part of it, don't hesitate to share it. I guess this kind of comment on configuration is usable by the entire community.

Alain
IllegalArgumentException on query with AbstractCompositeType
Hi,

I've been tearing my hair out trying to figure out why this query fails. In fact, it only fails on machines with slower CPUs and after having previously run some other JUnit tests. I'm running JUnit tests against an embedded Cassandra server, which works well in pretty much all other cases, but this one is flaky. I've tried to rule out timing issues by placing a 10-second delay just before this query, in case the data somehow wasn't getting into the db in a timely manner, but that has no effect. I've also tried removing the ORDER BY clause, which seems to be the place in the code where it's getting hung up, but that also has no effect. The ALLOW FILTERING clause has no effect either.

DEBUG [Native-Transport-Requests:16] 2013-07-10 16:28:21,993 Message.java (line 277) Received: QUERY SELECT * FROM conv_msgdata_by_participant_cql WHERE entityConversationId='bulktestfromus...@test.cacontact_811b5efc-b621-4361-9dc9-2e4755be7d89' AND messageId < '2013-07-10T20:29:09.773Zzz' ORDER BY messageId DESC LIMIT 15 ALLOW FILTERING;
ERROR [ReadStage:34] 2013-07-10 16:28:21,995 CassandraDaemon.java (line 132) Exception in thread Thread[ReadStage:34,5,main]
java.lang.RuntimeException: java.lang.IllegalArgumentException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1582)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:247)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$BlockFetcher.isColumnBeforeSliceFinish(IndexedSliceReader.java:216)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$SimpleBlockFetcher.<init>(IndexedSliceReader.java:450)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:85)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:68)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:44)
    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:101)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:275)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
    at org.apache.cassandra.db.Table.getRow(Table.java:355)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)

Here's the table it's querying:

CREATE TABLE conv_msgdata_by_participant_cql (
    entityConversationId text,
    messageId text,
    jsonMessage text,
    msgReadFlag boolean,
    msgReadDate text,
    PRIMARY KEY (entityConversationId, messageId)
);

CREATE INDEX ON conv_msgdata_by_participant_cql(msgReadFlag);

Any ideas?

Thanks,
Anne
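For context on the trace (an illustration, not a diagnosis from the thread): a CompositeType cell name is encoded as a sequence of components, each consisting of a 2-byte big-endian length, the component bytes, and a one-byte end-of-component marker. The getWithShortLength call in the trace reads that length and slices the buffer, and a declared length that overruns the remaining bytes is the kind of input that makes Buffer.limit throw IllegalArgumentException. A toy Python decoder of that layout, showing the failure mode:

```python
import struct

def decode_composite(buf: bytes):
    """Toy decoder for the composite layout: <len:2><bytes><eoc:1> repeated."""
    parts, i = [], 0
    while i < len(buf):
        (n,) = struct.unpack_from(">H", buf, i)  # 2-byte component length
        i += 2
        if i + n + 1 > len(buf):
            # the real code hits this as IllegalArgumentException in Buffer.limit
            raise ValueError("component length %d overruns buffer" % n)
        parts.append(buf[i:i + n])
        i += n + 1  # skip the end-of-component byte
    return parts

good = (struct.pack(">H", 3) + b"foo" + b"\x00"
        + struct.pack(">H", 2) + b"hi" + b"\x00")
print(decode_composite(good))  # [b'foo', b'hi']

corrupt = struct.pack(">H", 300) + b"foo"  # declared length > remaining bytes
try:
    decode_composite(corrupt)
except ValueError as e:
    print("decode failed:", e)
```

In other words, the exception suggests the comparator is being handed bytes it cannot parse as a composite; why that happens only on slow machines after other tests is the open question.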
Re: Alternate major compaction
Thanks Takenori. It looks like the tool provides some good info that people can use. It would be great if you could share it with the community.

On Thu, Jul 11, 2013 at 6:51 AM, Takenori Sato ts...@cloudian.com wrote:

Hi,

I think this is a common headache for users running a large Cassandra cluster in production. Running a major compaction is not the only cause; there are more. For example, I see two typical scenarios:

1. backup use case
2. active wide row

In case 1, say a piece of data is removed a year later. This means the tombstone on the row is one year away from the original row. To remove an expired row entirely, a compaction set has to include all of its fragments. So when are the original, one-year-old row and the tombstoned row included in the same compaction set? It is likely to take one year.

In case 2, such an active wide row exists in most of the sstable files, and it typically contains many expired columns. But none of them would be removed entirely, because a compaction set practically never includes all the row fragments.

By the way, there is a very convenient MBean API available: CompactionManager's forceUserDefinedCompaction. You can invoke a minor compaction on a file set you define. So the question is how to find an optimal set of sstable files. To that end, I wrote a tool that checks for garbage and prints out some useful information for finding such an optimal set. Here's a simple log output:

# /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db
[Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504071)]
===
ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES
===
hello5/100.txt.1373502926003, 40, 40, YES, YES, Test5_BLOB-hc-3-Data.db
---
TOTAL, 40, 40
===

REMAINNING_SSTABLE_FILES lists any other sstable files that contain the respective row. So the following is an optimal set:

# /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db /cassandra_data/UserData/Test5_BLOB-hc-3-Data.db
[Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504131)]
===
ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES
===
hello5/100.txt.1373502926003, 223, 0, YES, YES
---
TOTAL, 223, 0
===

The tool relies on SSTableReader and an aggregation iterator, as Cassandra does in compaction. I was considering sharing this with the community, so let me know if anyone is interested. Note that it is based on 1.0.7, so I will need to check and update it for newer versions.

Thanks,
Takenori

On Thu, Jul 11, 2013 at 6:46 PM, Tomàs Núñez tomas.nu...@groupalia.com wrote:

> About a year ago we did a major compaction in our Cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, so we were condemned to repeat the major compaction process every once in a while...
Re: Alternate major compaction
Perhaps I should already know this but why is running a major compaction considered so bad? We're running 1.1.6. Thanks. On Thu, Jul 11, 2013 at 7:51 AM, Takenori Sato ts...@cloudian.com wrote: Hi, I think it is a common headache for users running a large Cassandra cluster in production. Running a major compaction is not the only cause, but more. For example, I see two typical scenario. 1. backup use case 2. active wide row In the case of 1, say, one data is removed a year later. This means, tombstone on the row is 1 year away from the original row. To remove an expired row entirely, a compaction set has to include all the rows. So, when do the original, 1 year old row, and the tombstoned row are included in a compaction set? It is likely to take one year. In the case of 2, such an active wide row exists in most of sstable files. And it typically contains many expired columns. But none of them wouldn't be removed entirely because a compaction set practically do not include all the row fragments. Btw, there is a very convenient MBean API is available. It is CompactionManager's forceUserDefinedCompaction. You can invoke a minor compaction on a file set you define. So the question is how to find an optimal set of sstable files. Then, I wrote a tool to check garbage, and print outs some useful information to find such an optimal set. Here's a simple log output. # /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504071)] === ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES === hello5/100.txt.1373502926003, 40, 40, YES, YES, Test5_BLOB-hc-3-Data.db --- TOTAL, 40, 40 === REMAINNING_SSTABLE_FILES means any other sstable files that contain the respective row. So, the following is an optimal set. 
# /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db /cassandra_data/UserData/Test5_BLOB-hc-3-Data.db [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504131)] === ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES === hello5/100.txt.1373502926003, 223, 0, YES, YES --- TOTAL, 223, 0 === This tool relies on SSTableReader and an aggregation iterator, as Cassandra does in compaction. I was considering sharing this with the community, so let me know if anyone is interested. Ah, note that it is based on 1.0.7, so I will need to check and update it for newer versions. Thanks, Takenori On Thu, Jul 11, 2013 at 6:46 PM, Tomàs Núnez tomas.nu...@groupalia.com wrote: Hi, About a year ago we did a major compaction in our Cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, and we were condemned to repeat the major compaction process every once in a while (we are using the SizeTieredCompaction strategy, and we've not evaluated LeveledCompaction yet, because it has its downsides and we've had no time to test them all in our environment). I was trying to find a way to solve this situation (that is, do something like a major compaction that writes small sstables, not huge ones as a major compaction does), and I couldn't find it in the documentation. I tried cleanup and scrub/upgradesstables, but they don't do that (as the documentation states). Then I tried deleting all data in a node and then bootstrapping it (or nodetool rebuild-ing it), hoping that this way the sstables would get cleaned of deleted records and updates. But the wiped node just copied the sstables from another node as they were, cleaning nothing. So I tried a new approach: I switched the sstable compaction strategy (SizeTiered to Leveled), forcing the sstables to be rewritten from scratch, and then switched it back (Leveled to SizeTiered).
It took a while (but so does the major compaction process) and it worked: I have smaller sstables, and I've regained a lot of disk space. I'm happy with the results, but it doesn't seem an orthodox way of cleaning the sstables. What do you think, is it wrong or crazy? Is there a different way to achieve the same thing? Let's put an example: Suppose you have a write-only
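Takenori's tool output above can be turned into a compaction set mechanically: start from the sstable you inspected and pull in every file named in the REMAINNING_SSTABLE_FILES column. A sketch of that idea (hypothetical helper, not part of the tool; the row tuples mirror the sample output in this thread):

```python
# Hypothetical helper: given rows parsed from the checksstablegarbage
# output, compute the closure of sstables that must be compacted
# together for the tombstoned rows to be purged.

def optimal_compaction_set(seed_sstables, rows):
    """rows: list of (row_key, remaining_files) pairs, where
    remaining_files comes from the REMAINNING_SSTABLE_FILES column.
    Returns the seed sstables plus every file still holding a fragment."""
    chosen = set(seed_sstables)
    for _row_key, remaining in rows:
        chosen.update(remaining)
    return sorted(chosen)

# The sample output says hello5/100.txt... also lives in hc-3:
rows = [("hello5/100.txt.1373502926003", ["Test5_BLOB-hc-3-Data.db"])]
print(optimal_compaction_set(["Test5_BLOB-hc-4-Data.db"], rows))
# -> ['Test5_BLOB-hc-3-Data.db', 'Test5_BLOB-hc-4-Data.db']
```

That resulting pair is exactly the set Takenori passes to the second checksstablegarbage invocation, where COMPACTED_SIZE drops to 0.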
Re: Cassandra performance tuning...
You should be able to set the key_validation_class on the column family to use a different data type for the row keys. You may not be able to change this for a CF with existing data without some trouble due to a mismatch of data types; if that's a concern you'll have to create a separate CF and migrate your data. On Wed, Jul 10, 2013 at 2:20 PM, Tony Anecito adanec...@yahoo.com wrote: Hi All, I am trying to compare Cassandra to a relational database. I am getting around 2-3 msec response time using the Datastax driver and Java 1.7.0_05 64-bit JRE, while the other database is under 500 microseconds for the JDBC SQL preparedStatement execute. One of the major differences is that Cassandra uses text for the default primary key in the column family, while in the SQL table I use int, which is faster. Can the primary column family key data type be changed to an int? I also know Cassandra uses varint for IntegerType and I am not sure that will be what I need, but I will try it if I can change the key column to that. If I try Int32Type for the primary key I suspect I will need to reload the data after that change. I have looked at the default Java options in the Cassandra bat file and they seem a good starting point, but I am just starting to tune now that I can get column family caching to work. Regards, -Tony
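For what it's worth, in cassandra-cli of that era the change would look something like this (a sketch; the CF name user_data is hypothetical, and as noted above, existing rows keyed with the old type may cause trouble):

```
update column family user_data with key_validation_class = Int32Type;
```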
Re: Alternate major compaction
Information is only deleted from Cassandra during a compaction. Using SizeTieredCompaction, compaction only occurs when a number of similarly sized sstables are combined into a new sstable. When you perform a major compaction, all sstables are combined into one, very large, sstable. As a result, any tombstoned data in that large sstable will only be removed once a number of similarly large sstables exist. This means tombstoned data may be trapped in that sstable for a very long time (or indefinitely, depending on your use case). -Mike On Jul 11, 2013, at 9:31 AM, Brian Tarbox wrote: Perhaps I should already know this, but why is running a major compaction considered so bad? We're running 1.1.6. Thanks. On Thu, Jul 11, 2013 at 7:51 AM, Takenori Sato ts...@cloudian.com wrote: Hi, I think it is a common headache for users running a large Cassandra cluster in production. Running a major compaction is not the only cause; there are more. For example, I see two typical scenarios: 1. backup use case 2. active wide row. In the case of 1, say, a piece of data is removed a year later. This means the tombstone on the row is 1 year away from the original row. To remove an expired row entirely, a compaction set has to include all the row fragments. So, when are the original, 1-year-old row and the tombstoned row included in the same compaction set? It is likely to take one year. In the case of 2, such an active wide row exists in most of the sstable files, and it typically contains many expired columns. But none of them would be removed entirely, because a compaction set practically never includes all the row fragments. Btw, there is a very convenient MBean API available: CompactionManager's forceUserDefinedCompaction. You can invoke a minor compaction on a file set you define. So the question is how to find an optimal set of sstable files. To that end, I wrote a tool that checks for garbage and prints out some useful information to find such an optimal set. Here's a simple log output.
# /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504071)] === ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES === hello5/100.txt.1373502926003, 40, 40, YES, YES, Test5_BLOB-hc-3-Data.db --- TOTAL, 40, 40 === REMAINNING_SSTABLE_FILES lists any other sstable files that contain the respective row. So, the following is an optimal set. # /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db /cassandra_data/UserData/Test5_BLOB-hc-3-Data.db [Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504131)] === ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES === hello5/100.txt.1373502926003, 223, 0, YES, YES --- TOTAL, 223, 0 === This tool relies on SSTableReader and an aggregation iterator, as Cassandra does in compaction. I was considering sharing this with the community, so let me know if anyone is interested. Ah, note that it is based on 1.0.7, so I will need to check and update it for newer versions. Thanks, Takenori On Thu, Jul 11, 2013 at 6:46 PM, Tomàs Núnez tomas.nu...@groupalia.com wrote: Hi, About a year ago we did a major compaction in our Cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, and we were condemned to repeat the major compaction process every once in a while (we are using the SizeTieredCompaction strategy, and we've not evaluated LeveledCompaction yet, because it has its downsides and we've had no time to test them all in our environment). I was trying to find a way to solve this situation (that is, do something like a major compaction that writes small sstables, not huge ones as a major compaction does), and I couldn't find it in the documentation. I tried cleanup and scrub/upgradesstables, but they don't do that (as the documentation states).
Then I tried deleting all data in a node and then bootstrapping it (or nodetool rebuild-ing it), hoping that this way the sstables would get cleaned from deleted records and updates. But the deleted node just copied the
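Mike's point about similarly sized sstables can be seen in a toy model of size-tiered bucketing. This is not Cassandra's actual code; the 0.5x-1.5x-of-average grouping and the minimum of 4 sstables per compaction echo the strategy's defaults, but the numbers here are purely illustrative:

```python
def buckets(sizes, min_threshold=4):
    """Greedy toy model of SizeTieredCompactionStrategy bucketing: an
    sstable joins a bucket if its size is within 0.5x-1.5x of the
    bucket's current average size."""
    groups = []
    for size in sorted(sizes):
        for g in groups:
            avg = sum(g) / len(g)
            if 0.5 * avg <= size <= 1.5 * avg:
                g.append(size)
                break
        else:
            groups.append([size])
    # Only buckets with at least min_threshold members get compacted.
    return [g for g in groups if len(g) >= min_threshold]

# Four 10 MB sstables form a compactable bucket; the lone 1000 MB
# sstable left by a major compaction never finds similarly sized peers,
# so its tombstones stay trapped.
print(buckets([10, 10, 10, 10, 1000]))  # -> [[10, 10, 10, 10]]
```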
Re: High performance hardware with lot of data per node - Global learning about configuration
We've also noticed very good read and write latencies with the hi1.4xls compared to our previous instance classes. We actually ran a mixed cluster of hi1.4xls and m2.4xls to watch a side-by-side comparison. Despite the significant improvement in underlying hardware, we've noticed that streaming performance with 1.2.6+vnodes is a lot slower than we would expect. Bootstrapping a node into a ring with large storage loads can take 6+ hours. We have a JIRA open that describes our current config: https://issues.apache.org/jira/browse/CASSANDRA-5726 Aiman: We also use tablesnap for our backups. We're using a slightly modified version [1]. We currently back up every sst as soon as it hits disk (tablesnap's inotify), but we're considering moving to a periodic snapshot approach, as the sst churn after going from 24 nodes to 6 nodes is quite high. Mike [1]: https://github.com/librato/tablesnap On Thu, Jul 11, 2013 at 7:33 AM, Aiman Parvaiz ai...@grapheffect.com wrote: Hi, We also recently migrated to 3 hi1.4xlarge boxes (Raid0 SSD) and the disk IO performance is definitely better than on the earlier non-SSD servers; we are serving up to 14k reads/s with a latency of 3-3.5 ms/op. I wanted to share our config options and ask about the data backup strategy for Raid0. We are using C* 1.2.6 with a key_cache and row_cache of 300MB. I have not changed/modified any other parameter except for going with multithreaded GC. I will be playing around with other factors and update everyone if I find something interesting. Also, I just wanted to share our backup strategy and see if I can learn something useful from how others are taking backups of their Raid0. I am using tablesnap to upload SSTables to S3, and I have attached a separate EBS volume to every box and set up rsync to mirror Cassandra data from Raid0 to EBS. I would really appreciate it if you guys can share how you are taking backups. Thanks On Jul 9, 2013, at 7:11 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi, Using C*1.2.2.
We recently dropped our 18 m1.xLarge (4 CPU, 15GB RAM, 4 Raid-0 disks) servers to get 3 hi1.4xLarge (16 CPU, 60GB RAM, 2 Raid-0 SSD) servers instead, for about the same price. We tried it after reading some benchmarks published by Netflix. It is awesome and I recommend it to anyone who is using more than 18 xLarge servers or can afford these high cost / high performance EC2 instances. SSD gives a very good throughput with an awesome latency. Yet, we had about 200 GB of data per server and now about 1 TB. To alleviate memory pressure inside the heap I had to reduce the index sampling. I changed the index_interval value from 128 to 512, with no visible impact on latency, but a great improvement inside the heap, which doesn't complain about any pressure anymore. Is there some more tuning I could use, more tricks that could be useful while using big servers, with a lot of data per node and relatively high throughput? The SSDs are at 20-40% of their throughput capacity (according to OpsCenter), the CPU almost never reaches a load bigger than 5 or 6 (with 16 CPUs), and 15 GB of RAM is used out of 60GB. At this point I have kept my previous configuration, which is almost the default one from the Datastax community AMI. Here is part of it; you can consider that any property not listed here is configured as default: cassandra.yaml key_cache_size_in_mb: (empty) - so default - 100MB (hit rate between 88% and 92%, good enough?) row_cache_size_in_mb: 0 (not usable in our use case, a lot of different and random reads) flush_largest_memtables_at: 0.80 reduce_cache_sizes_at: 0.90 concurrent_reads: 32 (I am thinking of increasing this to 64 or more since I have just a few servers handling more concurrency) concurrent_writes: 32 (I am thinking of increasing this to 64 or more too) memtable_total_space_in_mb: 1024 (to avoid having a full heap; should I use a bigger value, and why?) rpc_server_type: sync (I tried hsha and had the ERROR 12:02:18,971 Read an invalid frame size of 0.
Are you using TFramedTransport on the client side? error). No idea how to fix this, and I use 5 different clients for different purposes (Hector, Cassie, phpCassa, Astyanax, Helenus)... multithreaded_compaction: false (Should I try enabling this since I now use SSDs?) compaction_throughput_mb_per_sec: 16 (I will definitely up this to 32 or even more) cross_node_timeout: true endpoint_snitch: Ec2MultiRegionSnitch index_interval: 512 cassandra-env.sh I am not sure about how to tune the heap, so I mainly use the defaults MAX_HEAP_SIZE=8G HEAP_NEWSIZE=400M (I tried higher values, and they produced bigger GC times (1600 ms instead of the 200 ms I get now with 400M)) -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly Does this configuration
Token Aware Routing: Routing Key Vs Composite Key with vnodes
Hi All, I am a bit confused about how the underlying token aware routing works in the case of a composite key. Let's say I have a column family like this: USERS( uuid userId, text firstname, text lastname, int age, PRIMARY KEY(userId, firstname, lastname)) My question is: do we need to have the values of userId, firstname and lastname available at the same time to create the token from the composite key, or can we get the right token just by looking at the routing key userId? Looking at the Datastax driver code is a bit confusing; it seems that it calculates the token only when all the values of the composite key are available, or am I missing something? Thanks, Haithem
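For what it's worth, in the schema above only userId is the partition key; firstname and lastname are clustering columns, so the token, and hence the replica choice, depends on userId alone. A sketch of RandomPartitioner-style token computation (MD5 of the serialized partition key bytes; Murmur3Partitioner uses a different hash but the same principle, and the example key bytes are hypothetical):

```python
import hashlib

def random_partitioner_token(partition_key: bytes) -> int:
    """Token the way RandomPartitioner computes it: the absolute value
    of the MD5 digest of the raw partition key bytes, read as a signed
    big-endian integer."""
    digest = hashlib.md5(partition_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

user_id = b"\x12" * 16  # hypothetical serialized uuid
# Clustering columns (firstname, lastname) never enter the hash, so the
# same userId always routes to the same token:
print(random_partitioner_token(user_id) == random_partitioner_token(user_id))  # -> True
```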
Re: Working with libcql
On 2013-07-09 11:46, Shubham Mittal wrote: yeah I tried that and below is the output I get LOG: resolving remote host localhost:9160 libcql is an implementation of the new binary transport protocol: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol.spec;hb=refs/heads/cassandra-1.2 It is not a thrift transport. By default it uses port 9042. You'll have to activate it on the server: write (or uncomment) start_native_transport: true in conf/cassandra.yaml. According to the posted log, you connect to the thrift transport port, 9160. As you send a frame of the new transport protocol to the old thrift protocol, the server does not understand it and closes your connection. Regards, Sorin LOG: resolved remote host, attempting to connect LOG: connection successful to remote host LOG: sending message: 0x0105 {version: 0x01, flags: 0x00, stream: 0x00, opcode: 0x05, length: 0} OPTIONS LOG: wrote to socket 8 bytes LOG: error reading header End of file and I checked all the keyspaces in my cluster, it changes nothing in the cluster. I couldn't understand the code much. What is this code supposed to do anyways? On Tue, Jul 9, 2013 at 4:20 AM, aaron morton aa...@thelastpickle.com wrote: Did you see the demo app? It seems to have a few examples of reading data. https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85 Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 9/07/2013, at 1:14 AM, Shubham Mittal smsmitta...@gmail.com wrote: Hi, I found out that there exists a C++ client, libcql, for Cassandra, but its github repository just provides an example of how to connect to Cassandra. Has anyone written some code using libcql to read and write data to a Cassandra DB? If so, kindly share it. Thanks
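The 8-byte OPTIONS frame in the log above can be reproduced in a few lines; the field layout (version, flags, stream, opcode, then a 4-byte big-endian body length) follows the v1 native protocol spec Sorin links:

```python
import struct

# v1 native protocol header: version 0x01 (request), flags 0x00,
# stream 0x00, opcode 0x05 (OPTIONS), body length 0 -- exactly the
# 8 bytes libcql logged as "wrote to socket 8 bytes".
frame = struct.pack(">BBBBI", 0x01, 0x00, 0x00, 0x05, 0)
print(frame.hex(), len(frame))  # -> 0100000500000000 8
```

A thrift server on port 9160 reads those first four bytes as a (nonsensical) frame size, which is why it gives up and closes the connection.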
Re: alter column family ?
Hi Rob, Are the schemas held somewhere else? Going through the process that you sent, when I restart the nodes the original schemas show up (btw, you were correct in your assessment: even though the schemas show they are the same with the gossipinfo command, they are not the same when looking at them with cassandra-cli, not even close on 2 of the nodes). So, I went through the process of clearing out the system CFs. In steps 4 and 5, when the Cassandra nodes restarted, two of them (the ones with the incorrect schemas) complained about the schema and loaded what looks like a generic one. But all of them have schemas, and 2 are correct and one is not. This means I cannot execute step 7, since the schema now exists with that name on all the nodes. For example, the incorrect schema is called MySchema; after the restart and the messages complaining about CFs not existing, there is a schema called MySchema, correct on 2 nodes and incorrect on 2 nodes. I have also tried to force the node with the incorrect schema to come up on its own by shutting down the cluster except for a node with a correct schema. I went through the same steps and brought that node down and back up, with the same results. Thoughts? Ideas? Jim From: Robert Coli rc...@eventbrite.com Reply-To: user@cassandra.apache.org Date: Tue, 9 Jul 2013 17:10:53 -0700 To: user@cassandra.apache.org Subject: Re: alter column family ?
On Tue, Jul 9, 2013 at 11:52 AM, Langston, Jim jim.langs...@compuware.com wrote: On the command (4 node cluster): nodetool gossipinfo -h localhost |grep SCHEMA |sort | uniq -c | sort -n 4 SCHEMA:60edeaa8-70a4-3825-90a5-d7746ffa8e4d If your schemas actually agree (and given that you're on 1.1.2), you are probably encountering https://issues.apache.org/jira/browse/CASSANDRA-4432, which is one of the 1.1.2-era stuck-schema issues I was referring to earlier. On the second part, I have the same Cassandra version in staging and production, with staging being a smaller cluster. Not sure what you mean by nuking schemas (i.e. delete directories?) I like when googling things returns related threads in which I have previously advised people to do a detailed list of things, heh: http://mail-archives.apache.org/mod_mbox/cassandra-user/201208.mbox/%3CCAN1VBD-01aD7wT2w1eyY2KpHwcj+CoMjvE4=j5zaswybmw_...@mail.gmail.com%3E Here's a slightly clarified version of these steps...
0. Dump your existing schema to schema_definition_file.
1. Take all nodes out of service.
2. Run nodetool drain on each and verify that they have drained (grep -i DRAINED system.log).
3. Stop cassandra on each node.
4. Move /var/lib/cassandra/data/system out of the way.
5. Move /var/lib/cassandra/saved_caches/system-* out of the way.
6. Start all nodes.
7. cassandra-cli schema_definition_file on one node only (includes create keyspace and create column family entries). Note: you should not literally do this; you should break your schema_definition_file into individual statements and wait for schema agreement between each DDL statement.
8. Put the nodes back in service.
9. Done.
=Rob
Re: alter column family ?
On Thu, Jul 11, 2013 at 9:17 AM, Langston, Jim jim.langs...@compuware.com wrote: Are the schemas held somewhere else? Going through the process that you sent, when I restart the nodes, the original schemas show up If you do not stop all nodes at once and then remove the system CFs, the existing schema will re-propagate via Gossip. To be clear, I was suggesting that you dump the schema with cassandra-cli, erase the current schema with the cluster down, bring the cluster back up (NOW WITH NO SCHEMA) and then load the schema from the dump via cassandra-cli. Also, in case I didn't mention it before, you should upgrade your version of Cassandra ASAP. :) =Rob
Re: Logging Cassandra Reads/Writes
Aaron, Thanks for the references! I'll try the things you mentioned and see how it goes! Best, Mohammad On Wed, Jul 10, 2013 at 8:07 PM, aaron morton wrote: Some info on request tracing: http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2 1) Is it possible to log which node provides the real data in a read operation? It's available at the DEBUG level of logging. You probably just want to enable it on the org.apache.cassandra.db.StorageProxy class; see log4j-server.properties for info. 2) Also, is it possible to log the different delays involved in each operation -- for example, 0.1 seconds to get digests from all nodes, 1 second to transfer data, etc.? Not applicable; as you've seen, we send requests to all replicas at the same time. There is more logging that will show when the responses are processed; try turning DEBUG logging on for a small 3 node cluster and send one request. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 10/07/2013, at 8:58 AM, Mohit Anchlia wrote: There is a new tracing feature in Cassandra 1.2 that might help you with this. On Tue, Jul 9, 2013 at 1:31 PM, Blair Zajac wrote: No idea on the logging, I'm pretty new to Cassandra. Regards, Blair On Jul 9, 2013, at 12:50 PM, hajjat wrote: Blair, thanks for the clarification! My friend actually just told me the same. Any idea on how to do logging? Thanks!
-- *Mohammad Hajjat* *Ph.D. Student* *Electrical and Computer Engineering* *Purdue University*
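The per-class DEBUG logging Aaron mentions would be a one-line addition to conf/log4j-server.properties, something like the following (a sketch using standard log4j property syntax; scoping it to the one class keeps the rest of the server at its current level):

```
log4j.logger.org.apache.cassandra.db.StorageProxy=DEBUG
```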
Re: Alternate major compaction
On Thu, Jul 11, 2013 at 2:46 AM, Tomàs Núnez tomas.nu...@groupalia.com wrote: Hi, About a year ago we did a major compaction in our Cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, and we were condemned to repeat the major compaction process every once in a while (we are using the SizeTieredCompaction strategy, and we've not evaluated LeveledCompaction yet, because it has its downsides and we've had no time to test them all in our environment). I was trying to find a way to solve this situation (that is, do something like a major compaction that writes small sstables, not huge ones as a major compaction does), and I couldn't find it in the documentation. https://github.com/pcmanus/cassandra/tree/sstable_split 1) run sstable_split on One Big SSTable (being careful to avoid name collisions if done with the node running) 2) stop node 3) remove One Big SSTable 4) start node This approach is significantly more I/O-efficient than your online solution, but it does require a node restart and messing around directly with SSTables. Your online solution is clever! If you choose to use this tool, please let us know the result. With some feedback, pcmanus (Sylvain) is likely to merge it into Cassandra as a useful tool for dealing with, for example, this situation. =Rob
Re: node tool ring displays 33.33% owns on 3 node cluster with replication
Thanks Rob! I was able to confirm with getendpoints. Cheers, ~Jason From: Robert Coli rc...@eventbrite.commailto:rc...@eventbrite.com Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Wednesday, July 10, 2013 4:09 PM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Cc: Francois Richard frich...@yahoo-inc.commailto:frich...@yahoo-inc.com Subject: Re: node tool ring displays 33.33% owns on 3 node cluster with replication On Wed, Jul 10, 2013 at 4:04 PM, Jason Tyler jaty...@yahoo-inc.commailto:jaty...@yahoo-inc.com wrote: Is this simply a display issue, or have I lost replication? Almost certainly just a display issue. Do nodetool -h localhost getendpoints keyspace columnfamily 0, which will tell you the endpoints for the non-transformed key 0. It should give you 3 endpoints. You could also do this test with a known existing key and then go to those nodes and verify that they have that data on disk via sstable2json. (FWIW, it is an odd display issue/bug if it is one. Because it has reverted to pre-1.1 behavior...) =Rob
Re: alter column family ?
Yes, I got the gist of what you were after, even making sure I broke out the schema dump and loaded the statements in individually, but I haven't gotten that far. It feels like the 2 nodes that are not coming up with the right schema are not seeing the nodes with the correct one. And yes, I hear the beat of the upgrade drum; I was hoping to do one step at a time so I don't carry my problem over. Jim From: Robert Coli rc...@eventbrite.com Reply-To: user@cassandra.apache.org Date: Thu, 11 Jul 2013 09:43:43 -0700 To: user@cassandra.apache.org Subject: Re: alter column family ? On Thu, Jul 11, 2013 at 9:17 AM, Langston, Jim jim.langs...@compuware.com wrote: Are the schemas held somewhere else? Going through the process that you sent, when I restart the nodes, the original schemas show up If you do not stop all nodes at once and then remove the system CFs, the existing schema will re-propagate via Gossip. To be clear, I was suggesting that you dump the schema with cassandra-cli, erase the current schema with the cluster down, bring the cluster back up (NOW WITH NO SCHEMA) and then load the schema from the dump via cassandra-cli. Also, in case I didn't mention it before, you should upgrade your version of Cassandra ASAP. :) =Rob
Re: alter column family ?
On Thu, Jul 11, 2013 at 10:16 AM, Langston, Jim jim.langs...@compuware.comwrote: It feels like the 2 node that are not coming up with the right schema are not seeing the nodes with the correct ones. At the time that the nodes come up, they should have no schema other than the system columnfamilies. Only once all 3 nodes see each other should you be re-creating the schema. I'm not understanding your above sentence in light of this? =Rob
Re: High performance hardware with lot of data per node - Global learning about configuration
Thanks for the info Mike. We ran into a race condition which was killing tablesnap; I want to share the problem and the solution/workaround, and maybe someone can throw some light on the effects of the solution. tablesnap was getting killed with this error message: Failed uploading %s. Aborting.\n%s Looking at the code took me to the following: def worker(self): bucket = self.get_bucket() while True: f = self.fileq.get() keyname = self.build_keyname(f) try: self.upload_sstable(bucket, keyname, f) except: self.log.critical("Failed uploading %s. Aborting.\n%s" % (f, format_exc())) # Brute force kill self os.kill(os.getpid(), signal.SIGKILL) self.fileq.task_done() It builds the filename, and then before it can upload the file, the file disappears (which is possible). I simply commented out the line which kills tablesnap if the file is not found; it fixes the issue we were having, but I would appreciate it if someone has any insight into any ill effects this might have on the backup or restoration process. Thanks On Jul 11, 2013, at 7:03 AM, Mike Heffner m...@librato.com wrote: We've also noticed very good read and write latencies with the hi1.4xls compared to our previous instance classes. We actually ran a mixed cluster of hi1.4xls and m2.4xls to watch a side-by-side comparison. Despite the significant improvement in underlying hardware, we've noticed that streaming performance with 1.2.6+vnodes is a lot slower than we would expect. Bootstrapping a node into a ring with large storage loads can take 6+ hours. We have a JIRA open that describes our current config: https://issues.apache.org/jira/browse/CASSANDRA-5726 Aiman: We also use tablesnap for our backups. We're using a slightly modified version [1]. We currently back up every sst as soon as it hits disk (tablesnap's inotify), but we're considering moving to a periodic snapshot approach, as the sst churn after going from 24 nodes to 6 nodes is quite high.
Mike [1]: https://github.com/librato/tablesnap On Thu, Jul 11, 2013 at 7:33 AM, Aiman Parvaiz ai...@grapheffect.com wrote: Hi, We also recently migrated to 3 hi.4xlarge boxes(Raid0 SSD) and the disk IO performance is definitely better than the earlier non SSD servers, we are serving up to 14k reads/s with a latency of 3-3.5 ms/op. I wanted to share our config options and ask about the data back up strategy for Raid0. We are using C* 1.2.6 with key_chache and row_cache of 300MB I have not changed/ modified any other parameter except for going with multithreaded GC. I will be playing around with other factors and update everyone if I find something interesting. Also, just wanted to share backup strategy and see if I can get something useful from how others are taking backup of their raid0. I am using tablesnap to upload SSTables to s3 and I have attached a separate EBS volume to every box and have set up rsync to mirror Cassandra data from Raid0 to EBS. I would really appreciate if you guys can share how you taking backups. Thanks On Jul 9, 2013, at 7:11 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi, Using C*1.2.2. We recently dropped our 18 m1.xLarge (4CPU, 15GB RAM, 4 Raid-0 Disks) servers to get 3 hi1.4xLarge (16CPU, 60GB RAM, 2 Raid-0 SSD) servers instead, for about the same price. We tried it after reading some benchmark published by Netflix. It is awesome and I recommend it to anyone who is using more than 18 xLarge server or can afford these high cost / high performance EC2 instances. SSD gives a very good throughput with an awesome latency. Yet, we had about 200 GB data per server and now about 1 TB. To alleviate memory pressure inside the heap I had to reduce the index sampling. I changed the index_interval value from 128 to 512, with no visible impact on latency, but a great improvement inside the heap which doesn't complain about any pressure anymore. 
Is there some more tuning I could use, more tricks that could be useful while using big servers, with a lot of data per node and relatively high throughput ? SSD are at 20-40 % of their throughput capacity (according to OpsCenter), CPU almost never reach a bigger load than 5 or 6 (with 16 CPU), 15 GB RAM used out of 60GB. At this point I have kept my previous configuration, which is almost the default one from the Datastax community AMI. There is a part of it, you can consider that any property that is not in here is configured as default : cassandra.yaml key_cache_size_in_mb: (empty) - so default - 100MB (hit rate between 88 % and 92 %, good enough ?) row_cache_size_in_mb: 0 (not usable in our use case, a lot of different and random reads) flush_largest_memtables_at: 0.80 reduce_cache_sizes_at: 0.90
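On the tablesnap race described earlier in this thread, rather than commenting out the kill entirely, a gentler variant would treat a vanished sstable as expected compaction churn and only abort on real upload failures. A sketch of that idea (simplified shape only; the upload callable is a stand-in, not tablesnap's actual classes):

```python
import os

def upload_all(files, upload):
    """Upload each file, skipping ones that compaction removed between
    listing and upload; anything else still counts as a failure."""
    uploaded, skipped = [], []
    for path in files:
        try:
            if not os.path.exists(path):
                # A vanished sstable is normal churn, not a fatal error.
                raise FileNotFoundError(path)
            upload(path)
            uploaded.append(path)
        except FileNotFoundError:
            skipped.append(path)  # log and move on instead of SIGKILL
    return uploaded, skipped

done, gone = upload_all(["/no/such/sstable-Data.db"], upload=lambda p: None)
print(done, gone)  # -> [] ['/no/such/sstable-Data.db']
```

Distinguishing "file gone" from "upload failed" keeps the original fail-fast behaviour for genuine S3 errors, which the blanket commented-out kill does not.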
Re: alter column family ?
Thanks Rob, I went through the whole sequence again and have now gotten to the point of being able to try to pull in the schema, but am now getting this error from the one node I'm executing on:

[default@unknown] create keyspace OTracker
... with placement_strategy = 'SimpleStrategy'
... and strategy_options = {replication_factor : 3}
... and durable_writes = true;
9209ec36-3b3f-3e24-9dfb-8a45a5b29a2a
Waiting for schema agreement...
... schemas agree across the cluster
NotFoundException()
[default@unknown]

All the nodes see each other and are available; all contain only a system schema, and none have an OTracker schema. Jim

From: Robert Coli rc...@eventbrite.com
Reply-To: user@cassandra.apache.org
Date: Thu, 11 Jul 2013 10:35:43 -0700
To: user@cassandra.apache.org
Subject: Re: alter column family ?

On Thu, Jul 11, 2013 at 10:16 AM, Langston, Jim jim.langs...@compuware.com wrote: It feels like the 2 nodes that are not coming up with the right schema are not seeing the nodes with the correct ones. At the time that the nodes come up, they should have no schema other than the system columnfamilies. Only once all 3 nodes see each other should you be re-creating the schema. I'm not understanding your above sentence in light of this? =Rob
Re: alter column family ?
On Thu, Jul 11, 2013 at 11:00 AM, Langston, Jim jim.langs...@compuware.com wrote: I went through the whole sequence again and now have gotten to the point of being able to try and pull in the schema, but now getting this error from the one node I'm executing on. [default@unknown] create keyspace OTracker 9209ec36-3b3f-3e24-9dfb-8a45a5b29a2a Waiting for schema agreement... ... schemas agree across the cluster NotFoundException()

This is pretty unusual.

All the nodes see each other and are available, all only contain a system schema, none have a OTracker schema

If you look in the logs for schema-related activity when you try to create OTracker, what do you see? Do you see the above UUID schema version in the logs? At this point I am unable to suggest anything other than upgrading to the head of the 1.1 line and trying to create your keyspace there. There should be no chance of old state being implicated in your now-stuck schema, so it seems likely that the problem has re-occurred due to the version of Cassandra you are running. Sorry I am unable to be of more assistance, and that my advice appears to have left your cluster in worse condition than when you started. I probably mentioned this before, but will do so again: if you have the old system keyspace directories, you can stop Cassandra on all nodes and then revert to them. =Rob
Re: alter column family ?
I was just looking at a bug with uppercase; could that be the error? And yes, I definitely saved off the original system keyspaces. I'm tailing the logs when running cassandra-cli, but I do not see anything in them. Jim

From: Robert Coli rc...@eventbrite.com
Reply-To: user@cassandra.apache.org
Date: Thu, 11 Jul 2013 11:07:55 -0700
To: user@cassandra.apache.org
Subject: Re: alter column family ?

On Thu, Jul 11, 2013 at 11:00 AM, Langston, Jim jim.langs...@compuware.com wrote: I went through the whole sequence again and now have gotten to the point of being able to try and pull in the schema, but now getting this error from the one node I'm executing on. [default@unknown] create keyspace OTracker 9209ec36-3b3f-3e24-9dfb-8a45a5b29a2a Waiting for schema agreement... ... schemas agree across the cluster NotFoundException()

This is pretty unusual.

All the nodes see each other and are available, all only contain a system schema, none have a OTracker schema

If you look in the logs for schema-related activity when you try to create OTracker, what do you see? Do you see the above UUID schema version in the logs? At this point I am unable to suggest anything other than upgrading to the head of the 1.1 line and trying to create your keyspace there. There should be no chance of old state being implicated in your now-stuck schema, so it seems likely that the problem has re-occurred due to the version of Cassandra you are running. Sorry I am unable to be of more assistance, and that my advice appears to have left your cluster in worse condition than when you started. I probably mentioned this before, but will do so again: if you have the old system keyspace directories, you can stop Cassandra on all nodes and then revert to them. =Rob
merge sstables
Hello, I have around 2000 small sstables of about 5 MB each. Is there a way I can merge them into bigger ones? Thanks, chandra
Re: merge sstables
I assume you are using the leveled compaction strategy, because you have 5 MB sstables and 5 MB is the default size for leveled compaction. To change this default, you can run the following in the cassandra-cli:

update column family cf_name with compaction_strategy_options = {sstable_size_in_mb: 256};

To force the current sstables to be rewritten, I think you'll need to issue a nodetool scrub on each node. Someone please correct me if I'm wrong on this. Faraaz

On Thu, Jul 11, 2013 at 11:34:08AM -0700, chandra Varahala wrote: Hello , I have small size of sstables like 5mb around 2000 files. Is there a way i can merge into bigger size ? thanks chandra
Re: merge sstables
Yes, but nodetool scrub is not working. Thanks, chandra

On Thu, Jul 11, 2013 at 2:39 PM, Faraaz Sareshwala fsareshw...@quantcast.com wrote: I assume you are using the leveled compaction strategy because you have 5mb sstables and 5mb is the default size for leveled compaction. To change this default, you can run the following in the cassandra-cli: update column family cf_name with compaction_strategy_options = {sstable_size_in_mb: 256}; To force the current sstables to be rewritten, I think you'll need to issue a nodetool scrub on each node. Someone please correct me if I'm wrong on this. Faraaz On Thu, Jul 11, 2013 at 11:34:08AM -0700, chandra Varahala wrote: Hello , I have small size of sstables like 5mb around 2000 files. Is there a way i can merge into bigger size ? thanks chandra
Re: High performance hardware with lot of data per node - Global learning about configuration
Aiman, I believe that is one of the cases we added a check for: https://github.com/librato/tablesnap/blob/master/tablesnap#L203-L207 Mike

On Thu, Jul 11, 2013 at 1:54 PM, Aiman Parvaiz ai...@grapheffect.com wrote:

Thanks for the info Mike. We ran into a race condition which was killing tablesnap; I want to share the problem and the solution/workaround, and maybe someone can throw some light on the effects of the solution. tablesnap was getting killed with this error message: "Failed uploading %s. Aborting.\n%s". Looking at the code took me to the following:

def worker(self):
    bucket = self.get_bucket()

    while True:
        f = self.fileq.get()
        keyname = self.build_keyname(f)
        try:
            self.upload_sstable(bucket, keyname, f)
        except:
            self.log.critical("Failed uploading %s. Aborting.\n%s" %
                              (f, format_exc()))
            # Brute force kill self
            os.kill(os.getpid(), signal.SIGKILL)
        self.fileq.task_done()

It builds the filename, and then before it can upload it the file disappears (which is possible). I simply commented out the line which kills tablesnap if the file is not found. That fixes the issue we were having, but I would appreciate it if someone has any insight into ill effects this might have on the backup or restoration process. Thanks

On Jul 11, 2013, at 7:03 AM, Mike Heffner m...@librato.com wrote:

We've also noticed very good read and write latencies with the hi1.4xls compared to our previous instance classes. We actually ran a mixed cluster of hi1.4xls and m2.4xls to watch a side-by-side comparison. Despite the significant improvement in underlying hardware, we've noticed that streaming performance with 1.2.6+vnodes is a lot slower than we would expect. Bootstrapping a node into a ring with large storage loads can take 6+ hours. We have a JIRA open that describes our current config: https://issues.apache.org/jira/browse/CASSANDRA-5726 Aiman: We also use tablesnap for our backups. We're using a slightly modified version [1].
We currently back up every sstable as soon as it hits disk (tablesnap's inotify), but we're considering moving to a periodic snapshot approach, as the sstable churn after going from 24 nodes to 6 nodes is quite high. Mike

[1]: https://github.com/librato/tablesnap

On Thu, Jul 11, 2013 at 7:33 AM, Aiman Parvaiz ai...@grapheffect.com wrote:

Hi, We also recently migrated to 3 hi1.4xlarge boxes (RAID0 SSD) and the disk IO performance is definitely better than on the earlier non-SSD servers; we are serving up to 14k reads/s with a latency of 3-3.5 ms/op. I wanted to share our config options and ask about the data backup strategy for RAID0. We are using C* 1.2.6 with a key_cache and row_cache of 300MB. I have not changed/modified any other parameter except for going with multithreaded GC. I will be playing around with other factors and will update everyone if I find something interesting. Also, I just wanted to share our backup strategy and see if I can learn something useful from how others are backing up their RAID0. I am using tablesnap to upload SSTables to S3, and I have attached a separate EBS volume to every box and set up rsync to mirror Cassandra data from RAID0 to EBS. I would really appreciate it if you guys can share how you are taking backups. Thanks

On Jul 9, 2013, at 7:11 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

Hi, Using C* 1.2.2. We recently dropped our 18 m1.xlarge (4 CPU, 15GB RAM, 4 RAID-0 disks) servers to get 3 hi1.4xlarge (16 CPU, 60GB RAM, 2 RAID-0 SSD) servers instead, for about the same price. We tried it after reading a benchmark published by Netflix. It is awesome, and I recommend it to anyone who is using more than 18 xlarge servers or can afford these high-cost / high-performance EC2 instances. SSD gives very good throughput with awesome latency. However, we had about 200 GB of data per server before and now have about 1 TB. To alleviate memory pressure inside the heap I had to reduce the index sampling.
I changed the index_interval value from 128 to 512, with no visible impact on latency, but a great improvement inside the heap, which doesn't complain about any pressure anymore. Is there more tuning I could use, more tricks that could be useful with big servers, a lot of data per node and relatively high throughput? The SSDs are at 20-40% of their throughput capacity (according to OpsCenter), the CPU load almost never exceeds 5 or 6 (with 16 CPUs), and 15 GB of RAM is used out of 60GB. At this point I have kept my previous configuration, which is almost the default one from the DataStax community AMI. Here is part of it; you can assume that any property not listed here is configured with its default: cassandra.yaml key_cache_size_in_mb: (empty) - so default - 100MB (hit rate between 88 %
Re: merge sstables
Scrub will keep the file sizes the same. You need to move all sstables back to L0; the way to do this is to remove the .json manifest file which holds the level information.

On Thu, Jul 11, 2013 at 11:48 AM, chandra Varahala hadoopandcassan...@gmail.com wrote: yes, but nodetool scrub is not working .. thanks chandra On Thu, Jul 11, 2013 at 2:39 PM, Faraaz Sareshwala fsareshw...@quantcast.com wrote: I assume you are using the leveled compaction strategy because you have 5mb sstables and 5mb is the default size for leveled compaction. To change this default, you can run the following in the cassandra-cli: update column family cf_name with compaction_strategy_options = {sstable_size_in_mb: 256}; To force the current sstables to be rewritten, I think you'll need to issue a nodetool scrub on each node. Someone please correct me if I'm wrong on this. Faraaz On Thu, Jul 11, 2013 at 11:34:08AM -0700, chandra Varahala wrote: Hello , I have small size of sstables like 5mb around 2000 files. Is there a way i can merge into bigger size ? thanks chandra
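A minimal sketch of sankalp's manifest-removal step, assuming the 1.0/1.1-era on-disk layout where leveled compaction keeps a per-column-family .json manifest in the data directory (the directory path and file names below are hypothetical, and the node must be stopped before touching its data files):

```python
import os

def remove_lcs_manifests(cf_dir: str) -> list:
    """Delete leveled-compaction manifest files (*.json) in a column
    family data directory, so all sstables drop back to L0 when the
    node restarts and the manifest is rebuilt."""
    removed = []
    for name in sorted(os.listdir(cf_dir)):
        if name.endswith(".json"):
            path = os.path.join(cf_dir, name)
            os.remove(path)
            removed.append(name)
    return removed

# Hypothetical usage (node stopped first):
# remove_lcs_manifests("/cassandra_data/UserData")
```

On restart, compaction then re-levels everything from L0, which is what merges the small files, subject to the L0-backlog caveat Rob quotes in the next message of this thread.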
Re: merge sstables
On Thu, Jul 11, 2013 at 1:52 PM, sankalp kohli kohlisank...@gmail.com wrote: Scrub will keep the file size same. You need to move all sstables to be L0. The way to do this is to remove the json file which has level information.

This will work, but I believe it is subject to this?

./src/java/org/apache/cassandra/db/compaction/LeveledManifest.java line 228 of 577

// LevelDB gives each level a score of how much data it contains vs its ideal amount, and
// compacts the level with the highest score. But this falls apart spectacularly once you
// get behind. Consider this set of levels:
//   L0: 988 [ideal: 4]
//   L1: 117 [ideal: 10]
//   L2: 12  [ideal: 100]
//
// The problem is that L0 has a much higher score (almost 250) than L1 (11), so what we'll
// do is compact a batch of MAX_COMPACTING_L0 sstables with all 117 L1 sstables, and put the
// result (say, 120 sstables) in L1. Then we'll compact the next batch of MAX_COMPACTING_L0,
// and so forth. So we spend most of our i/o rewriting the L1 data with each batch.
//
// If we could just do *all* L0 a single time with L1, that would be ideal. But we can't
// -- see the javadoc for MAX_COMPACTING_L0.
//
// LevelDB's way around this is to simply block writes if L0 compaction falls behind.
// We don't have that luxury.
//
// So instead, we force compacting higher levels first. This may not minimize the number
// of reads done as quickly in the short term, but it minimizes the i/o needed to compact
// optimally, which gives us a long term win.

Ideal would be something like a major compaction for LCS which allows the end user to change the resulting SSTable sizes without forcing everything back to L0. =Rob
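The scoring behavior the quoted comment describes can be reproduced numerically. This sketch just applies the comment's own numbers (sstable counts vs ideal amounts per level):

```python
def level_scores(actual, ideal):
    # Score each level by how much data it contains vs its ideal
    # amount, as described in the LeveledManifest comment.
    return [a / i for a, i in zip(actual, ideal)]

# Counts from the comment: L0, L1, L2 actual vs ideal.
scores = level_scores([988, 117, 12], [4, 10, 100])
print(scores)  # → [247.0, 11.7, 0.12]
```

L0's score dwarfs the others, which is why a naive highest-score-first policy would keep re-compacting L1 batch after batch once L0 falls behind.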
Rhombus - A time-series object store for Cassandra
Hello, Just wanted to share a project that we have been working on. It's a time-series object store for Cassandra. We tried to generalize the common use cases for storing time-series data in Cassandra and automatically handle the denormalization, indexing, and wide row sharding. It currently exists as a Java Library. We have it deployed as a web service in a Dropwizard app server with a REST style interface. The plan is to eventually release that Dropwizard app too. The project and explanation is available on Github at: https://github.com/Pardot/Rhombus I would love to hear feedback. Many Thanks, Rob
Re: Token Aware Routing: Routing Key Vs Composite Key with vnodes
It is my understanding that you must have all parts of the partition key in order to calculate the token. The partition key is the first part of the primary key, in your case the userId. You should be able to get the token from the userId alone. Give it a try:

cqlsh> select userId, token(userId) from users limit 10;

On 07/11/2013 08:54 AM, Haithem Jarraya wrote: Hi All, I am a bit confused about how the underlying token-aware routing works in the case of a composite key. Let's say I have a column family like this:

USERS( uuid userId, text firstname, text lastname, int age, PRIMARY KEY(userId, firstname, lastname))

My question is: do we need to have the values of userId, firstName and lastName available at the same time to create the token from the composite key, or can we get the right token just by looking at the routing key userId? Looking at the DataStax driver code is a bit confusing; it seems that it calculates the token only when all the values of the composite key are available, or am I missing something? Thanks, Haithem

-- Colin Blower / Software Engineer / Barracuda Networks Inc. +1 408-342-5576 (o)
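Colin's point, that the token is a function of the partition key alone, can be illustrated with a RandomPartitioner-style MD5 token. This is a sketch, not the driver's actual code; the 1.2 default is Murmur3Partitioner, but it likewise hashes only the serialized partition key, never the clustering columns:

```python
import hashlib
import uuid

def random_partitioner_token(partition_key: bytes) -> int:
    # RandomPartitioner-style token: |MD5(key)| as a big integer.
    digest = hashlib.md5(partition_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

user_id = uuid.uuid4()
# Only userId's bytes feed the hash; firstname/lastname (the
# clustering columns) never enter the token calculation, so the
# driver only needs userId to pick a replica.
token = random_partitioner_token(user_id.bytes)
```

Any confusion in the driver code is about having all *components of a composite partition key* available; here the partition key is the single column userId, so the token is computable from it alone.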
Re: merge sstables
He has around 10 GB of data, so it should not be bad. This problem arises when you have a lot of data.

On Thu, Jul 11, 2013 at 2:10 PM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jul 11, 2013 at 1:52 PM, sankalp kohli kohlisank...@gmail.com wrote: Scrub will keep the file size same. You need to move all sstables to be L0. The way to do this is to remove the json file which has level information.

This will work, but I believe it is subject to this?

./src/java/org/apache/cassandra/db/compaction/LeveledManifest.java line 228 of 577

// LevelDB gives each level a score of how much data it contains vs its ideal amount, and
// compacts the level with the highest score. But this falls apart spectacularly once you
// get behind. Consider this set of levels:
//   L0: 988 [ideal: 4]
//   L1: 117 [ideal: 10]
//   L2: 12  [ideal: 100]
//
// The problem is that L0 has a much higher score (almost 250) than L1 (11), so what we'll
// do is compact a batch of MAX_COMPACTING_L0 sstables with all 117 L1 sstables, and put the
// result (say, 120 sstables) in L1. Then we'll compact the next batch of MAX_COMPACTING_L0,
// and so forth. So we spend most of our i/o rewriting the L1 data with each batch.
//
// If we could just do *all* L0 a single time with L1, that would be ideal. But we can't
// -- see the javadoc for MAX_COMPACTING_L0.
//
// LevelDB's way around this is to simply block writes if L0 compaction falls behind.
// We don't have that luxury.
//
// So instead, we force compacting higher levels first. This may not minimize the number
// of reads done as quickly in the short term, but it minimizes the i/o needed to compact
// optimally, which gives us a long term win.

Ideal would be something like a major compaction for LCS which allows the end user to change the resulting SSTable sizes without forcing everything back to L0. =Rob
Re: listen_address and rpc_address address on different interface
On Thu, Jul 11, 2013 at 2:53 AM, Christopher Wirt chris.w...@struq.com wrote: If we were to take down a node, change the listen address, and then re-join the ring, the other nodes will mark the node as dead when we take it down and assume we have a new node when we bring it back on a different address. Lots of wasted rebalancing and compaction will start. We use Cassandra 1.2.4 w/vnodes.

In theory you can:
1) stop cassandra
2) change ip/config/etc.
3) restart cassandra with auto_bootstrap=false in cassandra.yaml

I believe this should just work, because the node knows from the system keyspace which tokens it is claiming; it simply announces to the cluster that it is now responsible for each of those ranges, and the other nodes should just say OK. If you do this, please let us know the results! Obviously you should try it first on a non-production cluster...

So back to question one, am I wasting my time? My hunch is probably, but it is just a hunch. =Rob
How many DCs can you have in a cluster?
In this C* Summit 2013 talk titled "A Deep Dive Into How Cassandra Resolves Inconsistent Data" [1], Jason Brown of Netflix mentions that they have 5 data centers in the same cluster: two in the US, one in Europe, one in Brazil and one in Asia (I'm going from memory now, since I don't want to watch the video again). Is there a practical limit on how many data centers one can have in a single cluster? Thanks, Blair [1] http://www.youtube.com/watch?v=VRZk-NhfX18&list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU&index=57
Re: Alternate major compaction
Hi, I made the repository public. You can now check it out from here: https://github.com/cloudian/support-tools checksstablegarbage is the tool. Enjoy, and any feedback is welcome. Thanks, - Takenori

On Thu, Jul 11, 2013 at 10:12 PM, srmore comom...@gmail.com wrote: Thanks Takenori, it looks like the tool provides some good info that people can use. It would be great if you can share it with the community.

On Thu, Jul 11, 2013 at 6:51 AM, Takenori Sato ts...@cloudian.com wrote:

Hi, I think this is a common headache for users running a large Cassandra cluster in production. Running a major compaction is not the only cause; there are more. For example, I see two typical scenarios: 1. the backup use case, and 2. an active wide row. In case 1, say a piece of data is removed a year later. This means the tombstone on the row is one year away from the original row. To remove an expired row entirely, a compaction set has to include all of its fragments. So when are the original, one-year-old row and the tombstoned row included in the same compaction set? It is likely to take one year. In case 2, such an active wide row exists in most of the sstable files, and it typically contains many expired columns. But none of them would be removed entirely, because a compaction set practically never includes all the row fragments. By the way, there is a very convenient MBean API available: CompactionManager's forceUserDefinedCompaction. You can invoke a minor compaction on a file set you define. So the question is how to find an optimal set of sstable files. To that end, I wrote a tool that checks for garbage and prints out some useful information to help find such an optimal set. Here's a simple log output.
# /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db
[Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504071)]
===
ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES
===
hello5/100.txt.1373502926003, 40, 40, YES, YES, Test5_BLOB-hc-3-Data.db
---
TOTAL, 40, 40
===

REMAINNING_SSTABLE_FILES means any other sstable files that contain the respective row. So the following is an optimal set:

# /opt/cassandra/bin/checksstablegarbage -e /cassandra_data/UserData/Test5_BLOB-hc-4-Data.db /cassandra_data/UserData/Test5_BLOB-hc-3-Data.db
[Keyspace, ColumnFamily, gcGraceSeconds(gcBefore)] = [UserData, Test5_BLOB, 300(1373504131)]
===
ROW_KEY, TOTAL_SIZE, COMPACTED_SIZE, TOMBSTONED, EXPIRED, REMAINNING_SSTABLE_FILES
===
hello5/100.txt.1373502926003, 223, 0, YES, YES
---
TOTAL, 223, 0
===

This tool relies on SSTableReader and an aggregation iterator, just as Cassandra does in compaction. I was considering sharing this with the community, so let me know if anyone is interested. Note that it is based on 1.0.7, so I will need to check and update it for newer versions. Thanks, Takenori

On Thu, Jul 11, 2013 at 6:46 PM, Tomàs Núnez tomas.nu...@groupalia.com wrote:

Hi, About a year ago we did a major compaction in our Cassandra cluster (a n00b mistake, I know), and since then we've had huge sstables that never get compacted, and we were condemned to repeat the major compaction process every once in a while (we are using the SizeTieredCompaction strategy, and we haven't evaluated LeveledCompaction yet, because it has its downsides and we've had no time to test them all in our environment). I was trying to find a way to resolve this situation (that is, do something like a major compaction that writes small sstables, not huge ones as major compaction does), and I couldn't find it in the documentation. I tried cleanup and scrub/upgradesstables, but they don't do that (as the documentation states).
Then I tried deleting all the data on a node and bootstrapping it (or nodetool rebuild-ing it), hoping that this way the sstables would get cleaned of deleted records and updates. But the rebuilt node just copied the sstables from another node as they were, cleaning nothing. So I tried a new approach: I switched the compaction strategy (SizeTiered to Leveled), forcing the sstables to be rewritten from scratch, and then switched it back (Leveled to SizeTiered). It took a while (but so does the major compaction process) and it worked: I have smaller sstables,