Read-repair working, repair not working?

2013-02-10 Thread Brian Fleming

Hi,


I have a 20 node cluster running v1.0.7 split between 5 data centres, each
with an RF of 2, containing a ~1TB unique dataset/~10TB of total data. 


I’ve had some intermittent issues with a new data centre (3 nodes, RF=2) I
brought online late last year, with data consistency & availability: I’d
request data, nothing would be returned, I would then re-request the data
and it would correctly be returned, i.e. read-repair appeared to be
occurring.  However, running repairs on the nodes didn’t resolve this (I
tried general ‘repair’ commands as well as targeted keyspace commands) –
this didn’t alter the behaviour.


After a lot of fruitless investigation, I decided to wipe &
re-install/re-populate the nodes.  The re-install & repair operations are
now complete: I see the expected amount of data on the nodes, however I am
still seeing the same behaviour, i.e. I only get data after one failed
attempt.


When I run repair commands, I don’t see any errors in the logs. 

I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during
repair sessions.

I see a number of dropped ‘MUTATION’ operations: just under 5% of the
total ‘MutationStage’ count.

Questions:

- Could anybody suggest anything specific to look at to see
why the repair operations aren’t having the desired effect?

- Would increasing the logging level to ‘DEBUG’ show read-repair
activity (to confirm that this is happening, when & for what proportion of
total requests)?

- Is there something obvious that I could be missing here?


Many thanks,

Brian



High CPU usage during repair

2013-02-10 Thread Tamar Fraenkel
Hi!
I run repair weekly, using a scheduled cron job.
During repair I see high CPU consumption, and messages in the log file
INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line
122) GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is
3894411264
From time to time, there are also messages of the form
INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java
(line 607) 1 READ messages dropped in last 5000ms

Using opscenter, jmx and nodetool compactionstats I can see that during the
time the CPU consumption is high, there are compactions waiting.

I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
I have the default settings:
compaction_throughput_mb_per_sec: 16
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_preheat_key_cache: true

I am thinking of the following solution, and wanted to ask if I am on the
right track:
I thought of adding a call to my repair script, so that before repair starts it does:
nodetool setcompactionthroughput 0
and then when repair finishes call
nodetool setcompactionthroughput 16
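
In script form, something like this sketch (the cron schedule, host and
script path are illustrative, not my actual setup):

#!/bin/sh
# weekly-repair.sh -- run from cron, e.g.: 0 3 * * 0 /opt/scripts/weekly-repair.sh
nodetool -h localhost setcompactionthroughput 0   # remove the compaction throttle
nodetool -h localhost repair                      # the weekly repair
nodetool -h localhost setcompactionthroughput 16  # restore the default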

Is this the right solution?
Thanks,
Tamar

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: Netflix/Astynax Client for Cassandra

2013-02-10 Thread Renato Marroquín Mogrovejo
Sorry to hijack this email thread, but what are the use
cases/benefits of using the new binary protocol? And why doesn't
Cassandra offer a driver as part of the project itself?


Renato M.

2013/2/8 aaron morton aa...@thelastpickle.com:
 I'm going to guess Netflix are running Astynax in production with Cassandra
 1.1.

 cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 8/02/2013, at 6:50 AM, Cassa L lcas...@gmail.com wrote:

 Thank you all for the responses to this thread. I am planning to use
 Cassandra 1.1.9 with Astynax. Does anyone have a Cassandra 1.x version running
 in production with Astynax? Did you come across any show-stopper issues?

 Thanks
 LCassa


 On Thu, Feb 7, 2013 at 8:50 AM, Bartłomiej Romański b...@sentia.pl wrote:

 Hi,

 Does anyone know about virtual nodes support in Astynax? Are they
 handled correctly? Especially with ConnectionPoolType.TOKEN_AWARE?

 Thanks,
 BR





Re: High CPU usage during repair

2013-02-10 Thread aaron morton
 During repair I see high CPU consumption, 
Repair reads the data and computes a hash; this is a CPU-intensive operation.
Is the CPU overloaded or just under load?

 I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
What machine size?

 there are compactions waiting.
That's normally ok. How many are waiting?

 I thought of adding a call to my repair script, before repair starts to do:
 nodetool setcompactionthroughput 0
 and then when repair finishes call
 nodetool setcompactionthroughput 16
That will remove throttling on compaction and on the validation compaction used 
for the repair, which may in turn add additional IO load, CPU load and GC 
pressure. You probably do not want to do this. 

Try reducing the compaction throughput to, say, 12 normally and see the effect.
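
A minimal sketch of that change (host illustrative; editing the yaml as well
makes the new value survive a restart):

nodetool -h localhost setcompactionthroughput 12
# and in cassandra.yaml:
compaction_throughput_mb_per_sec: 12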

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/02/2013, at 1:01 AM, Tamar Fraenkel ta...@tok-media.com wrote:

 Hi!
 I run repair weekly, using a scheduled cron job.
 During repair I see high CPU consumption, and messages in the log file
 INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line 122) 
 GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is 3894411264
 From time to time, there are also messages of the form
 INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java (line 
 607) 1 READ messages dropped in last 5000ms
 
 Using opscenter, jmx and nodetool compactionstats I can see that during the 
 time the CPU consumption is high, there are compactions waiting.
 
 I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
 I have the default settings:
 compaction_throughput_mb_per_sec: 16
 in_memory_compaction_limit_in_mb: 64
 multithreaded_compaction: false
 compaction_preheat_key_cache: true
 
 I am thinking of the following solution, and wanted to ask if I am on the 
 right track:
 I thought of adding a call to my repair script, so that before repair starts it does:
 nodetool setcompactionthroughput 0
 and then when repair finishes call
 nodetool setcompactionthroughput 16
 
 Is this the right solution?
 Thanks,
 Tamar
 
 Tamar Fraenkel 
 Senior Software Engineer, TOK Media 
 
 
 ta...@tok-media.com
 Tel:   +972 2 6409736 
 Mob:  +972 54 8356490 
 Fax:   +972 2 5612956 
 
 



Re: Read-repair working, repair not working?

2013-02-10 Thread aaron morton
 I’d request data, nothing would be returned, I would then re-request the data 
 and it would correctly be returned:
 
What CL are you using for reads and writes?

 I see a number of dropped ‘MUTATION’ operations: just under 5% of the total 
 ‘MutationStage’ count.
 
Dropped mutations in a multi DC setup may be a sign of network congestion or 
overloaded nodes. 


 -  Could anybody suggest anything specific to look at to see why the 
 repair operations aren’t having the desired effect? 
 
I would first build a test case to ensure correct operation when using strong 
consistency, i.e. QUORUM write and read. Because you are using RF 2 per DC I 
assume you are not using LOCAL_QUORUM, because that is 2 and you would not have 
any redundancy in the DC. 
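
For reference, the replica counts work out like this (standard quorum
formula; the worked numbers are mine, not from the original thread):

quorum = (sum of replication factors / 2) + 1   # integer division
QUORUM across 5 DCs x RF 2 (total RF 10): (10 / 2) + 1 = 6 replicas
LOCAL_QUORUM in one DC at RF 2:           (2 / 2) + 1  = 2 replicas, i.e. both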

 
 
 -  Would increasing logging level to ‘DEBUG’ show read-repair 
 activity (to confirm that this is happening, when & for what proportion of 
 total requests)?
It would, but the INFO logging for the AES (anti-entropy service) is pretty good. I would hold off for 
now. 

 
 -  Is there something obvious that I could be missing here?
When a new AES session starts it logs this:

logger.info(String.format("[repair #%s] new session: will sync %s on range %s for %s.%s", getName(), repairedNodes(), range, tablename, Arrays.toString(cfnames)));

When it completes it logs this:

logger.info(String.format("[repair #%s] session completed successfully", getName()));

Or this on failure:

logger.error(String.format("[repair #%s] session completed with the following error", getName()), exception);
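
So a quick way to check how a repair went on a node is to grep for those
messages (log path illustrative):

grep "repair #" /var/log/cassandra/system.log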


Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 9:56 PM, Brian Fleming bigbrianflem...@gmail.com wrote:

 
  
 
 Hi,
 
  
 
 I have a 20 node cluster running v1.0.7 split between 5 data centres, each 
 with an RF of 2, containing a ~1TB unique dataset/~10TB of total data. 
 
  
 
 I’ve had some intermittent issues with a new data centre (3 nodes, RF=2) I 
 brought online late last year, with data consistency & availability: I’d 
 request data, nothing would be returned, I would then re-request the data and 
 it would correctly be returned, i.e. read-repair appeared to be occurring.  
 However, running repairs on the nodes didn’t resolve this (I tried general 
 ‘repair’ commands as well as targeted keyspace commands) – this didn’t alter 
 the behaviour.
 
  
 
 After a lot of fruitless investigation, I decided to wipe & 
 re-install/re-populate the nodes.  The re-install & repair operations are now 
 complete: I see the expected amount of data on the nodes, however I am still 
 seeing the same behaviour, i.e. I only get data after one failed attempt.
 
  
 
 When I run repair commands, I don’t see any errors in the logs. 
 
 I see the expected ‘AntiEntropySessions’ count in ‘nodetool tpstats’ during 
 repair sessions.
 
 I see a number of dropped ‘MUTATION’ operations: just under 5% of the total 
 ‘MutationStage’ count.
 
  
 
 Questions :
 
 -  Could anybody suggest anything specific to look at to see why the 
 repair operations aren’t having the desired effect? 
 
 -  Would increasing the logging level to ‘DEBUG’ show read-repair 
 activity (to confirm that this is happening, when & for what proportion of 
 total requests)?
 
 -  Is there something obvious that I could be missing here?
 
  
 
 Many thanks,
 
 Brian
 
  
 



Re: High CPU usage during repair

2013-02-10 Thread Tamar Fraenkel
Hi!
Thanks for the response.
See my answers and questions below.
Thanks!
Tamar

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956




On Sun, Feb 10, 2013 at 10:04 PM, aaron morton aa...@thelastpickle.com wrote:

 During repair I see high CPU consumption,

 Repair reads the data and computes a hash; this is a CPU-intensive
 operation.
 Is the CPU overloaded or just under load?

 Usually just load, but in the past two weeks I have seen CPU of over 90%!

 I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.

 What machine size?

m1.large


 there are compactions waiting.

 That's normally ok. How many are waiting?

 I have seen 4 this morning

 I thought of adding a call to my repair script, before repair starts to do:
 nodetool setcompactionthroughput 0
 and then when repair finishes call
 nodetool setcompactionthroughput 16

 That will remove throttling on compaction and on the validation compaction
 used for the repair, which may in turn add additional IO load, CPU load and
 GC pressure. You probably do not want to do this.

 Try reducing the compaction throughput to, say, 12 normally and see the
 effect.

 Just to make sure I understand you correctly: you suggest that I change the
throughput to 12 regardless of whether repair is ongoing or not? And I will do
it using nodetool, and also change the yaml file in case a restart occurs in
the future?

 Cheers


-
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 11/02/2013, at 1:01 AM, Tamar Fraenkel ta...@tok-media.com wrote:

 Hi!
 I run repair weekly, using a scheduled cron job.
 During repair I see high CPU consumption, and messages in the log file
 INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line
 122) GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is
 3894411264
 From time to time, there are also messages of the form
 INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java
 (line 607) 1 READ messages dropped in last 5000ms

 Using opscenter, jmx and nodetool compactionstats I can see that during
 the time the CPU consumption is high, there are compactions waiting.

 I run Cassandra version 1.0.11 on a 3-node setup on EC2 instances.
 I have the default settings:
 compaction_throughput_mb_per_sec: 16
 in_memory_compaction_limit_in_mb: 64
 multithreaded_compaction: false
 compaction_preheat_key_cache: true

 I am thinking of the following solution, and wanted to ask if I am on the
 right track:
 I thought of adding a call to my repair script, so that before repair starts it does:
 nodetool setcompactionthroughput 0
 and then when repair finishes call
 nodetool setcompactionthroughput 16

 Is this the right solution?
 Thanks,
 Tamar

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media



 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





Re: Issues with writing data to Cassandra column family using a Hive script

2013-02-10 Thread aaron morton
Don't use the variable-length Cassandra integer; use Int32Type. It also 
sounds like you want to use DoubleType rather than FloatType. 
http://www.datastax.com/docs/datastax_enterprise2.2/solutions/about_hive#hive-to-cassandra-table-mapping
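
For example, the target column family could be declared like this (a sketch
mirroring the test_cf definition below, with only the types swapped):

CREATE COLUMN FAMILY cpu_avg_5min_new7
WITH COMPARATOR = 'Int32Type'
 AND key_validation_class = UTF8Type
 AND default_validation_class = DoubleType;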
 
Cheers

 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 4:15 PM, Dinusha Dilrukshi sdddilruk...@gmail.com wrote:

 Hi All,
 
 Data was originally stored in a column family called test_cf. The definition of 
 the column family is as follows:
 
 CREATE COLUMN FAMILY test_cf 
 WITH COMPARATOR = 'IntegerType' 
  AND key_validation_class = UTF8Type 
  AND default_validation_class = FloatType;  
 
 And, following is the sample data set contained in test_cf.
 
 cqlsh:temp_ks> select * from test_cf;
  key| column1| value
 --++---
  localhost:8282 | 1350468600 |76
  localhost:8282 | 1350468601 |76
 
 
 The Hive script (shown at the end of this mail) is used to take the data from 
 the above column family test_cf and insert it into a new column family called 
 cpu_avg_5min_new7. The column family definition of cpu_avg_5min_new7 is the 
 same as that of test_cf. The issue is that the data written into the 
 cpu_avg_5min_new7 column family after executing the Hive script is as follows; 
 it's not in the format of the data present in the original column family 
 test_cf. Any explanations would be highly appreciated.
 
 
 cqlsh:temp_ks> select * from cpu_avg_5min_new7;
  key| column1  | value
 --+--+--
  localhost:8282 | 232340574229062170849328 | 1.09e-05
  localhost:8282 | 232340574229062170849329 | 1.09e-05
 
 
 Hive script:
 
 drop table cpu_avg_5min_new7_hive;
 CREATE EXTERNAL TABLE IF NOT EXISTS cpu_avg_5min_new7_hive (src_id STRING, 
 start_time INT, cpu_avg FLOAT) STORED BY 
 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH 
 SERDEPROPERTIES (
  "cassandra.host" = "127.0.0.1", "cassandra.port" = "9160", 
  "cassandra.ks.name" = "temp_ks", 
  "cassandra.ks.username" = "xxx", "cassandra.ks.password" = "xxx", 
  "cassandra.columns.mapping" = ":key,:column,:value", "cassandra.cf.name" = 
 "cpu_avg_5min_new7"); 
 
 drop table xxx;
 CREATE EXTERNAL TABLE IF NOT EXISTS xxx (src_id STRING, start_time INT, 
 cpu_avg FLOAT) STORED BY
 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH 
 SERDEPROPERTIES ( 
  "cassandra.host" = "127.0.0.1", "cassandra.port" = "9160", 
  "cassandra.ks.name" = "temp_ks",
  "cassandra.ks.username" = "xxx", "cassandra.ks.password" = "xxx",
  "cassandra.columns.mapping" = ":key,:column,:value", "cassandra.cf.name" = "test_cf");
 
 insert overwrite table cpu_avg_5min_new7_hive select 
 src_id,start_time,cpu_avg from xxx;
 
 Regards,
 Dinusha.
 
 



Re: Cassandra 1.1.2 -> 1.1.8 upgrade

2013-02-10 Thread aaron morton
I would do #1.

You can play with nodetool setcompactionthroughput to speed things up, but 
beware nothing comes for free.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 6:40 AM, Mike mthero...@yahoo.com wrote:

 Thank you,
 
 Another question on this topic.
 
 Upgrading from 1.1.2 to 1.1.9 requires running upgradesstables, which will take 
 many hours on our dataset (about 12).  For this upgrade, is it recommended 
 that I:
 
 1) Upgrade all the DB nodes to 1.1.9 first, then go around the ring and run a 
 staggered upgrade of the sstables over a number of days.
 2) Upgrade one node at a time, running the cluster in a mixed 1.1.2/1.1.9 
 configuration for a number of days.
 
 I would prefer #1, as with #2, streaming will not work until all the nodes 
 are upgraded.
 
 I appreciate your thoughts,
 -Mike
 
 On 1/16/2013 11:08 AM, Jason Wee wrote:
 Always check NEWS.txt. For instance, for Cassandra 1.1.3 you need to run 
 nodetool upgradesstables if your CF has counters.
 
 
 On Wed, Jan 16, 2013 at 11:58 PM, Mike mthero...@yahoo.com wrote:
 Hello,
 
 We are looking to upgrade our Cassandra cluster from 1.1.2 to 1.1.8 (or 
 possibly 1.1.9 depending on timing).  It is my understanding that rolling 
 upgrades of Cassandra are supported, so as we upgrade our cluster, we can do 
 so one node at a time without experiencing downtime.
 
 Has anyone had any gotchas recently that I should be aware of before 
 performing this upgrade?
 
 In order to upgrade, are the JAR files the only thing that needs to change? 
 Can everything else remain as-is?
 
 Thanks,
 -Mike
 
 



Re: Cassandra flush spin?

2013-02-10 Thread aaron morton
Sounds like flushing due to memory consumption. 

The flush log messages include the number of ops, so you can see if this node 
was processing more mutations than the others. Try to see if there was more 
(serialised) data being written or more operations being processed. 
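
For example (the exact line shape here is from memory, so treat it as a
sketch):

grep "Enqueuing flush" /var/log/cassandra/system.log
# e.g. INFO ... Enqueuing flush of Memtable-msgs@123456789(7812345/56231234 serialized/live bytes, 45123 ops)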

Also just for fun check the JVM and yaml settings are as expected. 

Cheers 


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 6:29 AM, Mike mthero...@yahoo.com wrote:

 Hello,
 
 We just hit a very odd issue in our Cassandra cluster.  We are running 
 Cassandra 1.1.2 in a 6 node cluster.  We use a replication factor of 3, and 
 all operations utilize LOCAL_QUORUM consistency.
 
 We noticed a large performance hit in our application's maintenance 
 activities and I've been investigating.  I discovered a node in the cluster 
 that was flushing a memtable like crazy.  It was flushing every 2-3 minutes, 
 and had apparently been doing this for days. Typically, during this time of 
 day, a flush would happen every 30 minutes or so.
 
 alldb.sh "cat /var/log/cassandra/system.log | grep \"flushing high-traffic 
 column family CFS(Keyspace='open', ColumnFamily='msgs')\" | grep 02-08 | wc 
 -l"
 [1] 18:41:04 [SUCCESS] db-1c-1
 59
 [2] 18:41:05 [SUCCESS] db-1c-2
 48
 [3] 18:41:05 [SUCCESS] db-1a-1
 1206
 [4] 18:41:05 [SUCCESS] db-1d-2
 54
 [5] 18:41:05 [SUCCESS] db-1a-2
 56
 [6] 18:41:05 [SUCCESS] db-1d-1
 52
 
 
 I restarted the database node, and, at least for now, the problem appears to 
 have stopped.
 
 There are a number of things that don't make sense here.  We use a 
 replication factor of 3, so if this was being caused by our application, I 
 would have expected 3 nodes in the cluster to have issues.  Also, I would 
 have expected the issue to continue once the node restarted.
 
 Another point of interest, and I'm wondering if it has exposed a 
 bug: this node was recently converted to use ephemeral storage on EC2, 
 and was restored from a snapshot.  After the restore, a nodetool repair was 
 run.  However, repair was going to run into some heavy activity for our 
 application, and we canceled that validation compaction (2 of the 3 
 anti-entropy sessions had completed).  The spin appears to have started at 
 the start of the second session.
 
 Any hints?
 
 -Mike
 
 
 
 
 



Re: persisted ring state

2013-02-10 Thread aaron morton
  Is that the right way to do it?
No. 
If you want to change the token for a node use nodetool move. 

Changing it like this will not make the node change its token, because after 
startup the token is stored in the System.LocationInfo CF. 
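
A minimal sketch (the token value here is illustrative):

nodetool -h <host> move 85070591730234615865843651857942052864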

 or -Dcassandra.load_ring_state=false|true is only limited to changes to 
 seed/listen_address ?
It's used when a node somehow has a bad view of the ring, and you want it to 
forget things. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/02/2013, at 3:35 AM, S C as...@outlook.com wrote:

 In one of the scenarios that I encountered, I needed to change the token on 
 the node. I added a new token and started the node with
 -Dcassandra.load_ring_state=false in anticipation that the node would not pick 
 it up from the locally persisted data. Is that the right way to do it? Or is 
 -Dcassandra.load_ring_state=false|true only limited to changes to 
 seed/listen_address?
 
 
 Thanks,
 SC



Re: Issues with writing data to Cassandra column family using a Hive script

2013-02-10 Thread Dinusha Dilrukshi
Hi Aaron,

Thanks for the reply. I'll try out your suggestion.

Regards,
Dinusha.

On Mon, Feb 11, 2013 at 1:55 AM, aaron morton aa...@thelastpickle.com wrote:

 Don't use the variable-length Cassandra integer; use Int32Type. It
 also sounds like you want to use DoubleType rather than FloatType.

 http://www.datastax.com/docs/datastax_enterprise2.2/solutions/about_hive#hive-to-cassandra-table-mapping

 Cheers


 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 10/02/2013, at 4:15 PM, Dinusha Dilrukshi sdddilruk...@gmail.com
 wrote:

 Hi All,

 Data was originally stored in a column family called test_cf. The definition
 of the column family is as follows:

 CREATE COLUMN FAMILY test_cf
 WITH COMPARATOR = 'IntegerType'
  AND key_validation_class = UTF8Type
  AND default_validation_class = FloatType;

 And, following is the sample data set contained in test_cf.

 cqlsh:temp_ks> select * from test_cf;
  key| column1| value
 --++---
  localhost:8282 | 1350468600 |76
  localhost:8282 | 1350468601 |76


 The Hive script (shown at the end of this mail) is used to take the data from
 the above column family test_cf and insert it into a new column family
 called cpu_avg_5min_new7. The column family definition of cpu_avg_5min_new7
 is the same as that of test_cf. The issue is that the data written into the
 cpu_avg_5min_new7 column family after executing the Hive script is as follows;
 it's not in the format of the data present in the original column
 family test_cf. Any explanations would be highly appreciated.


 cqlsh:temp_ks> select * from cpu_avg_5min_new7;
  key| column1  | value
 --+--+--
  localhost:8282 | 232340574229062170849328 | 1.09e-05
  localhost:8282 | 232340574229062170849329 | 1.09e-05


 Hive script:
 
 drop table cpu_avg_5min_new7_hive;
 CREATE EXTERNAL TABLE IF NOT EXISTS cpu_avg_5min_new7_hive (src_id STRING,
 start_time INT, cpu_avg FLOAT) STORED BY
 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH
 SERDEPROPERTIES (
  "cassandra.host" = "127.0.0.1", "cassandra.port" = "9160",
  "cassandra.ks.name" = "temp_ks",
  "cassandra.ks.username" = "xxx", "cassandra.ks.password" = "xxx",
  "cassandra.columns.mapping" = ":key,:column,:value", "cassandra.cf.name"
  = "cpu_avg_5min_new7");

 drop table xxx;
 CREATE EXTERNAL TABLE IF NOT EXISTS xxx (src_id STRING, start_time INT,
 cpu_avg FLOAT) STORED BY
 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH
 SERDEPROPERTIES (
  "cassandra.host" = "127.0.0.1", "cassandra.port" = "9160",
  "cassandra.ks.name" = "temp_ks",
  "cassandra.ks.username" = "xxx", "cassandra.ks.password" = "xxx",
  "cassandra.columns.mapping" = ":key,:column,:value", "cassandra.cf.name" = "test_cf");

 insert overwrite table cpu_avg_5min_new7_hive select
 src_id,start_time,cpu_avg from xxx;

 Regards,
 Dinusha.






Querying composite keys

2013-02-10 Thread Rishabh Agrawal
Hello

I have the key and columns defined in the following fashion:

Row key:  Key1:TimeStamp:VersionNum
Columns:  HotelName1:RoomNum1 | HotelName2:RoomNum2 | HotelName3:RoomNum3

Is there a way that I can query this schema by only 'key' or 'HotelName', i.e. 
querying using a part of the composite key and not the full key?


Thanks and Regards
Rishabh Agrawal



Re: Querying composite keys

2013-02-10 Thread Vivek Mishra
You can query over composite columns using:
1) The partition key
2) The first part of the clustered key (using EQ ops).

Secondary indexes over non-composite columns are not possible.
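
In CQL3 terms, a sketch of that rule (the table and column names are
illustrative, mapped from the schema in the question):

CREATE TABLE rooms (
  k text, ts timestamp, version int,   -- the Key1:TimeStamp:VersionNum parts
  hotel text, room int,                -- the HotelName:RoomNum parts
  PRIMARY KEY ((k, ts, version), hotel, room)
);

-- allowed: full partition key plus a prefix of the clustering columns, EQ only
SELECT * FROM rooms
 WHERE k = 'Key1' AND ts = '2013-02-10' AND version = 1
   AND hotel = 'HotelName1';

-- not allowed: filtering on hotel alone, without the full partition key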

-Vivek
On Mon, Feb 11, 2013 at 12:06 PM, Rishabh Agrawal 
rishabh.agra...@impetus.co.in wrote:

  Hello



 I have the key and columns defined in the following fashion:

 Row key:  Key1:TimeStamp:VersionNum
 Columns:  HotelName1:RoomNum1 | HotelName2:RoomNum2 | HotelName3:RoomNum3


 Is there a way that I can query this schema by only ‘key’ or ‘HotelName’,
 i.e. querying using a part of the composite key and not the full key?





 Thanks and Regards

 Rishabh Agrawal






Re: Cassandra 1.1.2 -> 1.1.8 upgrade

2013-02-10 Thread Michal Michalski



2) Upgrade one node at a time, running the cluster in a mixed
1.1.2/1.1.9 configuration for a number of days.


I'm about to upgrade my 1.1.0 cluster and
http://www.datastax.com/docs/1.1/install/upgrading#info says:

If you are upgrading to Cassandra 1.1.9 from a version earlier than 
1.1.7, all nodes must be upgraded before any streaming can take place. 
Until you upgrade all nodes, you cannot add version 1.1.7 nodes or later 
to a 1.1.7 or earlier cluster.


Which one is correct then? Can I run a mixed 1.1.2 (in my case 1.1.0) & 
1.1.9 cluster or not?


M.