Timeout reading row from CF with collections

2013-07-12 Thread Paul Ingalls
I'm running into a problem trying to read data from a column family that 
includes a number of collections.  

Cluster details:
4 nodes running 1.2.6 on VMs with 4 CPUs and 7 GB of RAM.
RAID 0 striped across 4 disks for the data and logs.
each node has about 500 MB of data currently loaded

Here is the schema:

create table user_scores
(
user_id varchar,
post_type varchar,
score double,
team_to_score_map map<varchar, double>,
affiliation_to_score_map map<varchar, double>,
campaign_to_score_map map<varchar, double>,
person_to_score_map map<varchar, double>,
primary key(user_id, post_type)
)
with compaction =
{
  'class' : 'LeveledCompactionStrategy',
  'sstable_size_in_mb' : 10
};

I used the leveled compaction strategy as I thought it would help with read 
latency…
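
For reference, here is a sketch of the kind of write and read involved (the 
ids and values below are made up for illustration):

  INSERT INTO user_scores (user_id, post_type, score, team_to_score_map)
  VALUES ('26257166', 'tweet', 1.5, {'team_a': 0.7, 'team_b': 0.8});

  SELECT * FROM user_scores WHERE user_id = '26257166' LIMIT 1;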

Here is a trace of a simple select against the cluster when nothing else was 
reading or writing (CPU was < 2%):

 activity                                                             | timestamp    | source         | source_elapsed
----------------------------------------------------------------------+--------------+----------------+----------------
 execute_cql3_query                                                   | 05:51:34,557 | 100.69.176.51  |              0
 Message received from /100.69.176.51                                 | 05:51:34,195 | 100.69.184.134 |            102
 Executing single-partition query on user_scores                      | 05:51:34,199 | 100.69.184.134 |           3512
 Acquiring sstable references                                         | 05:51:34,199 | 100.69.184.134 |           3741
 Merging memtable tombstones                                          | 05:51:34,199 | 100.69.184.134 |           3890
 Key cache hit for sstable 5                                          | 05:51:34,199 | 100.69.184.134 |           4040
 Seeking to partition beginning in data file                          | 05:51:34,199 | 100.69.184.134 |           4059
 Merging data from memtables and 1 sstables                           | 05:51:34,200 | 100.69.184.134 |           4412
 Parsing select * from user_scores where user_id='26257166' LIMIT 1;  | 05:51:34,558 | 100.69.176.51  |             91
 Peparing statement                                                   | 05:51:34,558 | 100.69.176.51  |            238
 Enqueuing data request to /100.69.184.134                            | 05:51:34,558 | 100.69.176.51  |            567
 Sending message to /100.69.184.134                                   | 05:51:34,558 | 100.69.176.51  |            979
 Request complete                                                     | 05:51:54,562 | 100.69.176.51  |       20005209

You can see that I increased the timeout and it still fails.  This seems to 
happen with rows that have maps with a larger number of entries.  It is very 
reproducible with my current data set.

Any ideas on why I can't query for a row?

Thanks!

Paul




Cassandra-CQL-Csharp-driver-sample

2013-07-12 Thread Murali
Hi,
I created a very simple CRUD sample using the Cassandra CQL C# driver.
If somebody is interested, please try it out; feedback and comments are
welcome.

https://github.com/muralidharand/cassandra-CQL-csharp-driver-sample

-- 
Thanks,
Murali


Re: Node tokens / data move

2013-07-12 Thread aaron morton
  Can he not specify all 256 tokens in the YAML of the new cluster 
 and then copy sstables? 
 I know it is a bit ugly but should work.
You can pass a comma-separated list of tokens to the -Dcassandra.replace_token 
JVM param. 

AFAIK it's not possible to provide the list in the yaml file. 

Cheers
A

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/07/2013, at 5:07 AM, Baskar Duraikannu baskar.duraikannu...@gmail.com 
wrote:

 
 I copied the sstables and then ran a repair. It worked. Looks like export and 
 import may have been much faster given that we had very little data.
 
 Thanks everyone.
 
 
 
 
 On Tue, Jul 9, 2013 at 1:34 PM, sankalp kohli kohlisank...@gmail.com wrote:
 Hi Aaron,
  Can he not specify all 256 tokens in the YAML of the new cluster 
 and then copy sstables? 
 I know it is a bit ugly but should work.
 
 Sankalp
 
 
 On Tue, Jul 9, 2013 at 3:19 AM, Baskar Duraikannu 
 baskar.duraikannu...@gmail.com wrote:
 Thanks Aaron
 
 On 7/9/13, aaron morton aa...@thelastpickle.com wrote:
  Can I just copy data files for the required keyspaces, create schema
  manually and run repair?
  If you have something like RF 3 and 3 nodes then yes, you can copy the data
  from one node in the source cluster to all nodes in the dest cluster and use
  cleanup to remove the unneeded data. Because each node in the source cluster
  has a full copy of the data.
 
  If that's not the case you cannot copy the data files, even if they have the
  same number of nodes, because the nodes in the dest cluster will have
  different tokens. AFAIK you need to export the full data set from the source
  DC and then import it into the dest system.
 
  The Bulk Load utility may be of help
  http://www.datastax.com/docs/1.2/references/bulkloader . You could copy the
  SSTables from every node in the source system and bulk load them into the
  dest system. That process will ensure rows are sent to nodes that are
  replicas.
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Consultant
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 9/07/2013, at 12:45 PM, Baskar Duraikannu
  baskar.duraikannu...@gmail.com wrote:
 
  We have two clusters used by two different groups with vnodes enabled. Now
  there is a need to move some of the keyspaces from cluster 1 to cluster 2.
 
 
  Can I just copy data files for the required keyspaces, create schema
  manually and run repair?
 
  Anything else required?  Please help.
  --
  Thanks,
  Baskar Duraikannu
 
 
 
 



Re: how to determine RF on the fly ?

2013-07-12 Thread aaron morton
It's available on the Thrift API call describe_keyspaces() 
https://github.com/apache/cassandra/blob/trunk/interface/cassandra.thrift#L730
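
From CQL3 the same information can be read out of the schema tables; a minimal 
sketch, using a placeholder keyspace name (the replication factor is embedded 
in strategy_options):

  SELECT keyspace_name, strategy_class, strategy_options
  FROM system.schema_keyspaces
  WHERE keyspace_name = 'my_keyspace';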

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/07/2013, at 7:04 AM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jul 10, 2013 at 12:58 AM, Илья Шипицин chipits...@gmail.com wrote:
 is there an easy way to determine the current RF, for instance via mx4j?
 
 The methods which show keyspace or schema (from CLI or cqlsh) show the 
 replication factor, as the replication factor is a keyspace property.
 
 I don't believe it's available via JMX, but there's no reason it couldn't 
 be...
 
 =Rob 



Re: Quorum reads and response time

2013-07-12 Thread aaron morton
 But when I run the same query with consistency level Quorum, it is taking 
 ~2.3 seconds.  It feels as if the nodes are queried in sequence.  
No. 

As Sankalp says look for GC issues. If none then take a look at how much data 
you are pulling back, and tell us what sort of query you are using. 
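
One quick way to see where the time goes is cqlsh request tracing (available 
in 1.2); a minimal sketch, where the query is a placeholder for your own:

  TRACING ON;
  SELECT * FROM my_table WHERE key = 'some_key';
  TRACING OFF;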

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/07/2013, at 7:10 AM, sankalp kohli kohlisank...@gmail.com wrote:

 The coordinator node has to merge the results from 2 nodes, and the requests are 
 done in parallel. I have seen a lot of GC pressure with range queries because 
 of tombstones. 
 Can you check the logs to see if there is a lot of GC going on? Also try to have 
 GC logging enabled. 
 
 
 On Wed, Jul 10, 2013 at 9:57 AM, Baskar Duraikannu 
 baskar.duraikannu...@gmail.com wrote:
 Just adding few other details to my question.
 
 - We are using RandomPartitioner
 - 256 virtual nodes configured. 
 
 
 On Wed, Jul 10, 2013 at 12:54 PM, Baskar Duraikannu 
 baskar.duraikannu...@gmail.com wrote:
 I have a 3-node cluster with RF=3.  All nodes are running. I have a table 
 with 39 rows and ~44,000 columns spread evenly across those rows. 
 
 When I do a range slice query on this table with consistency of one, it returns 
 the data in about ~600 ms.  I tried the same from all 3 nodes; no 
 matter which node I ran it from, queries were answered in 600 ms at 
 consistency level one.
 
 But when I run the same query with consistency level Quorum, it is taking 
 ~2.3 seconds.  It feels as if the nodes are queried in sequence.  
 
 Is this normal? 
 
 --
 Regards,
 Baskar Duraikannu
 
 
 



Re: Timeout reading row from CF with collections

2013-07-12 Thread Sylvain Lebresne
My bet is that you're hitting
https://issues.apache.org/jira/browse/CASSANDRA-5677.

--
Sylvain


On Fri, Jul 12, 2013 at 8:17 AM, Paul Ingalls paulinga...@gmail.com wrote:

 I'm running into a problem trying to read data from a column family that
 includes a number of collections.

 Cluster details:
 4 nodes running 1.2.6 on VMs with 4 CPUs and 7 GB of RAM.
 RAID 0 striped across 4 disks for the data and logs.
 each node has about 500 MB of data currently loaded

 Here is the schema:

 create table user_scores
 (
 user_id varchar,
 post_type varchar,
 score double,
 team_to_score_map map<varchar, double>,
 affiliation_to_score_map map<varchar, double>,
 campaign_to_score_map map<varchar, double>,
 person_to_score_map map<varchar, double>,
 primary key(user_id, post_type)
 )
 with compaction =
 {
   'class' : 'LeveledCompactionStrategy',
   'sstable_size_in_mb' : 10
 };

 I used the leveled compaction strategy as I thought it would help with
 read latency…

 Here is a trace of a simple select against the cluster when nothing else was
 reading or writing (CPU was < 2%):

  activity                                                             | timestamp    | source         | source_elapsed
 ----------------------------------------------------------------------+--------------+----------------+----------------
  execute_cql3_query                                                   | 05:51:34,557 | 100.69.176.51  |              0
  Message received from /100.69.176.51                                 | 05:51:34,195 | 100.69.184.134 |            102
  Executing single-partition query on user_scores                      | 05:51:34,199 | 100.69.184.134 |           3512
  Acquiring sstable references                                         | 05:51:34,199 | 100.69.184.134 |           3741
  Merging memtable tombstones                                          | 05:51:34,199 | 100.69.184.134 |           3890
  Key cache hit for sstable 5                                          | 05:51:34,199 | 100.69.184.134 |           4040
  Seeking to partition beginning in data file                          | 05:51:34,199 | 100.69.184.134 |           4059
  Merging data from memtables and 1 sstables                           | 05:51:34,200 | 100.69.184.134 |           4412
  Parsing select * from user_scores where user_id='26257166' LIMIT 1;  | 05:51:34,558 | 100.69.176.51  |             91
  Peparing statement                                                   | 05:51:34,558 | 100.69.176.51  |            238
  Enqueuing data request to /100.69.184.134                            | 05:51:34,558 | 100.69.176.51  |            567
  Sending message to /100.69.184.134                                   | 05:51:34,558 | 100.69.176.51  |            979
  Request complete                                                     | 05:51:54,562 | 100.69.176.51  |       20005209

 You can see that I increased the timeout and it still fails.  This seems
 to happen with rows that have maps with a larger number of entries.  It is
 very reproducible with my current data set.

 Any ideas on why I can't query for a row?

 Thanks!

 Paul





Re: temporarily running a cassandra side by side in production

2013-07-12 Thread aaron morton
  We are starting to think we are going to try to run a side by side cassandra 
 instance in production while we map/reduce from one cassandra into the new 
 instance. 
What do you mean by side-by-side ?

 Can I assume a cassandra instance will not only bind to the new ports when I 
 change these values but will talk to the other cassandra nodes on those same 
 ports as well such that this cassandra instance is completely independent of 
 my other cassandra instance?
Not sure what you mean, but all nodes in the same cluster must be configured 
with the same storage port. 
The best way to ensure clusters do not interfere with each other is to have 
different seed lists and different cluster names. 

 Are there other gotchas that I have to be aware of?
I'm not sure what you are attempting to do. 

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/07/2013, at 11:37 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 We have a 12 node production cluster and a 4 node QA cluster.  We are 
 starting to think we are going to try to run a side by side cassandra 
 instance in production while we map/reduce from one cassandra into the new 
 instance.  We are intending to do something like this
 
 Modify all ports: 7000, 7001, 9160, and 9042 in cassandra.yaml, and the JMX 
 port 7199 in cassandra-env.sh.
 
 Can I assume a cassandra instance will not only bind to the new ports when I 
 change these values but will talk to the other cassandra nodes on those same 
 ports as well such that this cassandra instance is completely independent of 
 my other cassandra instance?
 
 Are there other gotchas that I have to be aware of?
 
 (we are refactoring our model into a new faster model that we tested in QA 
 with live data as well as moving randompartitioner to murmur)
 
 Thanks,
 Dean
 
 



Re: manually removing sstable

2013-07-12 Thread aaron morton
That sounds sane to me. Couple of caveats:

* Remember that Expiring Columns turn into Tombstones and can only be purged 
after TTL and gc_grace.
* Tombstones will only be purged if all fragments of a row are in the 
SStable(s) being compacted. 

Cheers
  
-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/07/2013, at 10:17 PM, Theo Hultberg t...@iconara.net wrote:

 a colleague of mine came up with an alternative solution that also seems to 
 work, and I'd just like your opinion on if it's sound.
 
 we run find to list all old sstables, and then use cmdline-jmxclient to run 
 the forceUserDefinedCompaction function on each of them, this is roughly what 
 we do (but with find and xargs to orchestrate it)
 
   java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199 
 org.apache.cassandra.db:type=CompactionManager 
 forceUserDefinedCompaction=the_keyspace,db_file_name
 
 the downside is that c* needs to read the file and do disk io, but the upside 
 is that it doesn't require a restart. c* does a little more work, but we can 
 schedule that during off-peak hours. another upside is that it feels like 
 we're pretty safe from screwups, we won't accidentally remove an sstable with 
 live data, the worst case is that we ask c* to compact an sstable with live 
 data and end up with an identical sstable.
 
 if anyone else wants to do the same thing, this is the full cron command:
 
 0 4 * * * find /path/to/cassandra/data/the_keyspace_name -maxdepth 1 -type f 
 -name '*-Data.db' -mtime +8 -printf 
 'forceUserDefinedCompaction=the_keyspace_name,%P\n' | xargs -t 
 --no-run-if-empty java -jar 
 /usr/local/share/java/cmdline-jmxclient-0.10.3.jar - localhost:7199 
 org.apache.cassandra.db:type=CompactionManager
 
 just change the keyspace name and the path to the data directory.
 
 T#
 
 
 On Thu, Jul 11, 2013 at 7:09 AM, Theo Hultberg t...@iconara.net wrote:
 thanks a lot. I can confirm that it solved our problem too.
 
 looks like the C* 2.0 feature is perfect for us.
 
 T#
 
 
 On Wed, Jul 10, 2013 at 7:28 PM, Marcus Eriksson krum...@gmail.com wrote:
 yep that works, you need to remove all components of the sstable though, not 
 just -Data.db
 
 and, in 2.0 there is this:
 https://issues.apache.org/jira/browse/CASSANDRA-5228
 
 /Marcus
 
 
 On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg t...@iconara.net wrote:
 Hi,
 
 I think I remember reading that if you have sstables that you know contain 
 only data whose ttl has expired, it's safe to remove them manually by 
 stopping c*, removing the *-Data.db files and then starting up c* again. is 
 this correct?
 
 we have a cluster where everything is written with a ttl, and sometimes c* 
 needs to compact over 100 gb of sstables where we know everything has expired, 
 and we'd rather just manually get rid of those.
 
 T#
 
 
 



Re: IllegalArgumentException on query with AbstractCompositeType

2013-07-12 Thread aaron morton
 The “ALLOW FILTERING” clause also has no effect.
You only need that when the WHERE clause contains predicates for columns that 
are not part of the primary key. 

 CREATE INDEX ON conv_msgdata_by_participant_cql(msgReadFlag);
In general this is a bad idea in Cassandra (also in a relational DB, IMHO). You 
will get poor performance from it. 

 Caused by: java.lang.IllegalArgumentException
 at java.nio.Buffer.limit(Buffer.java:247)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
 at 
 org.apache.cassandra.db.columniterator.IndexedSliceReader$BlockFetcher.isColumnBeforeSliceFinish(IndexedSliceReader.java:216)
 at 
 org.apache.cassandra.db.columniterator.IndexedSliceReader$SimpleBlockFetcher.<init>(IndexedSliceReader.java:450)
 at 
 org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:85)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:68)
This looks like an error in the on-disk data, or maybe in the value passed for 
messageId, but I doubt it. 

What version are you using ? 
Can you reproduce this outside of your unit tests ?

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/07/2013, at 12:40 AM, Pruner, Anne (Anne) pru...@avaya.com wrote:

 Hi,
 I’ve been tearing my hair out trying to figure out why this 
 query fails.  In fact, it only fails on machines with slower CPUs and after 
 having previously run some other junit tests.  I’m running junits to an 
 embedded Cassandra server, which works well in pretty much all other cases, 
 but this one is flaky.  I’ve tried to rule out timing issues by placing a 10 
 second delay just before this query, just in case somehow the data isn’t 
 getting into the db in a timely manner, but that doesn’t have any effect.  
 I’ve also tried removing the “ORDER BY” clause, which seems to be the place 
 in the code it’s getting hung up on, but that also doesn’t have any effect.  
 The “ALLOW FILTERING” clause also has no effect.
  
 DEBUG [Native-Transport-Requests:16] 2013-07-10 16:28:21,993 Message.java 
 (line 277) Received: QUERY SELECT * FROM conv_msgdata_by_participant_cql 
 WHERE entityConversationId='bulktestfromus...@test.cacontact_811b5efc-b621-4361-9dc9-2e4755be7d89'
  AND messageId<'2013-07-10T20:29:09.773Zzz' ORDER BY messageId DESC LIMIT 
 15 ALLOW FILTERING;
 ERROR [ReadStage:34] 2013-07-10 16:28:21,995 CassandraDaemon.java (line 132) 
 Exception in thread Thread[ReadStage:34,5,main]
 java.lang.RuntimeException: java.lang.IllegalArgumentException
 at 
 org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1582)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.IllegalArgumentException
 at java.nio.Buffer.limit(Buffer.java:247)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
 at 
 org.apache.cassandra.db.columniterator.IndexedSliceReader$BlockFetcher.isColumnBeforeSliceFinish(IndexedSliceReader.java:216)
 at 
 org.apache.cassandra.db.columniterator.IndexedSliceReader$SimpleBlockFetcher.<init>(IndexedSliceReader.java:450)
 at 
 org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:85)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:68)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:44)
 at 
 org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:101)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
 at 
 org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:275)
 at 
 org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
 at 
 

Re: manually removing sstable

2013-07-12 Thread Theo Hultberg
thanks aaron, the second point I had not considered, and it could explain
why the sstables don't always disappear completely; sometimes a small file
(megabytes instead of gigabytes) is left behind.

T#


On Fri, Jul 12, 2013 at 10:25 AM, aaron morton aa...@thelastpickle.com wrote:

 That sounds sane to me. Couple of caveats:

 * Remember that Expiring Columns turn into Tombstones and can only be
 purged after TTL and gc_grace.
 * Tombstones will only be purged if all fragments of a row are in the
 SStable(s) being compacted.

 Cheers

 -
 Aaron Morton
 Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 11/07/2013, at 10:17 PM, Theo Hultberg t...@iconara.net wrote:

 a colleague of mine came up with an alternative solution that also seems
 to work, and I'd just like your opinion on if it's sound.

 we run find to list all old sstables, and then use cmdline-jmxclient to
 run the forceUserDefinedCompaction function on each of them, this is
 roughly what we do (but with find and xargs to orchestrate it)

   java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199
 org.apache.cassandra.db:type=CompactionManager 
 forceUserDefinedCompaction=the_keyspace,db_file_name

 the downside is that c* needs to read the file and do disk io, but the
 upside is that it doesn't require a restart. c* does a little more work,
 but we can schedule that during off-peak hours. another upside is that it
 feels like we're pretty safe from screwups, we won't accidentally remove an
 sstable with live data, the worst case is that we ask c* to compact an
 sstable with live data and end up with an identical sstable.

 if anyone else wants to do the same thing, this is the full cron command:

 0 4 * * * find /path/to/cassandra/data/the_keyspace_name -maxdepth 1 -type f
 -name '*-Data.db' -mtime +8 -printf
 'forceUserDefinedCompaction=the_keyspace_name,%P\n' | xargs -t
 --no-run-if-empty java -jar
 /usr/local/share/java/cmdline-jmxclient-0.10.3.jar - localhost:7199
 org.apache.cassandra.db:type=CompactionManager

 just change the keyspace name and the path to the data directory.

 T#


 On Thu, Jul 11, 2013 at 7:09 AM, Theo Hultberg t...@iconara.net wrote:

 thanks a lot. I can confirm that it solved our problem too.

 looks like the C* 2.0 feature is perfect for us.

 T#


 On Wed, Jul 10, 2013 at 7:28 PM, Marcus Eriksson krum...@gmail.com wrote:

 yep that works, you need to remove all components of the sstable though,
 not just -Data.db

 and, in 2.0 there is this:
 https://issues.apache.org/jira/browse/CASSANDRA-5228

 /Marcus


 On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg t...@iconara.net wrote:

 Hi,

 I think I remember reading that if you have sstables that you know
 contain only data whose ttl has expired, it's safe to remove them
 manually by stopping c*, removing the *-Data.db files and then starting up
 c* again. is this correct?

 we have a cluster where everything is written with a ttl, and sometimes
 c* needs to compact over 100 gb of sstables where we know everything has
 expired, and we'd rather just manually get rid of those.

 T#








Extract meta-data using cql 3

2013-07-12 Thread Murali
Hi experts,

How to extract meta-data of a table or a keyspace using CQL 3.0?

-- 
Thanks,
Murali


Re: Extract meta-data using cql 3

2013-07-12 Thread Sylvain Lebresne
The raw answer is that you should query the system tables. The schema is
stored in the 3 following tables: System.schema_keyspaces,
System.schema_columnfamilies and System.schema_columns. Unfortunately, the
information stored in there is, for different reasons, not in a form that
makes
a lot of sense from a CQL3 point of view.

So in practice, you should probably rely on your client driver, which might
provide that same information in a more usable way. For instance, with
cqlsh, you have a DESCRIBE command. Or if you use the DataStax Java
driver,
you can access all those metadata through
cluster.getMetadata().getKeyspaces(),
etc...
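
A minimal sketch of querying those tables directly, with placeholder keyspace
and table names:

  SELECT * FROM system.schema_keyspaces;

  SELECT * FROM system.schema_columnfamilies
  WHERE keyspace_name = 'my_keyspace';

  SELECT * FROM system.schema_columns
  WHERE keyspace_name = 'my_keyspace' AND columnfamily_name = 'my_table';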



On Fri, Jul 12, 2013 at 10:52 AM, Murali muralidharan@gmail.com wrote:

 Hi experts,

 How to extract meta-data of a table or a keyspace using CQL 3.0?

 --
 Thanks,
 Murali




Re: Extract meta-data using cql 3

2013-07-12 Thread Theo Hultberg
there's a keyspace called system which has a few tables that contain the
metadata. for example schema_keyspaces that contain keyspace metadata, and
schema_columnfamilies that contain table metadata. there are more, just
fire up cqlsh and do a describe keyspace in the system keyspace to find
them.

T#


On Fri, Jul 12, 2013 at 10:52 AM, Murali muralidharan@gmail.com wrote:

 Hi experts,

 How to extract meta-data of a table or a keyspace using CQL 3.0?

 --
 Thanks,
 Murali




Re: Alternate major compaction

2013-07-12 Thread Radim Kolar
With very little work (less than 10 KB of code) it is possible to have an 
online sstable splitter and export this functionality over JMX.


Error: Main method not found in class org.apache.cassandra.service.CassandraDaemon

2013-07-12 Thread Vivek Mishra
Earlier, everything was working fine but now I am getting this strange
error.
Initially I was working via a tarball installation, and then installed a
Cassandra rpm package.

Since then, I am getting
Error: Main method not found in class
org.apache.cassandra.service.CassandraDaemon, please define the main method
as:
   public static void main(String[] args)


when running from the tarball installation. I did try setting CASSANDRA_HOME as

CASSANDRA_HOME=/home/impadmin/software/apache-cassandra-1.2.4/

but no luck.

This error is quite confusing: how can a user define a main method within
the Cassandra source code?

-Vivek


[BETA RELEASE] Apache Cassandra 2.0.0-beta1 released

2013-07-12 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of the first beta for
the future Apache Cassandra 2.0.0.

Let me first stress that this is beta software and as such is *not* ready
for
production use.

The goal of this release is to give a preview of what will become Cassandra
2.0 and to get wider testing before the final release. As such, it is likely
not bug free but all help in testing this beta would be greatly appreciated
and
will help make 2.0 a solid release. So please report any problem you may
encounter[3,4] with this release and have a look at the change log[1] and
release notes[2] to see where Cassandra 2.0 differs from the previous
series.

Apache Cassandra 2.0.0-beta1[5] is available as usual from the cassandra
website (http://cassandra.apache.org/download/) and a debian package is
available using the 20x branch (see
http://wiki.apache.org/cassandra/DebianPackaging).

Thank you for your help in testing and have fun with it.

[1]: http://goo.gl/TjQGd (CHANGES.txt)
[2]: http://goo.gl/K4QsX (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
[4]: user@cassandra.apache.org
[5]:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-2.0.0-beta1


Re: temporarily running a cassandra side by side in production

2013-07-12 Thread Hiller, Dean
Heh, oops, yes. We have 12 nodes and are trying to run 2 instances of cassandra 
on those 12 nodes.  So far, in QA this appears to be working.  I like the 
cluster-name change idea as a just-in-case, so I will definitely be doing that 
one.
Thanks,
Dean

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Friday, July 12, 2013 2:20 AM
To: user@cassandra.apache.org
Subject: Re: temporarily running a cassandra side by side in production

 We are starting to think we are going to try to run a side by side cassandra 
instance in production while we map/reduce from one cassandra into the new 
instance.
What do you mean by side-by-side ?

Can I assume a cassandra instance will not only bind to the new ports when I 
change these values but will talk to the other cassandra nodes on those same 
ports as well such that this cassandra instance is completely independent of my 
other cassandra instance?
Not sure what you mean, but all nodes in the same cluster must be configured 
with the same storage port.
The best way to ensure clusters do not interfere with each other is to have 
different seed lists and different cluster names.

Are there other gotchas that I have to be aware of?
I'm not sure what you are attempting to do.

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/07/2013, at 11:37 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

We have a 12 node production cluster and a 4 node QA cluster.  We are starting 
to think we are going to try to run a side by side cassandra instance in 
production while we map/reduce from one cassandra into the new instance.  We 
are intending to do something like this

Modify all ports: 7000, 7001, 9160, and 9042 in cassandra.yaml, and the JMX port 
7199 in cassandra-env.sh.

Can I assume a cassandra instance will not only bind to the new ports when I 
change these values but will talk to the other cassandra nodes on those same 
ports as well such that this cassandra instance is completely independent of my 
other cassandra instance?

Are there other gotchas that I have to be aware of?

(we are refactoring our model into a new faster model that we tested in QA with 
live data as well as moving randompartitioner to murmur)

Thanks,
Dean





Compression ratio

2013-07-12 Thread cem
Hi All,

Can anyone explain the compression ratio?

Is it the compressed data / original or original/ compressed ? Or
something else.

thanks a lot.

Best Regards,
Cem


Representation of dynamically added columns in table (column family) schema using cqlsh

2013-07-12 Thread Shahab Yunus
A basic question and it seems that I have a gap in my understanding.

I have a simple table in Cassandra with multiple column families. I add new
columns to each of these column families on the fly. When I view (using the
'DESCRIBE table' command) the schema of a particular column family, I see
only one entry for a column (bolded below). What is the reason for that? The
columns that I am adding have string names and byte values, written using
Hector 1.1-3 (
HFactory.createColumn(...) method).

CREATE TABLE mytable (
  key text,
  *column1* ascii,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=1.00 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

cqlsh 3.0.2
Cassandra 1.2.5
CQL spec 3.0.0
Thrift protocol 19.36.0


Given this, I can also only query on this one column1 or value using the
'SELECT' statement.

The OpsCenter on the other hand, displays multiple columns as
expected. Basically the demarcation of multiple columns is clearer.

Thanks a lot.

Regards,
Shahab


Re: node tool ring displays 33.33% owns on 3 node cluster with replication

2013-07-12 Thread Andrew Bialecki
Not sure if it's the best/intended behavior, but you should see it go back
to 100% if you run: nodetool -h 127.0.0.1 -p 8080 ring <keyspace>.

I think the rationale for showing 33% is that different keyspaces might
have different RFs, so it's unclear what to show for ownership. However, if
you include the keyspace as part of your query, you'll get it weighted by
the RF of that keyspace. I believe the same logic applies for nodetool
status.

Andrew


On Thu, Jul 11, 2013 at 12:58 PM, Jason Tyler jaty...@yahoo-inc.com wrote:

  Thanks Rob!  I was able to confirm with getendpoints.

  Cheers,

  ~Jason

   From: Robert Coli rc...@eventbrite.com
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, July 10, 2013 4:09 PM
 To: user@cassandra.apache.org
 Cc: Francois Richard frich...@yahoo-inc.com
 Subject: Re: node tool ring displays 33.33% owns on 3 node cluster with
 replication

   On Wed, Jul 10, 2013 at 4:04 PM, Jason Tyler jaty...@yahoo-inc.com wrote:

  Is this simply a display issue, or have I lost replication?


  Almost certainly just a display issue. Do nodetool -h localhost
 getendpoints <keyspace> <columnfamily> 0, which will tell you the
 endpoints for the non-transformed key 0. It should give you 3 endpoints.
 You could also do this test with a known existing key and then go to those
 nodes and verify that they have that data on disk via sstable2json.

  (FWIW, it is an odd display issue/bug if it is one. Because it has
 reverted to pre-1.1 behavior...)

  =Rob



Re: Compression ratio

2013-07-12 Thread Yuki Morishita
it's compressed/original, so a ratio of 0.25 means the data compressed to a
quarter of its original size.

https://github.com/apache/cassandra/blob/cassandra-1.1.11/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java#L124

On Fri, Jul 12, 2013 at 10:02 AM, cem cayiro...@gmail.com wrote:
 Hi All,

 Can anyone explain the compression ratio?

 Is it the compressed data / original or original/ compressed ? Or
 something else.

 thanks a lot.

 Best Regards,
 Cem



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Compression ratio

2013-07-12 Thread cem
Thank you very much!


On Fri, Jul 12, 2013 at 5:59 PM, Yuki Morishita mor.y...@gmail.com wrote:

 it's compressed/original.


 https://github.com/apache/cassandra/blob/cassandra-1.1.11/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java#L124

 On Fri, Jul 12, 2013 at 10:02 AM, cem cayiro...@gmail.com wrote:
  Hi All,
 
  Can anyone explain the compression ratio?
 
  Is it the compressed data / original or original/ compressed ? Or
  something else.
 
  thanks a lot.
 
  Best Regards,
  Cem



 --
 Yuki Morishita
  t:yukim (http://twitter.com/yukim)



Re: How many DCs can you have in a cluster?

2013-07-12 Thread sankalp kohli
More than the number of DCs, I think you will be bound by the number of
replicas. I don't know how it will work with a replication factor of 10-20,
especially for range queries.


On Thu, Jul 11, 2013 at 7:14 PM, Blair Zajac bl...@orcaware.com wrote:

 In this C* Summit 2013 talk titled "A Deep Dive Into How Cassandra
 Resolves Inconsistent Data" [1], Jason Brown of Netflix mentions that they
 have 5 data centers in the same cluster, two in the US, one in Europe, one
 in Brazil and one in Asia (I'm going from memory now since I don't want to
 watch the video again).

 Is there a practical limit on how many different data centers one can have
 in a single cluster?

 Thanks,
 Blair

 [1] http://www.youtube.com/watch?v=VRZk-NhfX18&list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU&index=57



AUTO: Samuel CARRIERE is out of the office (back 07/08/2013)

2013-07-12 Thread Samuel CARRIERE


I am out of the office until 07/08/2013.



Note: this is an automatic reply to your message "Compression
ratio" sent on 12/07/2013 17:02:11.

This is the only notification you will receive while this person is
away.

Re: How many DCs can you have in a cluster?

2013-07-12 Thread Blair Zajac
Yes, there's going to be a lot of replicas in total, but the replication factor 
will be 3 in each DC.  Will it still be an issue?

Blair

On Jul 12, 2013, at 10:58 AM, sankalp kohli kohlisank...@gmail.com wrote:

 More than the number of DCs, I think you will be bound by the number of replicas. 
 I don't know how it will work with a replication factor of 10-20, especially for 
 range queries.
 
 
 On Thu, Jul 11, 2013 at 7:14 PM, Blair Zajac bl...@orcaware.com wrote:
 In this C* Summit 2013 talk titled "A Deep Dive Into How Cassandra Resolves 
 Inconsistent Data" [1], Jason Brown of Netflix mentions that they have 5 
 data centers in the same cluster, two in the US, one in Europe, one in 
 Brazil and one in Asia (I'm going from memory now since I don't want to 
 watch the video again).
 
 Is there a practical limit on how many different data centers one can have 
 in a single cluster?
 
 Thanks,
 Blair
 
 [1] 
 http://www.youtube.com/watch?v=VRZk-NhfX18&list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU&index=57
 


Re: Timeout reading row from CF with collections

2013-07-12 Thread Paul Ingalls
Yep, that was it.  I built from the cassandra 1.2 branch and no more timeouts.  
Thanks for getting that fix into 1.2!

Paul

On Jul 12, 2013, at 1:20 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 My bet is that you're hitting 
 https://issues.apache.org/jira/browse/CASSANDRA-5677.
 
 --
 Sylvain
 
 
 On Fri, Jul 12, 2013 at 8:17 AM, Paul Ingalls paulinga...@gmail.com wrote:
 I'm running into a problem trying to read data from a column family that 
 includes a number of collections.  
 
 Cluster details:
 4 nodes running 1.2.6 on VMs with 4 CPUs and 7 GB of RAM.
 RAID 0 striped across 4 disks for the data and logs.
 each node has about 500 MB of data currently loaded
 
 Here is the schema:
 
 create table user_scores
 (
   user_id varchar,
   post_type varchar,
   score double,
   team_to_score_map map<varchar, double>,
   affiliation_to_score_map map<varchar, double>,
   campaign_to_score_map map<varchar, double>,
   person_to_score_map map<varchar, double>,
   primary key(user_id, post_type)
 )
 with compaction =
 {
   'class' : 'LeveledCompactionStrategy',
   'sstable_size_in_mb' : 10
 };
 
 I used the leveled compaction strategy as I thought it would help with read 
 latency…
 
 Here is a trace of a simple select against the cluster when nothing else was 
 reading or writing (CPU was < 2%):
 
  activity                                                             | timestamp    | source         | source_elapsed
 ----------------------------------------------------------------------+--------------+----------------+----------------
  execute_cql3_query                                                   | 05:51:34,557 | 100.69.176.51  |              0
  Message received from /100.69.176.51                                 | 05:51:34,195 | 100.69.184.134 |            102
  Executing single-partition query on user_scores                      | 05:51:34,199 | 100.69.184.134 |           3512
  Acquiring sstable references                                         | 05:51:34,199 | 100.69.184.134 |           3741
  Merging memtable tombstones                                          | 05:51:34,199 | 100.69.184.134 |           3890
  Key cache hit for sstable 5                                          | 05:51:34,199 | 100.69.184.134 |           4040
  Seeking to partition beginning in data file                          | 05:51:34,199 | 100.69.184.134 |           4059
  Merging data from memtables and 1 sstables                           | 05:51:34,200 | 100.69.184.134 |           4412
  Parsing select * from user_scores where user_id='26257166' LIMIT 1;  | 05:51:34,558 | 100.69.176.51  |             91
  Peparing statement                                                   | 05:51:34,558 | 100.69.176.51  |            238
  Enqueuing data request to /100.69.184.134                            | 05:51:34,558 | 100.69.176.51  |            567
  Sending message to /100.69.184.134                                   | 05:51:34,558 | 100.69.176.51  |            979
  Request complete                                                     | 05:51:54,562 | 100.69.176.51  |       20005209
 
 You can see that I increased the timeout and it still fails.  This seems to 
 happen with rows that have maps with a larger number of entries.  It is very 
 reproducible with my current data set.
 
 Any ideas on why I can't query for a row?
 
 Thanks!
 
 Paul
 
 
 



hot sstables evicted from page cache on compaction causing high latency

2013-07-12 Thread John Watson
Having a real issue where, at the completion of large compactions, Cassandra
evicts hot sstables from the kernel page cache, causing huge read latency
while the cache is backfilled.

https://dl.dropboxusercontent.com/s/149h7ssru0dapkg/Screen%20Shot%202013-07-12%20at%201.46.19%20PM.png

Blue line - page cache
Green line - disk read latency (ms)
Red line - CF read latency (ms)

The beginning of both high latency plateaus correspond with the completion
of a compaction.

Seems like applying/enabling this will help?
https://issues.apache.org/jira/browse/CASSANDRA-4937

- C* 1.2.6
- 3 Nodes
- 24G RAM (8G heap)
- (2) 3TB 7.2k disks using JBOD feature of C*


Re: Alternate major compaction

2013-07-12 Thread Robert Coli
On Thu, Jul 11, 2013 at 9:43 PM, Takenori Sato ts...@cloudian.com wrote:

 I made the repository public. Now you can checkout from here.

 https://github.com/cloudian/support-tools

 checksstablegarbage is the tool.

 Enjoy, and any feedback is welcome.


Thanks very much, useful tool!

Out of curiosity, what does writesstablekeys do that the upstream tool
sstablekeys does not?

=Rob


Re: Representation of dynamically added columns in table (column family) schema using cqlsh

2013-07-12 Thread Eric Stevens
If you're creating dynamic columns via the Thrift interface, they will not be
reflected in the CQL3 schema.  I would recommend not mixing paradigms like
that; either stick with CQL3 or Thrift / cassandra-cli.  WITH COMPACT
STORAGE creates column families which can be interacted with meaningfully
via Thrift, but you'll be lacking any metadata on those columns to interact
with them via CQL.
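
As a sketch of that mapping (the row key below is hypothetical): each column
written through Thrift under a row key comes back in CQL3 as its own row, with
the Thrift column name in column1 and its value in value:

  SELECT key, column1, value FROM mytable WHERE key = 'some_row_key';

  -- key          | column1       | value
  -- some_row_key | first_column  | 0x...
  -- some_row_key | second_column | 0x...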


On Fri, Jul 12, 2013 at 11:13 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

 A basic question and it seems that I have a gap in my understanding.

 I have a simple table in Cassandra with multiple column families. I add
 new columns to each of these column families on the fly. When I view (using
 the 'DESCRIBE table' command) the schema of a particular column family, I
 see only one entry for column (bolded below). What is the reason for that?
 The columns that I am adding have string names and byte values, written
 using Hector 1.1-3 (
 HFactory.createColumn(...) method).

 CREATE TABLE mytable (
   key text,
   *column1* ascii,
   value blob,
   PRIMARY KEY (key, column1)
 ) WITH COMPACT STORAGE AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=1.00 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh 3.0.2
 Cassandra 1.2.5
 CQL spec 3.0.0
 Thrift protocol 19.36.0


 Given this, I can also only query on this one column1 or value using the
 'SELECT' statement.

 The OpsCenter on the other hand, displays multiple columns as
 expected. Basically the demarcation of multiple columns is clearer.

 Thanks a lot.

 Regards,
 Shahab



Minimum CPU and RAM for Cassandra and Hadoop Cluster

2013-07-12 Thread Martin Arrowsmith
Dear Cassandra experts,

I have an HP ProLiant ML350 G8 server, and I want to put virtual
servers on it. I would like to run the maximum number of nodes
for a Cassandra + Hadoop cluster. I was wondering: what is the
minimum RAM and memory per node that I need to run Cassandra + Hadoop
before the performance decrease is no longer worth the extra nodes?

Also, what is the suggested typical number of CPU cores per node? Would
it make sense to have 1 core per node? Less than that?

Any insight is appreciated! Thanks very much for your time!

Martin


Re: Representation of dynamically added columns in table (column family) schema using cqlsh

2013-07-12 Thread Shahab Yunus
Thanks Eric for the explanation.

Regards,
Shahab


On Fri, Jul 12, 2013 at 11:13 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

 A basic question and it seems that I have a gap in my understanding.

 I have a simple table in Cassandra with multiple column families. I add
 new columns to each of these column families on the fly. When I view (using
 the 'DESCRIBE table' command) the schema of a particular column family, I
 see only one entry for column (bolded below). What is the reason for that?
 The columns that I am adding have string names and byte values, written
 using Hector 1.1-3 (
 HFactory.createColumn(...) method).

 CREATE TABLE mytable (
   key text,
   *column1* ascii,
   value blob,
   PRIMARY KEY (key, column1)
 ) WITH COMPACT STORAGE AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=1.00 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};

 cqlsh 3.0.2
 Cassandra 1.2.5
 CQL spec 3.0.0
 Thrift protocol 19.36.0


 Given this, I can also only query on this one column1 or value using the
 'SELECT' statement.

 The OpsCenter on the other hand, displays multiple columns as
 expected. Basically the demarcation of multiple columns is clearer.

 Thanks a lot.

 Regards,
 Shahab



Re: Node tokens / data move

2013-07-12 Thread Radim Kolar

Is it possible to change num_tokens on a node with data?

I changed it and restarted the node, but it still shows the same amount in 
nodetool status.


Re: Alternate major compaction

2013-07-12 Thread Takenori Sato
It's light. Without the -v option, you can even run it against just an SSTable
file without needing the whole Cassandra installation.

- Takenori


On Sat, Jul 13, 2013 at 6:18 AM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Jul 11, 2013 at 9:43 PM, Takenori Sato ts...@cloudian.com wrote:

 I made the repository public. Now you can checkout from here.

 https://github.com/cloudian/support-tools

 checksstablegarbage is the tool.

 Enjoy, and any feedback is welcome.


 Thanks very much, useful tool!

  Out of curiosity, what does writesstablekeys do that the upstream tool
 sstablekeys does not?

 =Rob



Re: Rhombus - A time-series object store for Cassandra

2013-07-12 Thread Ananth Gundabattula
Hello Rob,

Thanks for the pointer. I have a couple of queries:

How does this project compare to the KairosDB project on GitHub? (For one,
I see that Rhombus supports multi-column queries, which is cool, whereas
KairosDB/OpenTSDB do not seem to have such a feature,
although we can use the tags to achieve something similar?)

Are there any roll ups performed automatically by Rhombus ?

 Can we control the TTL of the data being inserted ?

I am looking at some of the time-series projects for production
use, preferably running on top of cassandra, and was wondering if Rhombus
can be seen as a pure time-series-optimized schema or something more than
that. 

Regards,
Ananth 




On 7/12/13 7:15 AM, Rob Righter rob.righ...@pardot.com wrote:

Hello,

Just wanted to share a project that we have been working on. It's a
time-series object store for Cassandra. We tried to generalize the
common use cases for storing time-series data in Cassandra and
automatically handle the denormalization, indexing, and wide row
sharding. It currently exists as a Java Library. We have it deployed
as a web service in a Dropwizard app server with a REST style
interface. The plan is to eventually release that Dropwizard app too.

The project and explanation is available on Github at:
https://github.com/Pardot/Rhombus

I would love to hear feedback.

Many Thanks,
Rob