Re: can't start cqlsh on new Amazon node

2012-11-08 Thread Tamar Fraenkel
Hi
A bit more info on that
I have one working setup with
python-cql      1.0.9-1
python-thrift   0.6.0-2~riptano1
cassandra       1.0.8

The setup where cqlsh is not working has:
python-cql      1.0.10-1
python-thrift   0.6.0-2~riptano1
cassandra       1.0.11

Maybe this will give someone a hint of what the problem may be and how to
solve it.
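
On a Debian-based DataStax AMI, a quick way to compare the two setups is to list the installed package versions directly (a sketch; the package names are the ones shown above):

```shell
# Show the Cassandra-related packages and their versions on each node
dpkg -l | grep -E 'cassandra|python-cql|python-thrift'

# cqlsh in --debug mode also prints which cql/thrift modules it loaded
cqlsh --debug localhost 9160
```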
Thanks!

*Tamar Fraenkel *
Senior Software Engineer, TOK Media

[image: Inline image 1]

ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Thu, Nov 8, 2012 at 9:38 AM, Tamar Fraenkel ta...@tok-media.com wrote:

 Nope...
 Same error:

 *cqlsh --debug --cql3 localhost 9160*

 Using CQL driver: module 'cql' from
 '/usr/lib/pymodules/python2.6/cql/__init__.pyc'
 Using thrift lib: module 'thrift' from
 '/usr/lib/pymodules/python2.6/thrift/__init__.pyc'
 Connection error: Invalid method name: 'set_cql_version'

 I believe it is some version mismatch. But this was a DataStax AMI; I
 thought everything would be coordinated, and I am not sure what to check for.


 Thanks,

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





 On Thu, Nov 8, 2012 at 4:56 AM, Jason Wee peich...@gmail.com wrote:

 should it be --cql3 ?
 http://www.datastax.com/docs/1.1/dml/using_cql#start-cql3



 On Wed, Nov 7, 2012 at 11:16 PM, Tamar Fraenkel ta...@tok-media.com wrote:

 Hi!
 I installed new cluster using DataStax AMI with --release 1.0.11, so I
 have cassandra 1.0.11 installed.
 Nodes have python-cql 1.0.10-1 and python2.6

 Cluster works well, BUT when I try to connect to the cqlsh I get:
 *cqlsh --debug --cqlversion=2 localhost 9160*
 Using CQL driver: module 'cql' from
 '/usr/lib/pymodules/python2.6/cql/__init__.pyc'
 Using thrift lib: module 'thrift' from
 '/usr/lib/pymodules/python2.6/thrift/__init__.pyc'
 Connection error: Invalid method name: 'set_cql_version'
 This is the same if I choose cqlversion=3.

 Any idea how to solve this?

 Thanks,

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956







Storage limit for a particular user on Cassandra

2012-11-08 Thread mallikharjun.vemana
Hi,

Is there a way we can limit the data of a particular user on the Cassandra 
cluster?

Say for example, I have three users namely, Jsmith, Elvis, Dilbert configured 
in my Cassandra deployment.
And I wanted to limit the data usage for them as follows.

Jsmith - 1 GB
Elvis - 2 GB
Dilbert - 500 MB

Is there a way to achieve this by fine-tuning the configuration?
If not, any workarounds?

Thanks,
~Mallik.



Compact and Repair

2012-11-08 Thread Henrik Schröder
Hi,

We recently ran a major compaction across our cluster, which reduced the
storage used by about 50%. This is fine, since we do a lot of updates to
existing data, so that's the expected result.

The day after, we ran a full repair -pr across the cluster, and when that
finished, each storage node was at about the same size as before the major
compaction. Why does that happen? What gets transferred to other nodes, and
why does it suddenly take up a lot of space again?

We haven't run repair -pr regularly, so is this just something that happens
on the first weekly run, and can we expect a different result next week? Or
does repair always cause the data to grow on each node? To me it just
doesn't seem proportional.


/Henrik
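
A rough way to watch this effect per node (standard nodetool commands; the keyspace and column family names here are placeholders) is to compare the live size around each operation:

```shell
# Space used per column family before the operations
nodetool -h localhost cfstats

# Major compaction: merges SSTables and purges droppable tombstones
nodetool -h localhost compact MyKeyspace MyColumnFamily

# Primary-range repair: streamed-in data arrives as new SSTables,
# so the node can grow again until the next compaction merges them
nodetool -h localhost repair -pr MyKeyspace

# Compare the sizes again
nodetool -h localhost cfstats
```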


Re: Strange delay in query

2012-11-08 Thread André Cruz
On Nov 7, 2012, at 12:15 PM, André Cruz andre.c...@co.sapo.pt wrote:

 This error also happens on my application that uses pycassa, so I don't think 
 this is the same bug.

I have narrowed it down to a slice between two consecutive columns. Observe 
this behaviour using pycassa:

 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
  column_count=2, 
 column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976 
Connection 52905488 (xxx:9160) was checked out from pool 51715344
DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976 
Connection 52905488 (xxx:9160) was checked in to pool 51715344
[UUID('13957152-234b-11e2-92bc-e0db550199f4'), 
UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]

A two column slice took more than 2s to return. If I request the next 2 column 
slice:

 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
  column_count=2, 
 column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976 
Connection 52904912 (xxx:9160) was checked out from pool 51715344
DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976 
Connection 52904912 (xxx:9160) was checked in to pool 51715344
[UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'), 
UUID('a364b028-2449-11e2-8882-e0db550199f4')]

This takes 20msec... Is there a rational explanation for this different 
behaviour? Is there some threshold that I'm running into? Is there any way to 
obtain more debugging information about this problem?

Thanks,
André

Re: Compact and Repair

2012-11-08 Thread Henrik Schröder
No, we're not using columns with TTL, and I performed a major compaction
before the repair, so there shouldn't be vast amounts of tombstones moving
around.

And the increase happened during the repair, the nodes gained ~20-30GB each.


/Henrik


On Thu, Nov 8, 2012 at 12:40 PM, horschi hors...@gmail.com wrote:

 Hi,

 is it possible that your repair is overrepairing due to any of the issues
 discussed here:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html?


 I've seen repair increasing the load on my cluster, but what you are
 describing sounds like a lot to me.

 Does this increase happen due to repair entirely? Or was the load maybe
 increasing gradually over the week and you just checked for the first time?

 cheers,
 Christian



 On Thu, Nov 8, 2012 at 11:55 AM, Henrik Schröder skro...@gmail.com wrote:

 Hi,

 We recently ran a major compaction across our cluster, which reduced the
 storage used by about 50%. This is fine, since we do a lot of updates to
 existing data, so that's the expected result.

 The day after, we ran a full repair -pr across the cluster, and when that
 finished, each storage node was at about the same size as before the major
 compaction. Why does that happen? What gets transferred to other nodes, and
 why does it suddenly take up a lot of space again?

 We haven't run repair -pr regularly, so is this just something that
 happens on the first weekly run, and can we expect a different result next
 week? Or does repair always cause the data to grow on each node? To me it
 just doesn't seem proportional?


 /Henrik





Re: Compact and Repair

2012-11-08 Thread Alain RODRIGUEZ
Did you change the RF or have a node down since you last repaired?


2012/11/8 Henrik Schröder skro...@gmail.com

 No, we're not using columns with TTL, and I performed a major compaction
 before the repair, so there shouldn't be vast amounts of tombstones moving
 around.

 And the increase happened during the repair, the nodes gained ~20-30GB
 each.


 /Henrik



 On Thu, Nov 8, 2012 at 12:40 PM, horschi hors...@gmail.com wrote:

 Hi,

 is it possible that your repair is overrepairing due to any of the issues
 discussed here:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html?


 I've seen repair increasing the load on my cluster, but what you are
 describing sounds like a lot to me.

 Does this increase happen due to repair entirely? Or was the load maybe
 increasing gradually over the week and you just checked for the first time?

 cheers,
 Christian



 On Thu, Nov 8, 2012 at 11:55 AM, Henrik Schröder skro...@gmail.com wrote:

 Hi,

 We recently ran a major compaction across our cluster, which reduced the
 storage used by about 50%. This is fine, since we do a lot of updates to
 existing data, so that's the expected result.

 The day after, we ran a full repair -pr across the cluster, and when
 that finished, each storage node was at about the same size as before the
 major compaction. Why does that happen? What gets transferred to other
 nodes, and why does it suddenly take up a lot of space again?

 We haven't run repair -pr regularly, so is this just something that
 happens on the first weekly run, and can we expect a different result next
 week? Or does repair always cause the data to grow on each node? To me it
 just doesn't seem proportional?


 /Henrik






Re: Compact and Repair

2012-11-08 Thread Henrik Schröder
No, we haven't changed RF, but it's been a very long time since we repaired
last, so we're guessing this is an effect of not running repair regularly,
and that doing it regularly will fix it. It would just be nice to know.

Also, running major compaction after the repair made the data size shrink
back to what it was before, so clearly a lot of junk data was sent over on
that repair, most probably tombstones of some kind, as discussed in the
other thread.


/Henrik


On Thu, Nov 8, 2012 at 1:53 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Did you change the RF or have a node down since you last repaired?


 2012/11/8 Henrik Schröder skro...@gmail.com

 No, we're not using columns with TTL, and I performed a major compaction
 before the repair, so there shouldn't be vast amounts of tombstones moving
 around.

 And the increase happened during the repair, the nodes gained ~20-30GB
 each.


 /Henrik



 On Thu, Nov 8, 2012 at 12:40 PM, horschi hors...@gmail.com wrote:

 Hi,

 is it possible that your repair is overrepairing due to any of the
 issues discussed here:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html?


 I've seen repair increasing the load on my cluster, but what you are
 describing sounds like a lot to me.

 Does this increase happen due to repair entirely? Or was the load maybe
 increasing gradually over the week and you just checked for the first time?

 cheers,
 Christian



 On Thu, Nov 8, 2012 at 11:55 AM, Henrik Schröder skro...@gmail.com wrote:

 Hi,

 We recently ran a major compaction across our cluster, which reduced
 the storage used by about 50%. This is fine, since we do a lot of updates
 to existing data, so that's the expected result.

 The day after, we ran a full repair -pr across the cluster, and when
 that finished, each storage node was at about the same size as before the
 major compaction. Why does that happen? What gets transferred to other
 nodes, and why does it suddenly take up a lot of space again?

 We haven't run repair -pr regularly, so is this just something that
 happens on the first weekly run, and can we expect a different result next
 week? Or does repair always cause the data to grow on each node? To me it
 just doesn't seem proportional?


 /Henrik







How to insert composite column in CQL3?

2012-11-08 Thread Alan Ristić
Hi there!

I'm struggling to figure out (for quite a few hours now) how I can insert,
for example, a column with a TimeUUID name and an empty value in CQL3, in a
fictional table. And what would the table design be? I'm interested in the
syntax (e.g. an example).

I'm trying to do something like Matt Dennis did here (*Cassandra NYC 2011:
Matt Dennis - Data Modeling Workshop*):
http://www.youtube.com/watch?v=OzBJrQZjge0&t=9m45s

Is that even possible in CQL3? Thanks.

Regards,
*Alan Ristić*
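
A minimal CQL3 sketch of the wide-row pattern from the talk (the table and column names here are invented): the TimeUUID "column name" becomes a clustering column, and the Thrift-style column with an empty value becomes a row that stores nothing beyond its key:

```sql
-- One partition per user; each TimeUUID becomes one clustered entry
CREATE TABLE events (
    user_id text,
    event_time timeuuid,
    PRIMARY KEY (user_id, event_time)
);

-- Inserting only the key columns is the CQL3 analogue of a
-- TimeUUID-named column with an empty value
INSERT INTO events (user_id, event_time)
VALUES ('alan', 13957152-234b-11e2-92bc-e0db550199f4);
```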


Re: Strange delay in query

2012-11-08 Thread Andrey Ilinykh
What is the size of the columns? Probably those two are huge.


On Thu, Nov 8, 2012 at 4:01 AM, André Cruz andre.c...@co.sapo.pt wrote:

 On Nov 7, 2012, at 12:15 PM, André Cruz andre.c...@co.sapo.pt wrote:

  This error also happens on my application that uses pycassa, so I don't
 think this is the same bug.

 I have narrowed it down to a slice between two consecutive columns.
 Observe this behaviour using pycassa:

 
 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
 column_count=2,
 column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
 DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976
 Connection 52905488 (xxx:9160) was checked out from pool 51715344
 DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976
 Connection 52905488 (xxx:9160) was checked in to pool 51715344
 [UUID('13957152-234b-11e2-92bc-e0db550199f4'),
 UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]

 A two column slice took more than 2s to return. If I request the next 2
 column slice:

 
 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
 column_count=2,
 column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
 DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976
 Connection 52904912 (xxx:9160) was checked out from pool 51715344
 DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976
 Connection 52904912 (xxx:9160) was checked in to pool 51715344
 [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'),
 UUID('a364b028-2449-11e2-8882-e0db550199f4')]

 This takes 20msec... Is there a rational explanation for this different
 behaviour? Is there some threshold that I'm running into? Is there any way
 to obtain more debugging information about this problem?

 Thanks,
 André


leveled compaction and tombstoned data

2012-11-08 Thread B. Todd Burruss
We are having a problem where we have huge SSTables with tombstoned data
in them that is not being compacted soon enough (because size-tiered
compaction requires, by default, 4 like-sized SSTables). This is using
more disk space than we anticipated.

We are very write-heavy compared to reads, and we delete the data after N
days (N depends on the column family, but is around 7 days).

My question is: would leveled compaction help get rid of the tombstoned
data faster than size-tiered, and therefore reduce the disk space usage?
thx


Re: leveled compaction and tombstoned data

2012-11-08 Thread Radim Kolar

On 8.11.2012 19:12, B. Todd Burruss wrote:
my question is would leveled compaction help to get rid of the 
tombstoned data faster than size tiered, and therefore reduce the disk 
space usage?


Leveled compaction will kill your performance. Get the patch from JIRA for
maximum SSTable size per CF and force Cassandra to make smaller tables;
they expire faster.




Re: How to insert composite column in CQL3?

2012-11-08 Thread Alan Ristić
Ok, this article answered all the confusion in my head:
http://www.datastax.com/dev/blog/thrift-to-cql3

It's a must-read for noobs (like me). It perfectly explains the mappings and
diffs between the internals and CQL3 (abstractions). First read this and THEN
go study all the resources out there ;)

Regards,
*Alan Ristić*

*m*: 040 423 688



2012/11/8 Alan Ristić alan.ris...@gmail.com

 Hi there!

 I'm struggling to figure out (for quite a few hours now) how I can insert,
 for example, a column with a TimeUUID name and an empty value in CQL3, in a
 fictional table. And what would the table design be? I'm interested in the
 syntax (e.g. an example).

 I'm trying to do something like Matt Dennis did here (*Cassandra NYC
 2011: Matt Dennis - Data Modeling Workshop*):
 http://www.youtube.com/watch?v=OzBJrQZjge0&t=9m45s

 Is that even possible in CQL3? Thanks.

 Regards,
 *Alan Ristić*




Kundera 2.2 released

2012-11-08 Thread Amresh Kumar Singh
Hi All,

We are happy to announce release of Kundera 2.2.

Kundera is a JPA 2.0 based, object-datastore mapping library for NoSQL 
datastores. The idea behind Kundera is to make working with NoSQL Databases
drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB and 
relational databases.

Major Changes in this release:
---
* Geospatial Persistence and Queries for MongoDB
* Composite keys support for Cassandra and MongoDB
* Cassandra 1.1.6 migration
* Support for enum data type
* Named and Native queries support for REST based access

Github Issues Fixes (https://github.com/impetus-opensource/Kundera/issues):
--
Issue 136 - JPQL queries without WHERE clause or parameters fail
Issue 135 - MongoDB: enable WriteConcern, Safe mode and other properties on 
operation level.
Issue 133 - Externalize the database connection configuration
Issue 132 - problem in loading entity metadata when giving class name in class 
tag of persistence.xml
Issue 130 - Row not fully deleted from cassandra on em.remove(obj) - then 
cannot reinsert row with same key

We have revamped our wiki, so you might want to have a look at it here:
https://github.com/impetus-opensource/Kundera/wiki

To download, use or contribute to Kundera, visit:
http://github.com/impetus-opensource/Kundera

Latest released tag version is 2.2. Kundera maven libraries are now available 
at: https://oss.sonatype.org/content/repositories/releases/com/impetus

Sample codes and examples for using Kundera can be found here:
http://github.com/impetus-opensource/Kundera-Examples
and
https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests

Thank you all for your contributions!

Regards,
Kundera Team.





Re: leveled compaction and tombstoned data

2012-11-08 Thread B. Todd Burruss
We are running DataStax Enterprise and cannot patch it. How bad is
"kill your performance"? If it is so bad, why is it an option?


On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar h...@filez.com wrote:
 On 8.11.2012 19:12, B. Todd Burruss wrote:

 my question is would leveled compaction help to get rid of the tombstoned
 data faster than size tiered, and therefore reduce the disk space usage?

 leveled compaction will kill your performance. get patch from jira for
 maximum sstable size per CF and force cassandra to make smaller tables, they
 expire faster.



Re: leveled compaction and tombstoned data

2012-11-08 Thread Aaron Turner
"Kill your performance" is relative. Leveled Compaction basically costs 2x disk
IO. Look at iostat, etc., and see if you have the headroom.

There are also ways to bring up a test node and just run Level Compaction
on that.  Wish I had a URL handy, but hopefully someone else can find it.

Also, if you're not using compression, check it out.

On Thu, Nov 8, 2012 at 11:20 AM, B. Todd Burruss bto...@gmail.com wrote:

 we are running Datastax enterprise and cannot patch it.  how bad is
 kill performance?  if it is so bad, why is it an option?


 On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar h...@filez.com wrote:
  On 8.11.2012 19:12, B. Todd Burruss wrote:
 
  my question is would leveled compaction help to get rid of the
 tombstoned
  data faster than size tiered, and therefore reduce the disk space usage?
 
  leveled compaction will kill your performance. get patch from jira for
  maximum sstable size per CF and force cassandra to make smaller tables,
 they
  expire faster.
 




-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


Re: leveled compaction and tombstoned data

2012-11-08 Thread Jeremy Hanna
LCS works well in specific circumstances, this blog post gives some good 
considerations: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

On Nov 8, 2012, at 1:33 PM, Aaron Turner synfina...@gmail.com wrote:

 kill performance is relative.  Leveled Compaction basically costs 2x disk 
 IO.  Look at iostat, etc and see if you have the headroom.
 
 There are also ways to bring up a test node and just run Level Compaction on 
 that.  Wish I had a URL handy, but hopefully someone else can find it.
 
 Also, if you're not using compression, check it out.
 
 On Thu, Nov 8, 2012 at 11:20 AM, B. Todd Burruss bto...@gmail.com wrote:
 we are running Datastax enterprise and cannot patch it.  how bad is
 kill performance?  if it is so bad, why is it an option?
 
 
 On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar h...@filez.com wrote:
  On 8.11.2012 19:12, B. Todd Burruss wrote:
 
  my question is would leveled compaction help to get rid of the tombstoned
  data faster than size tiered, and therefore reduce the disk space usage?
 
  leveled compaction will kill your performance. get patch from jira for
  maximum sstable size per CF and force cassandra to make smaller tables, they
  expire faster.
 
 
 
 
 -- 
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
 Windows
 Those who would give up essential Liberty, to purchase a little temporary 
 Safety, deserve neither Liberty nor Safety.  
 -- Benjamin Franklin
 carpe diem quam minimum credula postero
 



Re: leveled compaction and tombstoned data

2012-11-08 Thread Brandon Williams
On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner synfina...@gmail.com wrote:
 There are also ways to bring up a test node and just run Level Compaction on
 that.  Wish I had a URL handy, but hopefully someone else can find it.

This rather handsome fellow wrote a blog about it:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

-Brandon


Re: leveled compaction and tombstoned data

2012-11-08 Thread Ben Coverston
http://www.datastax.com/docs/1.1/operations/tuning#testing-compaction-and-compression

Write Survey mode.

After you have it up and running you can modify the column family mbean to
use LeveledCompactionStrategy on that node to see how your hardware/load
fares with LCS.
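
Note that a cassandra-cli schema change propagates cluster-wide, which is why the per-node mbean route is the right one for a survey node; for completeness, a cluster-wide switch to LCS in 1.1 (placeholder keyspace/CF names, and an assumed small sstable_size_in_mb) might look like:

```shell
cassandra-cli -h localhost <<'EOF'
use MyKeyspace;
update column family MyColumnFamily
  with compaction_strategy = 'LeveledCompactionStrategy'
  and compaction_strategy_options = {sstable_size_in_mb: 10};
quit;
EOF
```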


On Thu, Nov 8, 2012 at 11:33 AM, Aaron Turner synfina...@gmail.com wrote:

 kill performance is relative.  Leveled Compaction basically costs 2x
 disk IO.  Look at iostat, etc and see if you have the headroom.

 There are also ways to bring up a test node and just run Level Compaction
 on that.  Wish I had a URL handy, but hopefully someone else can find it.

 Also, if you're not using compression, check it out.


 On Thu, Nov 8, 2012 at 11:20 AM, B. Todd Burruss bto...@gmail.com wrote:

 we are running Datastax enterprise and cannot patch it.  how bad is
 kill performance?  if it is so bad, why is it an option?


 On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar h...@filez.com wrote:
  On 8.11.2012 19:12, B. Todd Burruss wrote:
 
  my question is would leveled compaction help to get rid of the
 tombstoned
  data faster than size tiered, and therefore reduce the disk space
 usage?
 
  leveled compaction will kill your performance. get patch from jira for
  maximum sstable size per CF and force cassandra to make smaller tables,
 they
  expire faster.
 




 --
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
 Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
 -- Benjamin Franklin
 carpe diem quam minimum credula postero




-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


Re: leveled compaction and tombstoned data

2012-11-08 Thread Ben Coverston
Also to answer your question, LCS is well suited to workloads where
overwrites and tombstones come into play. The tombstones are _much_ more
likely to be merged with LCS than STCS.

I would be careful with the patch that was referred to above, it hasn't
been reviewed, and from a glance it appears that it will cause an infinite
compaction loop if you get more than 4 SSTables at max size.



On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams dri...@gmail.com wrote:

 On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner synfina...@gmail.com wrote:
  There are also ways to bring up a test node and just run Level
 Compaction on
  that.  Wish I had a URL handy, but hopefully someone else can find it.

 This rather handsome fellow wrote a blog about it:

 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

 -Brandon




-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


Re: leveled compaction and tombstoned data

2012-11-08 Thread B. Todd Burruss
Thanks for the links! I had forgotten about live sampling.

On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams dri...@gmail.com wrote:
 On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner synfina...@gmail.com wrote:
 There are also ways to bring up a test node and just run Level Compaction on
 that.  Wish I had a URL handy, but hopefully someone else can find it.

 This rather handsome fellow wrote a blog about it:
 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

 -Brandon


Re: leveled compaction and tombstoned data

2012-11-08 Thread B. Todd Burruss
@ben, thx, we will be deploying 2.2.1 of DSE soon and will try to
set up a traffic sampling node so we can test leveled compaction.

we essentially keep a rolling window of data written once: it is
written, then after N days it is deleted, so it seems that leveled
compaction should help.
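
Since the data has a fixed lifetime, writing it with a per-column TTL may avoid the explicit deletes from the client entirely; expired columns become droppable on their own. An illustrative cassandra-cli write, with placeholder keyspace, CF, key, and column names (604800 s = 7 days):

```shell
cassandra-cli -h localhost <<'EOF'
use MyKeyspace;
set MyColumnFamily['rowkey']['colname'] = 'value' with ttl = 604800;
quit;
EOF
```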

On Thu, Nov 8, 2012 at 11:53 AM, B. Todd Burruss bto...@gmail.com wrote:
 thanks for the links!  i had forgotten about live sampling

 On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams dri...@gmail.com wrote:
 On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner synfina...@gmail.com wrote:
 There are also ways to bring up a test node and just run Level Compaction on
 that.  Wish I had a URL handy, but hopefully someone else can find it.

 This rather handsome fellow wrote a blog about it:
 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

 -Brandon


Re: Hinted Handoff runs every ten minutes

2012-11-08 Thread Mike Heffner
Is there a ticket open for this for 1.1.6?

We also noticed this after upgrading from 1.1.3 to 1.1.6. Every node runs a
0-row hinted handoff every 10 minutes. N-1 nodes hint to the same node,
while that node hints to another node.


On Tue, Oct 30, 2012 at 1:35 PM, Vegard Berget p...@fantasista.no wrote:

 Hi,

 I have the exact same problem with 1.1.6.  HintsColumnFamily consists of
 one row (Rowkey 00, nothing more).   The problem started after upgrading
 from 1.1.4 to 1.1.6.  Every ten minutes HintedHandoffManager starts and
 finishes  after sending 0 rows.

 .vegard,



 - Original Message -
 From:
 user@cassandra.apache.org

 To:
 user@cassandra.apache.org
 Cc:

 Sent:
 Mon, 29 Oct 2012 23:45:30 +0100

 Subject:
 Re: Hinted Handoff runs every ten minutes


 On 29.10.2012 23:24, Stephen Pierce wrote:
  I'm running 1.1.5; the bug says it's fixed in 1.0.9/1.1.0.
 
  How can I check to see why it keeps running HintedHandoff?
 You have a tombstone in system.HintsColumnFamily; use the list command in
 cassandra-cli to check.




-- 

  Mike Heffner m...@librato.com
  Librato, Inc.
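
For reference, the cassandra-cli check Radim suggests in the quoted message might look like this (hint rows live in the system keyspace):

```shell
cassandra-cli -h localhost <<'EOF'
use system;
list HintsColumnFamily;
quit;
EOF
```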


Multiple keyspaces vs Multiple CFs

2012-11-08 Thread sankalp kohli
Is it better to have 10 keyspaces with 10 CFs in each keyspace, or 100
keyspaces with 1 CF each?
I am talking in terms of memory footprint.
Also, I would be interested to know how much better one is over the other.

Thanks,
Sankalp


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo
It is better to have one keyspace unless you need to replicate the
keyspaces differently. The main reason for this is that changing
keyspaces requires an RPC operation. Having 10 keyspaces would mean
having 10 connection pools.
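
A self-contained sketch of the client-side cost (illustrative Python, not the actual pycassa API): when a pool is bound to a single keyspace, a client that touches N keyspaces ends up holding N pools' worth of open connections.

```python
class ConnectionPool:
    """Toy stand-in for a client pool bound to a single keyspace."""
    def __init__(self, keyspace, size=5):
        self.keyspace = keyspace
        # Each connection would have issued a set_keyspace RPC on creation
        self.connections = ["%s-conn-%d" % (keyspace, i) for i in range(size)]

class Client:
    """Lazily builds one pool per keyspace it touches."""
    def __init__(self, pool_size=5):
        self.pool_size = pool_size
        self.pools = {}

    def pool_for(self, keyspace):
        if keyspace not in self.pools:
            self.pools[keyspace] = ConnectionPool(keyspace, self.pool_size)
        return self.pools[keyspace]

client = Client(pool_size=5)
for ks in ["ks%d" % i for i in range(10)]:  # 10 keyspaces, 1 CF each
    client.pool_for(ks)

# 10 pools x 5 connections = 50 held connections, vs 5 for one keyspace
total = sum(len(p.connections) for p in client.pools.values())
print(total)
```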

On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli kohlisank...@gmail.com wrote:
 Is it better to have 10 Keyspaces with 10 CF in each keyspace. or 100
 keyspaces with 1 CF each.
 I am talking in terms of memory footprint.
 Also I would be interested to know how much better one is over other.

 Thanks,
 Sankalp


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread sankalp kohli
Which connection pool are you talking about?


 On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 it is better to have one keyspace unless you need to replicate the
 keyspaces differently. The main reason for this is that changing
 keyspaces requires an RPC operation. Having 10 keyspaces would mean
 having 10 connection pools.

 On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli kohlisank...@gmail.com
 wrote:
  Is it better to have 10 Keyspaces with 10 CF in each keyspace. or 100
  keyspaces with 1 CF each.
  I am talking in terms of memory footprint.
  Also I would be interested to know how much better one is over other.
 
  Thanks,
  Sankalp



Read during digest mismatch

2012-11-08 Thread sankalp kohli
Hi,
Let's say I am reading with consistency TWO and my replication factor is 3.
The read is eligible for global read repair. It will send a request to get
data from one node and a digest request to two.
If there is a digest mismatch, what I am reading from the code looks like
it will get the data from all three nodes and do a resolve of the data
before returning to the client.

Is that correct, or am I reading the code wrong?

Also, if this is correct, it looks like if the third node is in another DC,
the read will slow down even though the consistency was TWO?

Thanks,
Sankalp
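
The resolve step described above can be sketched in isolation (plain Python, not Cassandra's actual classes): given one version of a column from each replica, reconciliation keeps the version with the highest timestamp, which is why the read must wait for data from all three replicas, including a remote-DC one, before answering.

```python
from collections import namedtuple

# A replica's version of a single column
Column = namedtuple("Column", ["value", "timestamp"])

def resolve(versions):
    """Reconcile replica versions of one column: latest timestamp wins."""
    return max(versions, key=lambda c: c.timestamp)

# Three replicas; say the third (remote-DC) replica has the newest write
replicas = [
    Column(value="old", timestamp=100),
    Column(value="old", timestamp=100),
    Column(value="new", timestamp=200),
]

winner = resolve(replicas)
print(winner.value)  # new
```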


Re: Strange delay in query

2012-11-08 Thread Josep Blanquer
Can it be that you have tons and tons of tombstoned columns in the middle
of these two? I've seen plenty of performance issues with wide
rows littered with column tombstones (you could check with dumping the
sstables...)

Just a thought...

Josep M.
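
One way to check this (assuming shell access to a replica's data directory; the path, SSTable generation number, and hex-encoded row key below are illustrative) is to dump the row with sstable2json and look for deletion markers among the subcolumns:

```shell
# Dump one row key from an SSTable of this column family; tombstoned
# columns carry a deletion marker in the JSON output
sstable2json /var/lib/cassandra/data/MyKeyspace/MyCF-hc-42-Data.db \
    -k 3cd88d97ffde44ca8ae95336caaebc4e
```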

On Thu, Nov 8, 2012 at 12:23 PM, André Cruz andre.c...@co.sapo.pt wrote:

 These are the two columns in question:

 = (super_column=13957152-234b-11e2-92bc-e0db550199f4,
  (column=attributes, value=, timestamp=1351681613263657)
  (column=blocks,
 value=A4edo5MhHvojv3Ihx_JkFMsF3ypthtBvAZkoRHsjulw06pez86OHch3K3OpmISnDjHODPoCf69bKcuAZSJj-4Q,
 timestamp=1351681613263657)
  (column=hash,
 value=8_p2QaeRaX_QwJbUWQ07ZqlNHei7ixu0MHxgu9oennfYOGfyH6EsEe_LYO8V8EC_1NPL44Gx8B7UhYV9VSb7Lg,
 timestamp=1351681613263657)
  (column=icon, value=image_jpg, timestamp=1351681613263657)
  (column=is_deleted, value=true, timestamp=1351681613263657)
  (column=is_dir, value=false, timestamp=1351681613263657)
  (column=mime_type, value=image/jpeg, timestamp=1351681613263657)
  (column=mtime, value=1351646803, timestamp=1351681613263657)
  (column=name, value=/Mobile Photos/Photo 2012-10-28 17_13_50.jpeg,
 timestamp=1351681613263657)
  (column=revision, value=13957152-234b-11e2-92bc-e0db550199f4,
 timestamp=1351681613263657)
  (column=size, value=1379001, timestamp=1351681613263657)
  (column=thumb_exists, value=true, timestamp=1351681613263657))
 = (super_column=40b7ae4e-2449-11e2-8610-e0db550199f4,
  (column=attributes, value={posix: 420}, timestamp=1351790781154800)
  (column=blocks,
 value=9UCDkHNb8-8LuKr2bv9PjKcWCT0v7FCZa0ebNSflES4-o7QD6eYschVaweCKSbR29Dq2IeGl_Cu7BVnYJYphTQ,
 timestamp=1351790781154800)
  (column=hash,
 value=kao2EV8jw_wN4EBoMkCXZWCwg3qQ0X6m9_X9JIGkEkiGKJE_JeKgkdoTAkAefXgGtyhChuhWPlWMxl_tX7VZUw,
 timestamp=1351790781154800)
  (column=icon, value=text_txt, timestamp=1351790781154800)
  (column=is_dir, value=false, timestamp=1351790781154800)
  (column=mime_type, value=text/plain, timestamp=1351790781154800)
  (column=mtime, value=1351378576, timestamp=1351790781154800)
  (column=name, value=/Documents/VIMDocument.txt,
 timestamp=1351790781154800)
  (column=revision, value=40b7ae4e-2449-11e2-8610-e0db550199f4,
 timestamp=1351790781154800)
  (column=size, value=13, timestamp=1351790781154800)
  (column=thumb_exists, value=false, timestamp=1351790781154800))


 I don't think their size is an issue here.

 André

 On Nov 8, 2012, at 6:04 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 What is the size of columns? Probably those two are huge.


 On Thu, Nov 8, 2012 at 4:01 AM, André Cruz andre.c...@co.sapo.pt wrote:

 On Nov 7, 2012, at 12:15 PM, André Cruz andre.c...@co.sapo.pt wrote:

  This error also happens on my application that uses pycassa, so I don't
 think this is the same bug.

 I have narrowed it down to a slice between two consecutive columns.
 Observe this behaviour using pycassa:

 
 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
 column_count=2,
 column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
 DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849
 139928791262976 Connection 52905488 (xxx:9160) was checked out from pool
 51715344
 DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849
 139928791262976 Connection 52905488 (xxx:9160) was checked in to pool
 51715344
 [UUID('13957152-234b-11e2-92bc-e0db550199f4'),
 UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]

 A two column slice took more than 2s to return. If I request the next 2
 column slice:

 
 DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
 column_count=2,
 column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
 DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849
 139928791262976 Connection 52904912 (xxx:9160) was checked out from pool
 51715344
 DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849
 139928791262976 Connection 52904912 (xxx:9160) was checked in to pool
 51715344
 [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'),
 UUID('a364b028-2449-11e2-8882-e0db550199f4')]

 This takes 20msec... Is there a rational explanation for this different
 behaviour? Is there some threshold that I'm running into? Is there any way
 to obtain more debugging information about this problem?

 Thanks,
 André
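One way to get more debugging information (a sketch of mine, not something from the thread) is to time each successive two-column slice while walking the row, to pinpoint exactly where latency jumps. The pycassa get call is stubbed here so the snippet runs standalone; against a real cluster you would pass something like `DISCO_CASS.col_fam_nsrev.get` instead:

```python
import time

def timed_slice(get_fn, key, column_start, column_count=2):
    """Run one slice query and return (columns, elapsed_seconds)."""
    t0 = time.monotonic()
    cols = get_fn(key, column_start=column_start, column_count=column_count)
    return cols, time.monotonic() - t0

def walk_row(get_fn, key, first_column, steps):
    """Walk a row slice by slice, recording per-slice latency so the
    slow region can be pinpointed."""
    timings, start = [], first_column
    for _ in range(steps):
        cols, dt = timed_slice(get_fn, key, start)
        if not cols:
            break
        timings.append((start, dt))
        start = cols[-1]           # next slice starts at the last column seen
    return timings

# Stub standing in for a pycassa ColumnFamily.get (the real signature is
# similar: get(key, column_start=..., column_count=...)).
def fake_get(key, column_start, column_count=2):
    if column_start == "u1":       # simulate the tombstone-laden slice
        time.sleep(0.05)
        return ["u1", "u2"]
    return ["u2", "u3"] if column_start == "u2" else []

timings = walk_row(fake_get, "rowkey", "u1", steps=3)
slowest = max(timings, key=lambda t: t[1])[0]
print(slowest)  # the slice starting at "u1" is the slow one
```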






Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo
Any connection pool. Imagine if you have 10 column families in 10
keyspaces. You pull a connection off the pool and the odds are 1 in 10
of it being connected to the keyspace you want. So 9 out of 10 times
you have to have a network round trip just to change the keyspace, or
you have to build a keyspace aware connection pool.
Edward
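The "keyspace aware connection pool" Edward mentions can be sketched as one idle list per keyspace, so a checked-out connection is already bound to the right keyspace and no set_keyspace round trip is needed. A plain-Python sketch with a stub connection (the class names are illustrative, not a real client API):

```python
import collections

class StubConnection:
    """Stands in for a Thrift connection; counts set_keyspace round trips."""
    round_trips = 0
    def __init__(self, keyspace):
        self.keyspace = keyspace
    def set_keyspace(self, keyspace):
        if keyspace != self.keyspace:
            StubConnection.round_trips += 1   # one network round trip
            self.keyspace = keyspace

class KeyspaceAwarePool:
    """One idle list per keyspace: checkout returns a connection already
    bound to the requested keyspace whenever one is available."""
    def __init__(self):
        self.idle = collections.defaultdict(collections.deque)
    def checkout(self, keyspace):
        pool = self.idle[keyspace]
        conn = pool.popleft() if pool else StubConnection(keyspace)
        conn.set_keyspace(keyspace)           # no-op if already bound
        return conn
    def checkin(self, conn):
        self.idle[conn.keyspace].append(conn)

pool = KeyspaceAwarePool()
for _ in range(100):                          # alternate between two keyspaces
    for ks in ("ks_a", "ks_b"):
        conn = pool.checkout(ks)
        pool.checkin(conn)

print(StubConnection.round_trips)  # 0: no keyspace switches ever needed
```

With a single shared pool, roughly 9 out of 10 checkouts in Edward's example would pay the switch; with per-keyspace idle lists, none do.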

On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli kohlisank...@gmail.com wrote:
 Which connection pool are you talking about?


 On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

 it is better to have one keyspace unless you need to replicate the
 keyspaces differently. The main reason for this is that changing
 keyspaces requires an RPC operation. Having 10 keyspaces would mean
 having 10 connection pools.

 On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli kohlisank...@gmail.com
 wrote:
  Is it better to have 10 Keyspaces with 10 CF in each keyspace. or 100
  keyspaces with 1 CF each.
  I am talking in terms of memory footprint.
  Also I would be interested to know how much better one is over other.
 
  Thanks,
  Sankalp




Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo
In the old days the API looked like this.

  client.insert("Keyspace1",
                key_user_id,
                new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                "Chris Goffinet".getBytes("UTF-8"),
                timestamp,
                ConsistencyLevel.ONE);

but now it works like this

/* pay attention to this below */
client.set_keyspace("keyspace1");
/* pay attention to this above */
  client.insert(key_user_id,
                new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                "Chris Goffinet".getBytes("UTF-8"),
                timestamp,
                ConsistencyLevel.ONE);

So each time you switch keyspaces you make a network round trip.
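The cost of those switches can be made concrete with a small stub that counts set_keyspace round trips; this is a plain-Python sketch (the client and write path are stand-ins, not a real driver API):

```python
class ThriftClientStub:
    """Stub of a Thrift client; only tracks keyspace switches."""
    def __init__(self):
        self.keyspace = None
        self.switches = 0
    def set_keyspace(self, ks):
        if ks != self.keyspace:
            self.switches += 1     # one network round trip per switch
            self.keyspace = ks
    def insert(self, key, value):
        pass                       # the write itself is not modeled

# 100 writes spread over 10 keyspaces.
writes = [("ks%d" % (i % 10), "key", "val") for i in range(100)]

# Interleaved: the keyspace changes before every single write.
interleaved = ThriftClientStub()
for ks, key, val in writes:
    interleaved.set_keyspace(ks)
    interleaved.insert(key, val)

# Grouped by keyspace: only one switch per keyspace.
grouped = ThriftClientStub()
for ks, key, val in sorted(writes, key=lambda w: w[0]):
    grouped.set_keyspace(ks)
    grouped.insert(key, val)

print(interleaved.switches, grouped.switches)  # 100 10
```

Grouping work per keyspace (or keeping per-keyspace connections, as discussed above) is what turns 100 extra round trips into 10.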

On Thu, Nov 8, 2012 at 6:17 PM, sankalp kohli kohlisank...@gmail.com wrote:
 I am a bit confused. One connection pool I know is the one which
 MessageService has to other nodes. Then there will be incoming connections
 via thrift from clients. How are they affected by multiple keyspaces?


 On Thu, Nov 8, 2012 at 3:14 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

 Any connection pool. Imagine if you have 10 column families in 10
 keyspaces. You pull a connection off the pool and the odds are 1 in 10
 of it being connected to the keyspace you want. So 9 out of 10 times
 you have to have a network round trip just to change the keyspace, or
 you have to build a keyspace aware connection pool.
 Edward

 On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli kohlisank...@gmail.com
 wrote:
  Which connection pool are you talking about?
 
 
  On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
  it is better to have one keyspace unless you need to replicate the
  keyspaces differently. The main reason for this is that changing
  keyspaces requires an RPC operation. Having 10 keyspaces would mean
  having 10 connection pools.
 
  On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli kohlisank...@gmail.com
  wrote:
   Is it better to have 10 Keyspaces with 10 CF in each keyspace. or 100
   keyspaces with 1 CF each.
   I am talking in terms of memory footprint.
   Also I would be interested to know how much better one is over other.
  
   Thanks,
   Sankalp
 
 




Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread sankalp kohli
I think this code is from the Thrift part. I use Hector. In Hector, I can
create a keyspace object for each keyspace and use it when I want to talk
to that keyspace. Why would it need to do a round trip to the server for
each switch?


On Thu, Nov 8, 2012 at 3:28 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 In the old days the API looked like this.

   client.insert("Keyspace1",
                 key_user_id,
                 new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                 "Chris Goffinet".getBytes("UTF-8"),
                 timestamp,
                 ConsistencyLevel.ONE);

 but now it works like this

 /* pay attention to this below */
 client.set_keyspace("keyspace1");
 /* pay attention to this above */
   client.insert(key_user_id,
                 new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                 "Chris Goffinet".getBytes("UTF-8"),
                 timestamp,
                 ConsistencyLevel.ONE);

 So each time you switch keyspaces you make a network round trip.

 On Thu, Nov 8, 2012 at 6:17 PM, sankalp kohli kohlisank...@gmail.com
 wrote:
  I am a bit confused. One connection pool I know is the one which
  MessageService has to other nodes. Then there will be incoming
 connections
  via thrift from clients. How are they affected by multiple keyspaces?
 
 
  On Thu, Nov 8, 2012 at 3:14 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
  Any connection pool. Imagine if you have 10 column families in 10
  keyspaces. You pull a connection off the pool and the odds are 1 in 10
  of it being connected to the keyspace you want. So 9 out of 10 times
  you have to have a network round trip just to change the keyspace, or
  you have to build a keyspace aware connection pool.
  Edward
 
  On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli kohlisank...@gmail.com
  wrote:
   Which connection pool are you talking about?
  
  
   On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo 
 edlinuxg...@gmail.com
   wrote:
  
   it is better to have one keyspace unless you need to replicate the
   keyspaces differently. The main reason for this is that changing
   keyspaces requires an RPC operation. Having 10 keyspaces would mean
   having 10 connection pools.
  
   On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli 
 kohlisank...@gmail.com
   wrote:
Is it better to have 10 Keyspaces with 10 CF in each keyspace. or
 100
keyspaces with 1 CF each.
I am talking in terms of memory footprint.
Also I would be interested to know how much better one is over
 other.
   
Thanks,
Sankalp
  
  
 
 



Re: Loading data on-demand in Cassandra

2012-11-08 Thread sal
Pierre Chalamet pierre at chalamet.net writes:

 
 Hi,
 You do not need to have 700 GB of data in RAM. Cassandra is able to store
 data on disk and query it from there if it is not cached in memory. Caches
 are maintained by C* itself, but you still have to do some configuration.
 Supposing you want to store around 800 GB with RF=3, you will need at
 least 6 servers if you want to store all the data of your db (keeping at
 most 400 GB per server): 800x3/400=6.
 There is no native implementation of triggers in C*. However, there is an
 extension bringing this feature:
 https://github.com/hmsonline/cassandra-triggers. This should allow you to
 be notified of mutations (i.e. not queries). Some people on this ML are
 involved in it; maybe they could help.
 Cheers,
 - Pierre
 From:  Oliver Plohmann oliver at objectscape.org
 
 Date: Sun, 12 Aug 2012 21:24:43 +0200
 To: user at cassandra.apache.org
 ReplyTo:  user at cassandra.apache.org
 
 Subject: Loading data on-demand in Cassandra
 
 Hello,
 I'm looking a bit into Cassandra to see whether it would be something to
 go with for my company. I searched through the Internet, looked through
 the FAQs, etc., but there are still a few open questions. Hope I don't
 bother anybody with the usual beginner questions ...
 Is there a way to do load-on-demand of data in Cassandra? For the time
 being, we cannot afford to build up a cluster that holds our 700 GB
 SQL database in RAM, so we need to be able to load data on-demand from
 our relational database. Can this be done in Cassandra? Then there also
 needs to be a way to unload data in order to reclaim RAM space. It would
 be nice if it were possible to register for an asynchronous notification
 in case some value was changed. Can this be done?
 Thanks for any answers.
 Regards, Oliver
   

I would consider looking into distributed caching technology (Ehcache, GemFire).
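Pierre's sizing arithmetic in the quoted reply above (800 GB of raw data, RF=3, at most 400 GB per node) can be written out as a small helper:

```python
import math

def min_nodes(data_gb, replication_factor, max_gb_per_node):
    """Minimum node count so data_gb * RF fits under the per-node cap."""
    return math.ceil(data_gb * replication_factor / max_gb_per_node)

# 800 GB at RF=3, keeping at most 400 GB per server: 800*3/400 = 6 nodes.
nodes = min_nodes(data_gb=800, replication_factor=3, max_gb_per_node=400)
print(nodes)  # 6
```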






unsubscribe

2012-11-08 Thread Jeremy McKay







Re: get_range_slice gets no rowcache support?

2012-11-08 Thread Manu Zhang
I did overlook something. get_range_slice invokes cfs.getRawCachedRow
instead of cfs.getThroughCache, so no row will be cached if it is not
already present in the row cache. This puzzles me further: how is the
range of rows expected to get into the row cache in the first place?

Would someone please clarify it for me? Thanks in advance.


On Thu, Nov 8, 2012 at 3:23 PM, Manu Zhang owenzhang1...@gmail.com wrote:

 I've asked this question before. And after reading the source codes, I
 find that get_range_slice doesn't query rowcache before reading from
 Memtable and SSTable. I just want to make sure whether I've overlooked
 something. If my observation is correct, what's the consideration here?
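The difference Manu describes, where getRawCachedRow only returns what is already cached while getThroughCache populates the cache on a miss, can be modeled in a few lines (a sketch of the semantics, not Cassandra's actual Java code):

```python
class RowCacheModel:
    """Models the two read paths contrasted above."""
    def __init__(self, sstable_rows):
        self.sstable = sstable_rows   # backing store (Memtable/SSTable)
        self.cache = {}               # the row cache

    def get_through_cache(self, key):
        """Single-row read path: populates the cache on a miss."""
        if key not in self.cache:
            self.cache[key] = self.sstable[key]
        return self.cache[key]

    def get_raw_cached_row(self, key):
        """get_range_slice path: cache hit or nothing; never populates."""
        return self.cache.get(key)

store = RowCacheModel({"k1": "row1", "k2": "row2"})

first = store.get_raw_cached_row("k1")   # None: miss, cache left untouched
store.get_through_cache("k1")            # miss -> read store and populate
second = store.get_raw_cached_row("k1")  # "row1": now visible to range reads
print(first, second)
```

Under this model, rows only enter the cache via the single-row path, which is exactly why a range slice alone never warms it.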


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo
It is not as bad with Hector, but each Keyspace object is still another
socket open to Cassandra. If you have 500 webservers and 10 keyspaces,
instead of having 500 connections you now have 5000.

On Thu, Nov 8, 2012 at 6:35 PM, sankalp kohli kohlisank...@gmail.com wrote:
 I think this code is from the thrift part. I use hector. In hector, I can
 create multiple keyspace objects for each keyspace and use them when I want
 to talk to that keyspace. Why will it need to do a round trip to the server
 for each switch.


 On Thu, Nov 8, 2012 at 3:28 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

 In the old days the API looked like this.

   client.insert("Keyspace1",
                 key_user_id,
                 new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                 "Chris Goffinet".getBytes("UTF-8"),
                 timestamp,
                 ConsistencyLevel.ONE);

 but now it works like this

 /* pay attention to this below */
 client.set_keyspace("keyspace1");
 /* pay attention to this above */
   client.insert(key_user_id,
                 new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
                 "Chris Goffinet".getBytes("UTF-8"),
                 timestamp,
                 ConsistencyLevel.ONE);

 So each time you switch keyspaces you make a network round trip.

 On Thu, Nov 8, 2012 at 6:17 PM, sankalp kohli kohlisank...@gmail.com
 wrote:
  I am a bit confused. One connection pool I know is the one which
  MessageService has to other nodes. Then there will be incoming
  connections
  via thrift from clients. How are they affected by multiple keyspaces?
 
 
  On Thu, Nov 8, 2012 at 3:14 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
  Any connection pool. Imagine if you have 10 column families in 10
  keyspaces. You pull a connection off the pool and the odds are 1 in 10
  of it being connected to the keyspace you want. So 9 out of 10 times
  you have to have a network round trip just to change the keyspace, or
  you have to build a keyspace aware connection pool.
  Edward
 
  On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli kohlisank...@gmail.com
  wrote:
   Which connection pool are you talking about?
  
  
   On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo
   edlinuxg...@gmail.com
   wrote:
  
   it is better to have one keyspace unless you need to replicate the
   keyspaces differently. The main reason for this is that changing
   keyspaces requires an RPC operation. Having 10 keyspaces would mean
   having 10 connection pools.
  
   On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli
   kohlisank...@gmail.com
   wrote:
Is it better to have 10 Keyspaces with 10 CF in each keyspace. or
100
keyspaces with 1 CF each.
I am talking in terms of memory footprint.
Also I would be interested to know how much better one is over
other.
   
Thanks,
Sankalp
  
  
 
 




Indexing Data in Cassandra with Elastic Search

2012-11-08 Thread Brian O'Neill
For those looking to index data in Cassandra with Elastic Search, here
is what we decided to do:
http://brianoneill.blogspot.com/2012/11/big-data-quadfecta-cassandra-storm.html

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


read request distribution

2012-11-08 Thread Wei Zhu
Hi All,
I am benchmarking a Cassandra cluster: three nodes with RF=3. I generated 6M
rows with sequence numbers from 1 to 6M, so the rows should be evenly
distributed among the three nodes, disregarding the replicas.

The benchmark issues read-only requests for randomly generated keys from 1
to 6M. Oddly, nodetool cfstats reports that one node gets only half the
requests of another, with the third node in the middle, so the ratio is
roughly 2:3:4. The node with the most read requests actually has the
smallest latency, and the one with the fewest has the largest; the
difference is big, with the fastest almost double the slowest.

All three nodes have exactly the same hardware, and the data size on each
node is the same since RF=3 and all of them hold the complete data set. I am
using Hector as the client, and the random read requests number in the
millions. I can't think of a reasonable explanation. Can someone please shed
some light?

Thanks.
-Wei


Re: composite column validation_class question

2012-11-08 Thread Wei Zhu
Any thoughts?

Thanks.
-Wei





 From: Wei Zhu wz1...@yahoo.com
To: Cassandr usergroup user@cassandra.apache.org 
Sent: Wednesday, November 7, 2012 12:47 PM
Subject: composite column validation_class question
 

Hi All,
I am trying to design my schema using composite columns. One thing I am a
bit confused about is how to define the validation_class for a composite
column, or whether there is a way to define it at all.

Depending on the column name, I might insert values of different types. For
example, I insert a date for the column "created":

set user[1]['7:1:100:created'] = 1351728000;

and a string for the description:

set user[1]['7:1:100:desc'] = 'my description';

I don't see a way to define a validation_class per composite column. Am I
right?

Thanks.
-Wei
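Since validation is declared per column family rather than per composite-name suffix, one possible workaround (my assumption, not something stated in this thread) is to serialize every value to a string yourself and let a single permissive validator such as UTF8Type or BytesType accept them all, keeping the per-suffix typing in application code. A plain-Python sketch of that encode/decode step:

```python
import json

# Hypothetical per-suffix types for the composite column names above.
SUFFIX_TYPES = {"created": int, "desc": str}

def encode_value(composite_name, value):
    """Serialize to a string so one CF-wide validator accepts every column;
    the name suffix tells us how to interpret the value later."""
    suffix = composite_name.rsplit(":", 1)[-1]
    assert isinstance(value, SUFFIX_TYPES[suffix])
    return json.dumps(value)

def decode_value(composite_name, raw):
    """Reverse of encode_value: recover the typed value from the string."""
    suffix = composite_name.rsplit(":", 1)[-1]
    return SUFFIX_TYPES[suffix](json.loads(raw))

row = {
    "7:1:100:created": encode_value("7:1:100:created", 1351728000),
    "7:1:100:desc": encode_value("7:1:100:desc", "my description"),
}
print(decode_value("7:1:100:created", row["7:1:100:created"]))  # 1351728000
```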