[RELEASE] Apache Cassandra 1.2.16 released

2014-03-31 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.16.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1] on the 1.2 series. As
always, please pay attention to the release notes[2] and let us know[3] if
you encounter any problems.

Enjoy!

[1]: http://goo.gl/E5Q9Cq (CHANGES.txt)
[2]: http://goo.gl/bQJhms (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Meaning of token column in system.peers and system.local

2014-03-31 Thread Theo Hultberg
your assumption about 256 tokens per node is correct.

as for your second question, it seems to me like most of your assumptions
are correct, but I'm not sure I understand them correctly. hopefully
someone else can answer this better. tokens are a property of the cluster
and not the keyspace. the first replica of any token will be the same for
all keyspaces, but with different replication factors the other replicas
will differ.

when you query the system.local and system.peers tables you must make sure
that both queries go to the same node. I think the inconsistency you think
you found is because the first and second queries went to different nodes.
the java driver connects to all nodes and load balances requests by
default.

T#


On Mon, Mar 31, 2014 at 4:06 AM, Clint Kelly clint.ke...@gmail.com wrote:

 BTW one other thing that I have not been able to debug today that maybe
 someone can help me with:

 I am using a three-node Cassandra cluster with Vagrant.  The nodes in my
 cluster are 192.168.200.11, 192.168.200.12, and 192.168.200.13.

 If I use cqlsh to connect to 192.168.200.11, I see unique sets of tokens
 when I run the following three commands:

 select tokens from system.local;
 select tokens from system.peers where peer = '192.168.200.12';
 select tokens from system.peers where peer = '192.168.200.13';

 This is what I expect.  However, when I tried making an application with
 the Java driver that does the following:


- Create a Session by connecting to 192.168.200.11
- From that session, select tokens from system.local
- From that session, select tokens, peer from system.peers

 Now I get the exact same set of tokens from system.local and from the row
 in system.peers in which peer = '192.168.200.13'.

 Anyone have any idea why this would happen?  I'm not sure how to debug
 this.  I see the following log from the Java driver:

 14/03/30 19:05:24 DEBUG com.datastax.driver.core.Cluster: Starting new
 cluster with contact points [/192.168.200.11]
 14/03/30 19:05:24 INFO com.datastax.driver.core.Cluster: New Cassandra
 host /192.168.200.13 added
 14/03/30 19:05:24 INFO com.datastax.driver.core.Cluster: New Cassandra
 host /192.168.200.12 added

 I'm running Cassandra 2.0.6 in the virtual machine and I built my
 application with version 2.0.1 of the driver.

 Best regards,
 Clint







 On Sun, Mar 30, 2014 at 4:51 PM, Clint Kelly clint.ke...@gmail.com wrote:

 Hi all,


 I am working on a Hadoop InputFormat implementation that uses only the
 native protocol Java driver and not the Thrift API.  I am currently trying
 to replicate some of the behavior of
 Cassandra.client.describe_ring(myKeyspace) from the Thrift API.  I
 would like to do the following:

- Get a list of all of the token ranges for a cluster
- For every token range, determine the replica nodes on which the
data in the token range resides
- Estimate the number of rows for every range of tokens
- Group ranges of tokens on common replica nodes such that we can
create a set of input splits for Hadoop with total estimated row counts
that are reasonably close to the requested split size (see the sketch
after this list)
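
A hypothetical sketch of that grouping step (plain Java; the names and
types here are illustrative only, not from any driver API - token ranges
and replica addresses are reduced to strings):

    import java.util.*;

    public class SplitGrouper {
      // Group token ranges by their replica set (order-insensitive), so
      // each group can become one Hadoop input split that is served by
      // exactly that set of replica nodes.
      public static Map<Set<String>, List<String>> groupByReplicas(
          Map<String, Set<String>> replicasByRange) {
        Map<Set<String>, List<String>> groups =
            new HashMap<Set<String>, List<String>>();
        for (Map.Entry<String, Set<String>> e : replicasByRange.entrySet()) {
          Set<String> key = new TreeSet<String>(e.getValue()); // canonical order
          List<String> ranges = groups.get(key);
          if (ranges == null) {
            ranges = new ArrayList<String>();
            groups.put(key, ranges);
          }
          ranges.add(e.getKey());
        }
        return groups;
      }
    }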

 Last week I received some much-appreciated help on this list that pointed
 me to using the system.peers table to get the list of token ranges for the
 cluster and the corresponding hosts.  Today I created a three-node C*
 cluster in Vagrant (https://github.com/dholbrook/vagrant-cassandra) and
 tried inspecting some of the system tables.  I have a couple of questions
 now:

 1. How many total unique tokens should I expect to see in my cluster?
 If I have three nodes, and each node has a cassandra.yaml with num_tokens =
 256, then should I expect a total of 256*3 = 768 distinct vnodes?

 2. How does the creation of vnodes and their assignment to nodes relate
 to the replication factor for a given keyspace?  I never thought about
 this until today, and I tried to reread the documentation on virtual nodes,
 replication in Cassandra, etc., and now I am sadly still confused.  Here is
 what I think I understand.  :)

- Given a row with a partition key, any client request for an
operation on that row will go to a coordinator node in the cluster.
- The coordinator node will compute the token value for the row and
from that determine a set of replica nodes for that token.
   - One of the replica nodes I assume is the node that owns the
   vnode with the token range that encompasses the token
   - The identity of the owner of this virtual node is a
   cross-keyspace property
   - And the other replicas were originally chosen based on the
   replica-placement strategy
   - And therefore the other replicas will be different for each
   keyspace (because replication factors and replica-placement
   strategy are properties of a keyspace)

 3. What do the values in the token column in system.peers and
 system.local refer to then?

- Since these tables appear to be global, and 

Re: Meaning of token column in system.peers and system.local

2014-03-31 Thread Clint Kelly
Hi Theo,

Thanks for your response.  I understand what you are saying with
regard to the load balancing.  I posted my question to the DataStax
list and one of the folks there answered it.  I put his response below
(for anyone who may be curious):

Sylvain Lebresne sylv...@datastax.com wrote:
The system tables are a bit special in the sense that they are local
to the node that coordinates the query. And by default the java driver
round-robins the queries over the nodes of the cluster. The result is
that, more likely than not, your two system queries (on system.local
and system.peers) did not reach the same coordinator, hence what you
see.

It's possible to force both queries to the same coordinator by
providing a custom load balancing policy. You could for instance write
a wrapper Statement class that allows specifying which node should be
contacted, and then write a custom load balancing policy that
recognizes this wrapper class and forces the user-provided host if
there is one (falling back on another load balancing policy
otherwise). Or, simpler but somewhat less flexible, if all you want is
to have 2 requests go to the same coordinator (which is enough to get
all the tokens of a cluster, really), then you can make sure to use
TokenAwarePolicy (a good idea anyway) and give both queries the same
routing key (what it is doesn't matter much; you can use an empty
ByteBuffer); see SimpleStatement.setRoutingKey().

Note that I would agree that what's suggested above is slightly
involved and could be supported more natively by the driver. I do plan
on exposing the cluster tokens more simply (probably directly from the
Host object); it's just a todo not yet done. And I'll probably add the
load balancing stuff + Statement wrapper I describe above, because
that's probably generally useful, for debugging for instance. Still,
it's possible to do currently, just a bit more involved than is
probably necessary.

--
Sylvain
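
To make the TokenAwarePolicy suggestion concrete, here is a minimal,
untested sketch against the 2.0-era DataStax Java driver (the class and
method names below are from that driver's public API; the empty routing
key is arbitrary, as Sylvain notes):

    import com.datastax.driver.core.*;
    import com.datastax.driver.core.policies.*;
    import java.nio.ByteBuffer;

    public class SameCoordinator {
      public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
            .addContactPoint("192.168.200.11")
            // Route statements that carry a routing key to a replica for
            // that key; two statements with the same key therefore reach
            // the same coordinator.
            .withLoadBalancingPolicy(
                new TokenAwarePolicy(new RoundRobinPolicy()))
            .build();
        Session session = cluster.connect();

        ByteBuffer routingKey = ByteBuffer.allocate(0); // any fixed key works

        SimpleStatement local =
            new SimpleStatement("SELECT tokens FROM system.local");
        local.setRoutingKey(routingKey);
        SimpleStatement peers =
            new SimpleStatement("SELECT tokens, peer FROM system.peers");
        peers.setRoutingKey(routingKey);

        System.out.println("local: "
            + session.execute(local).one().getSet("tokens", String.class));
        for (Row row : session.execute(peers))
          System.out.println(row.getInet("peer") + ": "
              + row.getSet("tokens", String.class));
        cluster.close();
      }
    }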


Re: cassandra 2.0.6 refuses to start

2014-03-31 Thread Marcin Cabaj
Hi Tim,

exec is a shell builtin command; what kind of shell do you use?
Please run:
$ echo $SHELL
$ exec




On Sat, Mar 29, 2014 at 11:10 PM, Tim Dunphy bluethu...@gmail.com wrote:

  hey all..

 love using the cassandra database.  however I've just installed 2.0.6 onto
 a new host running CentOS 6.5 and when I try to run ./bin/cassandra -f
 (from within the cassandra directory) I see this weird error I've never
 seen before

 ./bin/cassandra: line 146: exec: : not found

 What the heck??? exec is a pretty basic command you find on all unix
 systems or so I thought!

  Really confused here.. can anyone offer some help to get cassandra up and
 running on this host?

 Thanks,

 Tim

 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B




Re: Cassandra Chef cookbook - weird bug with broadcast_address: 10.0.2.15

2014-03-31 Thread Marcin Cabaj
Hi Clint,

I'm guessing you are using Vagrant. The thing is, cassandra-chef-cookbook
uses the template cassandra.yaml.erb, where you can find:
broadcast_address: <%= node[:cassandra][:broadcast_address] %>
which in turn defaults to node[:ipaddress].
The value of node[:ipaddress] depends on how you configure networking in
Vagrant/VirtualBox; with the default networking configuration,
node[:ipaddress] equals 10.0.2.15, hence your broadcast_address.
You can set up networking differently, or set the attribute
node[:cassandra][:broadcast_address] manually.



On Mon, Mar 31, 2014 at 3:03 AM, Clint Kelly clint.ke...@gmail.com wrote:

 All,

 Has anyone used the Cassandra Chef cookbook
 https://github.com/michaelklishin/cassandra-chef-cookbook and seen
 broadcast_address: 10.0.2.15 in /etc/cassandra/cassandra.yaml?  I looked
 through the source code for the cookbook and I have no idea how this is
 happening.

 I was able to fix this by just commenting out the broadcast_address in the
 template for /etc/cassandra/cassandra.yaml and moving on, but this is
 pretty strange!

 Best regards,
 Clint



Re: Cassandra Chef cookbook - weird bug with broadcast_address: 10.0.2.15

2014-03-31 Thread Clint Kelly
Hi Marcin,

You are correct that I am using Vagrant.  Sorry for not specifying that.

OMG you are correct.  I spent about an hour over the weekend trying to
figure out what was going on.  I got confused because listen_address
is also set to node[:ipaddress], and listen_address was always set
correctly, but that is because it was set directly.  Oh my goodness.
Thanks for your help, this is really really embarrassing!

Best regards,
Clint





Re: cassandra 2.0.6 refuses to start

2014-03-31 Thread Tim Dunphy
Hi Marcin,

 Thanks! I'm running the bash shell. And for some reason it also looks like
bash does understand 'exec'.

 [root@beta:~] #echo $SHELL
/bin/bash
[root@beta:~] #exec


Why it suddenly loses that understanding when it runs the cassandra start
script, I have no clue.

I even tried changing the script's shebang from #!/bin/sh to
#!/bin/bash. No luck.

Thanks
Tim




-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Row cache for writes

2014-03-31 Thread Wayne Schroeder
I found a lot of documentation about the read path for key and row caches, but
I haven't found anything in regard to the write path.  My app has the need to
record a large quantity of very short-lived temporal data that will expire
within seconds, and only a small percentage of the rows will be accessed before
they expire.  Ideally (and I have done the math) I would like the data to never
hit disk and just stay in memory once written until it expires.  How might I
accomplish this?  I am not concerned about data consistency at all on this, so
if I could even avoid the commit log, that would be even better.

My main concern is that I don't see any evidence that writes end up in the
cache: it takes at least one read to get a row into the cache.  I also realize
that, assuming I don't trigger SSTable flushes through sheer quantity, the
data would be in memory (in the memtable) anyway.

Has anyone done anything similar to this that could provide direction?

Wayne
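
One partial answer within Cassandra itself: per-write TTLs plus a keyspace
created with durable_writes = false, which skips the commit log (both are
standard CQL3 options). Note that the row cache in 1.2/2.0 is only
populated on reads, and memtable flushes can still produce short-lived
SSTables, so this is a sketch of the available knobs rather than a full
solution (contact point and names below are placeholders):

    import com.datastax.driver.core.*;

    public class EphemeralSetup {
      public static void main(String[] args) {
        Cluster cluster =
            Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // durable_writes = false: writes to this keyspace bypass the
        // commit log (acceptable here, since durability is explicitly
        // not a concern).
        session.execute(
            "CREATE KEYSPACE scratch WITH replication = " +
            "{'class': 'SimpleStrategy', 'replication_factor': 1} " +
            "AND durable_writes = false");
        session.execute(
            "CREATE TABLE scratch.ephemeral (id text PRIMARY KEY, v text)");
        // USING TTL: the inserted cells expire ten seconds after the write.
        session.execute(
            "INSERT INTO scratch.ephemeral (id, v) VALUES ('k1', 'hello') " +
            "USING TTL 10");
        cluster.close();
      }
    }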



Re: cassandra 2.0.6 refuses to start

2014-03-31 Thread Tim Dunphy

 Have you tried to run it as another user, not root?


Yep! With no change in result. I get the exact same error message running
as a non-privileged user.

Thanks
Tim






-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: Row cache for writes

2014-03-31 Thread Robert Coli
On Mon, Mar 31, 2014 at 9:37 AM, Wayne Schroeder 
wschroe...@pinsightmedia.com wrote:

 I found a lot of documentation about the read path for key and row caches,
 but I haven't found anything in regard to the write path.  My app has the
 need to record a large quantity of very short lived temporal data that will
 expire within seconds and only have a small percentage of the rows accessed
 before they expire.  Ideally, and I have done the math, I would like the
 data to never hit disk and just stay in memory once written until it
 expires.  How might I accomplish this?


http://en.wikipedia.org/wiki/Memcached

=Rob


Re: Row cache for writes

2014-03-31 Thread Wayne Schroeder
Perhaps I should clarify my question.  Is this possible / how might I 
accomplish this with cassandra?

Wayne






Re: Meaning of token column in system.peers and system.local

2014-03-31 Thread Robert Coli
On Sun, Mar 30, 2014 at 4:51 PM, Clint Kelly clint.ke...@gmail.com wrote:

 1. How many total unique tokens should I expect to see in my cluster?
 If I have three nodes, and each node has a cassandra.yaml with num_tokens =
 256, then should I expect a total of 256*3 = 768 distinct vnodes?


Yes. Generally, vnodes are just like nodes, except there are more than one
of them per physical node.


 2. How does the creation of vnodes and their assignment to nodes relate
 to the replication factor for a given keyspace?


The same way that it would if you created the same number of nodes with a
rack-unaware (simple) snitch. If you have racks configured, it does the
rack thing with vnodes... which is less clear than in the CASSANDRA-3810
non-vnodes rack-aware no-op case, but logically the same.

 3. What do the values in the token column in system.peers and
 system.local refer to then?

Node primary range ownership. Each node, virtual or not, has exactly one
token. The space between the previous token on the ring and its own token
is the primary range it is responsible for.
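
A tiny illustrative sketch of that rule (plain Java over Murmur3-style
long tokens; the names are mine, not from any driver API):

    import java.util.*;

    public class PrimaryRanges {
      // Each token t in the sorted ring owns (previous token, t],
      // wrapping around from the last token back to the first.
      public static List<long[]> primaryRanges(SortedSet<Long> ring) {
        List<long[]> ranges = new ArrayList<long[]>();
        long prev = ring.last(); // wrap-around start
        for (long t : ring) {
          ranges.add(new long[] { prev, t });
          prev = t;
        }
        return ranges;
      }
    }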


 4. Is there any other way, without using Thrift, to get as much
 information as possible about what nodes contain replicas of data for all
 of the token ranges in a given cluster?

I don't know the CQL answer to this, but for JMX there is
getNaturalEndpoints.
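
For reference, nodetool getendpoints <keyspace> <table> <key> wraps that
same call. Programmatically, a hedged, untested JMX sketch (the object
name and operation exist on the 1.2/2.0 StorageService MBean; the host,
keyspace, table, and key below are placeholders):

    import java.net.InetAddress;
    import java.util.List;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.*;

    public class Endpoints {
      public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://192.168.200.11:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
        // Replicas holding the row with the given partition key.
        @SuppressWarnings("unchecked")
        List<InetAddress> replicas = (List<InetAddress>) mbs.invoke(
            new ObjectName("org.apache.cassandra.db:type=StorageService"),
            "getNaturalEndpoints",
            new Object[] { "mykeyspace", "mytable", "somekey" },
            new String[] { "java.lang.String", "java.lang.String",
                           "java.lang.String" });
        System.out.println(replicas);
        jmxc.close();
      }
    }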

=Rob


Re: Read performance in map data type

2014-03-31 Thread Robert Coli
On Fri, Mar 28, 2014 at 7:41 PM, Apoorva Gaurav
apoorva.gau...@myntra.com wrote:

 Yes, primary key is (studentID, subjectID). I had dropped the test table;
 I am recreating and populating it, after which I will share the cfhistograms.
 In such a case is there any practical limit on the rows I should fetch, e.g.
 should I do


Until this bug is fixed upstream, dropping and recreating a table may
create unexpected behavior.

https://issues.apache.org/jira/browse/CASSANDRA-5202

=Rob


Re: cassandra 2.0.6 refuses to start

2014-03-31 Thread Michael Shuler

Is SELinux enabled?


Re: Opscenter help?

2014-03-31 Thread Drew from Zhrodague
I have to reply to myself, since there were no helpful responses.
Perhaps these notes can keep other people from pulling their hair out.


When installing the datastax-agent on EL6, you have to do a bunch of
things by hand to correct a bunch of stuff.


[] yum install mx4j log4j datastax-agent - do not install
opscenter-agent; it seems to be a stale/old package. Ignore log4j
errors on starting the agent. You may have to download the
datastax-agent and force an install via rpm - it will conflict with
sudo, because of the existence of the dir /etc/sudo.d/
[] You may have to fix the UID/GIDs. Older packages set the cassandra
UID/GID both to 217. Installing the agent conveniently mutilated them
for us. I had to change the datastax-agent UID and GID back to 215, the
cassandra UID/GID back to 217, and re-chown these directories:

() /var/log/cassandra
() /var/lib/cassandra
() /var/run/cassandra
() /var/log/datastax-agent
() /var/lib/datastax-agent
() /var/run/datastax-agent
	[] If you can't get datastax-agent to install via opscenter 
automatically, you'll need to configure it manually:

() yum install datastax-agent
() mkdir -p /var/lib/datastax-agent/conf/
() echo stomp_interface: 10.113.143.189 > /var/lib/datastax-agent/conf/address.yaml

[] restart the datastax-agent, and it should now work.


Cassandra Upgrading notes:

	[] We upgraded packages from apache-cassandra11 to cassandra12 - the 
new RPM is not configured as a replacement for the old one.

[] Check your directory permissions as above, and in your data-storage 
dir.
	[] If you use a symlink or mount point to store your actual data on, it 
will be overwritten with an empty directory. Make sure to fix this, or 
you'll be in for a surprise.
[] Make sure to install Sun Java - we use alternatives --config java,
but have to do this to set the default JAVA_HOME:
echo export JAVA_HOME=/usr/java/jdk1.7.0_51 > /etc/profile.d/other-java.sh



I wasn't able to properly tag my questions on Stack Exchange, which
requires a reputation above 300. I don't know why it brought up that red
box stating that I needed over 300 points to add a new tag; those tags
seem to be in there today.



On 3/12/14, 2:51 PM, Drew from Zhrodague wrote:

 I am having a hard time installing the Datastax Opscenter agents on
EL6 and EL5 hosts. Where is an appropriate place to ask for help?
Datastax has moved their forums to Stack Exchange, which seems to be a
waste of time, as I don't have enough reputation points to properly tag
my questions.

 The agent installation seems to be broken:
 [] agent rpm conflicts with sudo
 [] install from opscenter does not work, even if manually
installing the rpm (requires --force, conflicts with sudo)
 [] error message re: log4j #noconf
 [] Could not find the main class: opsagent.opsagent. Program will
exit.
 [] No other (helpful/more in-depth) documentation exists





--

Drew from Zhrodague
post-apocalyptic ad-hoc industrialist
d...@zhrodague.net



Help collecting Cassandra examples

2014-03-31 Thread James Horey
Hello all,

I’m trying to collect and organize Cassandra applications for educational 
purposes. I’m hoping that by collating these applications in a single place, 
new users will be able to get up to speed a bit easier. If you know of a great 
application (should be open-source and preferably up to date), please shoot me 
an email or send a pull request using the GitHub page below. 

https://github.com/opencore/cassandra-examples

Thanks!
James

Re: Read performance in map data type

2014-03-31 Thread Apoorva Gaurav
Thanks Robert. Is there a workaround? In our test setups we keep
dropping and recreating tables.





-- 
Thanks & Regards,
Apoorva